
Helix: The AI system that takes humanoid robots to a new level
Summary: Vision, language, movement: Helix as a milestone in robotics
Helix is an AI system for humanoid robots developed by Figure AI. It is a Vision-Language-Action (VLA) model that combines visual perception, language understanding, and precise motor control in a single system. Helix marks a significant advance in flexible robotic systems for unstructured environments such as homes. Because it can carry out complex tasks without task-specific training, it could change how humans and machines interact.
Helix's abilities
- Real-time control of the entire upper body of a humanoid robot, with 35 degrees of freedom
- Processing of speech input and visual information to perform complex tasks
- Recognition and handling of unknown objects without specific training
- Collaboration between multiple robots in the execution of tasks
- Performing household tasks such as stocking a refrigerator
Technical details
Helix consists of two main components:
- A multimodal language model with 7 billion parameters (runs at 7-9 Hz)
- A motion AI with 80 million parameters (runs at 200 Hz)
Training and deployment:
- Trained on only about 500 hours of supervised, teleoperated data
- Runs on energy-efficient embedded GPUs
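To make the two update rates concrete, the following short calculation derives the time budget per update and the number of fast motion updates that fit between two updates of the language model. It uses only the figures listed above; the variable and function names are illustrative and do not come from Figure AI.

```python
# Figures taken from the list above; all names are illustrative.
S2_RATE_HZ = (7.0, 9.0)      # multimodal language model, reported range
S1_RATE_HZ = 200.0           # motion AI
S2_PARAMS = 7_000_000_000
S1_PARAMS = 80_000_000


def period_ms(rate_hz: float) -> float:
    """Time available for one update at the given rate, in milliseconds."""
    return 1000.0 / rate_hz


# One motion update every 5 ms.
s1_period = period_ms(S1_RATE_HZ)

# One language-model update roughly every 143 ms (7 Hz) to 111 ms (9 Hz).
s2_periods = tuple(period_ms(rate) for rate in S2_RATE_HZ)

# Roughly 29 (at 7 Hz) to 22 (at 9 Hz) motion updates per language-model update.
s1_steps_per_s2 = tuple(S1_RATE_HZ / rate for rate in S2_RATE_HZ)

# The language model has 87.5 times as many parameters as the motion AI.
param_ratio = S2_PARAMS / S1_PARAMS

if __name__ == "__main__":
    print(f"S1 period: {s1_period:.1f} ms")
    print(f"S2 period: {s2_periods[0]:.1f} ms to {s2_periods[1]:.1f} ms")
    print(f"S1 updates per S2 update: {s1_steps_per_s2[1]:.1f} to {s1_steps_per_s2[0]:.1f}")
    print(f"Parameter ratio S2/S1: {param_ratio:.1f}")
```

The numbers show why the split is attractive: the small model has to finish within 5 ms, while the large model has more than 100 ms per update.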
Biggest competitors
- Google DeepMind: Develops comparable VLA models such as RT-2
- Meta: Working on advanced humanoid robots
- Apple: Also in the race to develop advanced AI humanoids
- OpenAI: Former partner of Figure AI, now a competitor in the field of AI development
Google DeepMind
Google DeepMind has unveiled RT-2 (Robotics Transformer 2), a groundbreaking vision-language-action (VLA) model. RT-2 enables robots to perform new tasks without specific training by learning concepts from text and image data on the internet and translating them into robotic actions. In tests, RT-2 demonstrated significantly improved performance on novel tasks compared to its predecessor, RT-1.
Meta
Meta is investing heavily in the development of AI-powered humanoid robots. The company has established a new team within its Reality Labs division, focused on the research and development of robots for consumers. Meta plans to develop AI systems, sensors, and software platforms that can also be used by other manufacturers.
Apple
Apple is also exploring both humanoid and non-humanoid robot designs, but the work is still at an early stage. Analyst Ming-Chi Kuo expects mass production in 2028 at the earliest. Apple is focusing particularly on human-robot interaction.
OpenAI
OpenAI, a former partner of Figure AI, is building its own robotics division and is focusing on robots as the embodiment of artificial intelligence in the real world. The company now competes directly with Google DeepMind and others in the field of AI development for robotics.
Helix: Differentiation compared to other AI systems for robots
Innovative VLA model: Helix combines perception, language and movement
Figure AI's recent launch of Helix marks a significant advance in robotics AI. This Vision-Language-Action (VLA) model differs from existing systems in several respects and sets new standards for controlling humanoid robots. Helix integrates visual perception, language understanding, and precise motion control into a single system designed specifically for the challenges of physical robotics.
Unique dual-system architecture
Perhaps the most significant difference between Helix and other AI systems for robots lies in its innovative two-component architecture. This dual-system structure solves a fundamental problem in robotics AI.
System 1 and System 2: A complementary intelligence
Unlike conventional approaches, Helix uses two complementary systems that together achieve a unique balance between universality and speed. System 2 (S2) is a multimodal language model with 7 billion parameters, operating at a frequency of 7-9 Hz, and functions as the robot's analytical "brain." It processes visual data and speech commands, interprets the environment, and decides which actions to perform.
Complementing this is System 1 (S1), a fast, reactive visuomotor control unit with 80 million parameters. This component translates the semantic information supplied by S2 into precise, continuous robot actions at a frequency of 200 Hz. Figure AI explains that previous approaches lacked either generality or speed: vision-language models (VLMs) are general but not fast, while visuomotor policies for robots are fast but not general. Helix resolves this trade-off through its dual structure.
This architecture differs fundamentally from other well-known VLA models such as Google DeepMind's RT-2, which also combines visual data and language commands but has no comparable division into two parts.
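The interplay of a slow planner and a fast controller can be sketched as a simple two-rate loop. The following is a minimal, single-threaded simulation and not Figure AI's implementation: the `Latent` class, the toy rules inside `slow_system` and `fast_system`, and the choice of 8 Hz for S2 (a value within the reported 7-9 Hz range) are all assumptions made for illustration. A real system would presumably run both parts concurrently on separate hardware resources.

```python
from dataclasses import dataclass
from typing import List, Tuple

S1_RATE_HZ = 200                          # fast control rate from the article
S2_RATE_HZ = 8                            # assumed value within the 7-9 Hz range
TICKS_PER_S2 = S1_RATE_HZ // S2_RATE_HZ   # 25 fast ticks per slow update
NUM_DOF = 35                              # degrees of freedom from the article


@dataclass
class Latent:
    """Hypothetical stand-in for the semantic information S2 hands to S1."""
    goal: float


def slow_system(instruction: str, observation: float) -> Latent:
    """Toy S2: maps an instruction and an observation to a target value."""
    target = 1.0 if "raise" in instruction else 0.0
    return Latent(goal=target)


def fast_system(latent: Latent, joints: List[float]) -> List[float]:
    """Toy S1: moves every joint a small step toward the latest goal."""
    gain = 0.1
    return [q + gain * (latent.goal - q) for q in joints]


def run(instruction: str, seconds: float = 1.0) -> Tuple[List[float], int, int]:
    """Simulate the loop and return final joints, S2 updates, and S1 ticks."""
    joints = [0.0] * NUM_DOF
    latent = Latent(goal=0.0)
    s2_updates = 0
    total_ticks = int(seconds * S1_RATE_HZ)
    for tick in range(total_ticks):
        # S2 refreshes the latent only on every 25th tick.
        if tick % TICKS_PER_S2 == 0:
            latent = slow_system(instruction, joints[0])
            s2_updates += 1
        # S1 acts on every tick, reusing the most recent latent.
        joints = fast_system(latent, joints)
    return joints, s2_updates, total_ticks


if __name__ == "__main__":
    final_joints, slow_updates, fast_ticks = run("raise the arm")
    print(f"{fast_ticks} S1 ticks, {slow_updates} S2 updates")
    print(f"first joint position: {final_joints[0]:.3f}")
```

The key design point is that S1 never waits for S2: between two slow updates it keeps acting on the last latent it received, which is what keeps the control rate at 200 Hz.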
Comprehensive control capabilities
Control over 35 degrees of freedom
Another distinguishing feature of Helix is its ability to coordinate 35 degrees of freedom simultaneously. This comprehensive control allows for precise, high-speed manipulation of the entire humanoid upper body, including wrists, torso, head, and individual fingers. This control capability surpasses most existing systems and enables complex manipulation tasks requiring a high degree of fine motor skills.
Object generalization and learning
Universal object recognition without specific training
A key feature of Helix is its ability to recognize and handle virtually any small household object without prior training on its specific characteristics. This broad generalizability allows the system to handle thousands of objects with varying shapes, sizes, colors, and material properties.
Unlike many other AI robot systems that need to be reprogrammed or retrained for each new task or object type, Helix can adapt to different situations and respond to natural language commands. This represents a paradigm shift, as the system uses a single neural network to learn all behaviors—such as picking up and putting down objects, using drawers and refrigerators, and interacting with other robots—without task-specific fine-tuning.
Multi-robot coordination
Unique collaboration skills
According to Figure AI, Helix is the first VLA model that can control two robots at the same time and have them collaborate. This capability allows the robots to jointly solve complex tasks that involve passing objects and coordinating their movements. Particularly noteworthy is the almost human-like communication between the robots through head nods and eye contact.
This form of coordination represents a significant advancement over conventional systems, where each robot is typically controlled individually or requires specific training for particular roles. With Helix, both robots use the same model weights without the need for individual adjustments.
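What "the same model weights" means in practice can be illustrated with a small sketch: two robot objects hold a reference to one shared policy, and any difference in behavior comes only from what each robot is told and what it sees. The classes `SharedPolicy` and `Robot`, the placeholder weights, and the example instructions are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPolicy:
    """Hypothetical stand-in for one set of trained model weights."""
    weights: tuple

    def act(self, instruction: str, observation: str) -> str:
        # Toy behavior: the output depends only on the inputs,
        # never on which robot is calling.
        return f"do '{instruction}' given '{observation}'"


@dataclass
class Robot:
    name: str
    policy: SharedPolicy

    def step(self, instruction: str, observation: str) -> str:
        return f"{self.name}: {self.policy.act(instruction, observation)}"


# One policy object, loaded once (placeholder weights).
policy = SharedPolicy(weights=(0.1, 0.2, 0.3))

# Both robots reference the same policy; no per-robot copy or fine-tuning.
left = Robot(name="robot_left", policy=policy)
right = Robot(name="robot_right", policy=policy)

if __name__ == "__main__":
    print(left.step("hand over the item", "item in own hand"))
    print(right.step("take the item and store it", "item offered by partner"))
```

Because the policy is identical, role assignment happens entirely through the inputs, which is what distinguishes this setup from training one specialized model per robot.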
Training efficiency and implementation
Minimal training requirements, maximum performance
Another key difference lies in the remarkable efficiency of the training process. Helix was developed using only 500 hours of high-quality, teleoperated training data, significantly less than comparable approaches that often require thousands of hours of specific demonstrations. This efficiency not only underscores the system's technical sophistication but also its economic viability for commercial applications.
Embedded-capable processing
Unlike many robotics AI systems that rely on powerful external servers, Helix runs entirely on embedded, energy-efficient GPUs within the robots. This on-board processing eliminates the need for a constant connection to external computing resources, making the robot more autonomous and flexible in different environments.
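A rough estimate shows why on-board execution is plausible. The sketch below computes the memory needed for the model weights alone from the parameter counts given in the article. The numeric precisions are assumptions, since the article does not say which one Helix uses, and the estimate ignores activations, caches, and other runtime overhead.

```python
# Bytes per parameter for common numeric formats.
# Which format Helix uses is not stated in the article; these are assumptions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

S2_PARAMS = 7_000_000_000   # multimodal language model (from the article)
S1_PARAMS = 80_000_000      # visuomotor control model (from the article)


def weight_gb(num_params: int, precision: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        s2 = weight_gb(S2_PARAMS, precision)
        s1 = weight_gb(S1_PARAMS, precision)
        print(f"{precision}: S2 {s2:.2f} GB, S1 {s1:.2f} GB, total {s2 + s1:.2f} GB")
```

Under the 16-bit assumption, the weights of S2 take about 14 GB and those of S1 about 0.16 GB, so the large model dominates the memory budget while the small model is negligible by comparison.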
Strategic differentiation
Vertical integration instead of generic AI models
Figure AI has strategically differentiated itself from other companies by ending its collaboration with OpenAI and pursuing a vertically integrated strategy, developing both hardware and software in-house. CEO Brett Adcock explained that generic AI models are insufficient to meet the requirements of embodied AI—that is, AI in physical robots. This decision underscores the company's approach of developing tailored solutions for the specific challenges of robotics, rather than relying on general AI models.
Application orientation
Focus on household use
While many industry players are currently focusing on industrial or workplace robot applications, Figure AI is pursuing a strategically surprising approach with Helix, focusing on household robotics. The robots' ability to perform everyday tasks such as sorting groceries, stocking the refrigerator, or handling a wide variety of household items targets a market that other players often consider too complex to enter.
Multi-robot coordination: The key to the next generation of robotics
Helix stands out from other AI robotics systems due to its dual-system architecture, comprehensive control capabilities, remarkable generalization ability, and multi-robot coordination. With its efficient training process, embedded processing, and strategic focus on household applications, it represents a significant advancement in the development of humanoid robots. While other systems, such as Google DeepMind's RT-2, pursue similar approaches of combining visual data and voice commands, Helix offers differentiating advantages through its unique architecture and integrated development approach, making it a pioneer in the next generation of AI-powered robots.

