Figure AI's robotics AI system “Helix” for humanoid robots – a Vision-Language-Action (VLA) model

Konrad Wolfenstein

1 year ago

Helix: The AI system that takes humanoid robots to a new level

Summary: Vision, language, movement: Helix as a milestone in robotics

Helix is an AI system for humanoid robots developed by Figure AI. It is a Vision-Language-Action (VLA) model that combines visual perception, language understanding, and precise motor control in a single system. Helix marks a significant advance in flexible robotic systems for unstructured environments such as homes. Because it can carry out complex tasks without task-specific training, it could change how humans and machines interact.

Helix's abilities

Real-time control of the entire upper body of humanoid robots, with 35 degrees of freedom
Processing of speech input and visual information to perform complex tasks
Recognition and handling of unknown objects without specific training
Collaboration between multiple robots in the execution of tasks
Performing household tasks such as stocking a refrigerator

Technical details

Consists of two main components:

A multimodal language model with 7 billion parameters, running at 7-9 Hz
A motion AI (visuomotor controller) with 80 million parameters, running at 200 Hz

Trained on only about 500 hours of supervised, teleoperated data
Runs on energy-efficient embedded GPUs
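To put these figures in proportion, the short sketch below works out how the two components relate to each other. It uses only the numbers listed above; the constant and function names are made up for illustration and are not part of any Figure AI software.

```python
# Figures quoted in the article; the names are illustrative only.
S2_PARAMS = 7_000_000_000   # multimodal language model
S1_PARAMS = 80_000_000      # motion (visuomotor) model
S2_RATE_HZ = (7, 9)         # update range of the slow component
S1_RATE_HZ = 200            # update rate of the fast component


def s1_steps_per_s2_update(s1_hz: float, s2_hz: float) -> float:
    """Number of fast control steps that elapse between two slow updates."""
    return s1_hz / s2_hz


param_ratio = S2_PARAMS / S1_PARAMS
steps_at_slowest = s1_steps_per_s2_update(S1_RATE_HZ, S2_RATE_HZ[0])
steps_at_fastest = s1_steps_per_s2_update(S1_RATE_HZ, S2_RATE_HZ[1])
s1_period_ms = 1000 / S1_RATE_HZ

print(f"S2 has {param_ratio:.1f}x the parameters of S1")
print(f"S1 steps per S2 update: {steps_at_fastest:.1f} to {steps_at_slowest:.1f}")
print(f"S1 time budget per step: {s1_period_ms:.0f} ms")
```

In other words, the small model issues roughly 22 to 29 motor commands for every update of the large model, with about 5 ms available per command.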

Biggest competitors

Google DeepMind: Developing comparable VLA models such as RT-2
Meta: Working on advanced humanoid robots
Apple: Also in the race to develop advanced AI humanoids
OpenAI: Former partner of Figure AI, now a competitor in the field of AI development

Google DeepMind

Google DeepMind has unveiled RT-2 (Robotics Transformer 2), a groundbreaking vision-language-action (VLA) model. RT-2 enables robots to perform new tasks without specific training by learning concepts from text and image data on the internet and translating them into robotic actions. In tests, RT-2 demonstrated significantly improved performance on novel tasks compared to its predecessor, RT-1.

Apple

Apple is also exploring both humanoid and non-humanoid robot designs, but the company is still at an early stage of development. Analyst Ming-Chi Kuo predicts that mass production will not begin before 2028 at the earliest. Apple is focusing particularly on human-robot interaction.

OpenAI

OpenAI, a former partner of Figure AI, is building its own robotics division and is focusing on robots as the embodiment of artificial intelligence in the real world. The company now competes directly with Google DeepMind and others in the field of AI development for robotics.

Helix: Differentiation compared to other AI systems for robots

Innovative VLA model: Helix combines perception, language and movement

Figure AI's launch of Helix marks a significant advance in the robotics AI landscape. This Vision-Language-Action (VLA) model differs from existing systems in several respects and sets new standards for controlling humanoid robots. Helix integrates visual perception, language understanding, and precise motion control into a single system designed specifically for the challenges of physical robotics.

Unique dual-system architecture

Perhaps the most significant difference between Helix and other AI systems for robots lies in its two-component architecture. This dual-system structure addresses a fundamental trade-off in robotics AI: models that generalize well tend to be slow, while fast controllers tend to be narrow.

System 1 and System 2: A complementary intelligence

Unlike conventional approaches, Helix uses two complementary systems that together balance generality and speed. System 2 (S2) is a multimodal language model with 7 billion parameters that operates at 7-9 Hz and functions as the robot's analytical "brain." It processes visual data and language commands, interprets the environment, and decides which actions to perform.

Complementing this is System 1 (S1), a fast, reactive visuomotor control unit with 80 million parameters. This component translates the semantic information supplied by S2 into precise, continuous robot actions at 200 Hz. Figure AI explains that previous approaches fell short on either generality or speed: vision-language models (VLMs) are general but not fast, while visuomotor policies for robots are fast but not general. Helix resolves this trade-off through its dual structure.
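The interplay of a slow and a fast component can be illustrated with a toy two-rate control loop. The sketch below is not Figure AI's implementation. It assumes a single scalar "joint", a hypothetical `Latent` object standing in for the semantic information S2 passes to S1, and an S2 rate of 8 Hz chosen from within the 7-9 Hz range. The only point is the timing pattern: the fast loop runs every tick and always uses the most recent output of the slow loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

S1_HZ = 200                      # fast controller rate, from the article
S2_HZ = 8                        # assumed value inside the 7-9 Hz range
TICKS_PER_S2 = S1_HZ // S2_HZ    # 25 fast ticks per slow update


@dataclass
class Latent:
    """Hypothetical stand-in for what S2 hands over to S1."""
    target: float
    version: int


def slow_system(instruction: str, observation: float, version: int) -> Latent:
    """Toy S2: maps an instruction (and, in principle, an observation) to a target."""
    goal = 1.0 if "raise" in instruction else 0.0
    return Latent(target=goal, version=version)


def fast_system(latent: Latent, position: float, gain: float = 0.05) -> float:
    """Toy S1: one proportional control step toward the latest target."""
    return position + gain * (latent.target - position)


def run(instruction: str, seconds: float = 1.0) -> Tuple[List[float], int]:
    """Simulate the two-rate loop and return the trajectory and S2 update count."""
    position = 0.0
    latent = Latent(target=position, version=-1)  # replaced at tick 0
    s2_updates = 0
    trajectory: List[float] = []
    for tick in range(int(seconds * S1_HZ)):
        if tick % TICKS_PER_S2 == 0:
            # Slow path: refresh the latent only every 25th tick.
            latent = slow_system(instruction, position, version=s2_updates)
            s2_updates += 1
        # Fast path: runs on every tick with whatever latent is current.
        position = fast_system(latent, position)
        trajectory.append(position)
    return trajectory, s2_updates


trajectory, s2_updates = run("raise the arm")
print(len(trajectory), s2_updates, round(trajectory[-1], 3))
```

In one simulated second the fast loop executes 200 steps while the slow loop updates only 8 times, yet the motion stays smooth because the fast loop never waits for the slow one.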

This architecture differs fundamentally from other well-known VLA models such as Google DeepMind's RT-2, which also combines visual data and language commands but is not split into two components in a comparable way.

Comprehensive control capabilities

Control over 35 degrees of freedom

Another distinguishing feature of Helix is its ability to coordinate 35 degrees of freedom simultaneously. This comprehensive control allows for precise, high-speed manipulation of the entire humanoid upper body, including wrists, torso, head, and individual fingers. This control capability surpasses most existing systems and enables complex manipulation tasks requiring a high degree of fine motor skills.
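As a rough illustration of what controlling 35 degrees of freedom means in data terms, the sketch below treats one control step as a flat vector of 35 values. The normalized range of -1 to 1 and the function name `clamp_action` are assumptions made for this example. The article does not describe how Helix actually encodes or scales its actions.

```python
from typing import List, Sequence

NUM_DOF = 35        # degrees of freedom, from the article
CONTROL_HZ = 200    # S1 output rate, from the article


def clamp_action(raw: Sequence[float], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Check the vector length and clip every value to a normalized range.

    The [-1, 1] range is an illustrative assumption, not Helix's real scaling.
    """
    if len(raw) != NUM_DOF:
        raise ValueError(f"expected {NUM_DOF} values, got {len(raw)}")
    return [min(high, max(low, value)) for value in raw]


# One action per control step means this many joint targets every second.
joint_targets_per_second = NUM_DOF * CONTROL_HZ

action = clamp_action([0.5] * 34 + [1.7])
print(len(action), action[-1], joint_targets_per_second)
```

At 200 Hz, a 35-dimensional action amounts to 7,000 individual joint targets per second, which gives a sense of why the fast component has to be small.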

Object generalization and learning

Universal object recognition without specific training

A key feature of Helix is its ability to recognize and handle virtually any small household object without prior training on its specific characteristics. This broad generalizability allows the system to handle thousands of objects with varying shapes, sizes, colors, and material properties.

Unlike many other AI robot systems that need to be reprogrammed or retrained for each new task or object type, Helix can adapt to different situations and respond to natural language commands. This represents a paradigm shift, as the system uses a single neural network to learn all behaviors—such as picking up and putting down objects, using drawers and refrigerators, and interacting with other robots—without task-specific fine-tuning.

Multi-robot coordination

Unique collaboration skills

According to Figure AI, Helix is the first VLA model capable of controlling two robots simultaneously and enabling them to collaborate. This allows the robots to jointly solve complex tasks that involve passing objects to each other and coordinating their movements. Particularly noteworthy is the almost human-like communication between the robots through head nodding and eye contact.

This form of coordination represents a significant advancement over conventional systems, where each robot is typically controlled individually or requires specific training for particular roles. With Helix, both robots use the same model weights without the need for individual adjustments.
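The idea of two robots sharing one set of weights can be sketched as two robot objects holding a reference to the same policy object. Everything in this example is hypothetical: the `SharedPolicy` class, its toy rule, and the prompt and observation strings are stand-ins and do not reflect Helix's internals. It only shows that different behavior can come from different inputs rather than from different models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPolicy:
    """Hypothetical stand-in for one set of model weights."""
    weights_id: str

    def act(self, prompt: str, observation: str) -> str:
        # Toy rule: the output depends only on prompt and observation,
        # never on which robot is calling.
        if "hand" in prompt and "holding" in observation:
            return "pass object"
        if "hand" in prompt:
            return "wait to receive"
        return "idle"


@dataclass
class Robot:
    name: str
    policy: SharedPolicy

    def step(self, prompt: str, observation: str) -> str:
        return self.policy.act(prompt, observation)


policy = SharedPolicy(weights_id="one-checkpoint")
left = Robot("left", policy)
right = Robot("right", policy)

prompt = "hand the item to the other robot"
left_action = left.step(prompt, "holding item")
right_action = right.step(prompt, "hands empty")
print(left_action, "|", right_action)
```

Both robots run the identical policy, and their roles diverge only because one observes an object in its hand and the other does not.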

Training efficiency and implementation

Minimal training requirements, maximum performance

Another key difference lies in the remarkable efficiency of the training process. Helix was developed using only 500 hours of high-quality, teleoperated training data, significantly less than comparable approaches that often require thousands of hours of specific demonstrations. This efficiency not only underscores the system's technical sophistication but also its economic viability for commercial applications.

Embedded-capable processing

Unlike many robotics AI systems that rely on powerful external servers, Helix runs entirely on embedded, energy-efficient GPUs within the robots. This on-board processing eliminates the need for a constant connection to external computing resources, making the robot more autonomous and flexible in different environments.

Strategic differentiation

Vertical integration instead of generic AI models

Figure AI has strategically differentiated itself from other companies by ending its collaboration with OpenAI and pursuing a vertically integrated strategy, developing both hardware and software in-house. CEO Brett Adcock explained that generic AI models are insufficient to meet the requirements of embodied AI—that is, AI in physical robots. This decision underscores the company's approach of developing tailored solutions for the specific challenges of robotics, rather than relying on general AI models.

Application orientation

Focus on household use

While many industry players are currently focusing on industrial or workplace robot applications, Figure AI is pursuing a strategically surprising approach with Helix, focusing on household robotics. The robots' ability to perform everyday tasks such as sorting groceries, stocking the refrigerator, or handling a wide variety of household items targets a market that other players often consider too complex to enter.

Multi-robot coordination: The key to the next generation of robotics

Helix stands out from other AI robotics systems through its dual-system architecture, comprehensive control capabilities, strong generalization, and multi-robot coordination. With its efficient training process, embedded processing, and strategic focus on household applications, it represents a significant advance in the development of humanoid robots. While other systems, such as Google DeepMind's RT-2, also combine visual data and language commands, Helix differentiates itself through its architecture and integrated development approach, which positions it as a pioneer among the next generation of AI-powered robots.
