Helix introduces a first-of-its-kind “System 1, System 2” Vision-Language-Action (VLA) model designed for high-rate, dexterous control of humanoid robots’ upper bodies. Previous approaches faced a tradeoff between generality and speed; Helix addresses it with a dual-system design.
The Problem with Previous Models: Earlier robotic systems had to choose between versatility and fast execution. Vision-Language Models (VLMs) generalize well but run too slowly for real-time control. Reactive visuomotor policies are fast but cannot generalize across different objects and contexts.
The Helix Solution: Helix resolves this tradeoff with two complementary systems, trained to work together:
System 2 (S2): An internet-pretrained VLM operating at 7-9 Hz, which provides broad scene understanding and language comprehension. This system allows the robot to generalize across various objects and contexts, helping it plan high-level goals.
System 1 (S1): A fast, reactive visuomotor policy running at 200 Hz, which takes the semantic representations produced by S2 and translates them into precise, real-time robot actions.
This decoupled architecture lets each system operate at its own timescale: S2 handles slower, high-level reasoning, while S1 executes fast actions in real time. For example, in collaborative settings, S1 can adjust to a partner robot’s movements as they happen while staying aligned with the high-level objectives set by S2.
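The two-rate split described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Helix's implementation: the names `Latent`, `s2_plan`, `s1_act`, and `simulate` are hypothetical, the "robot" is a single number, and S2 is assumed to run at 8 Hz (a value picked from the 7-9 Hz range given above). The point it shows is that the fast loop never waits for the slow one; it simply reuses the most recent output of S2.

```python
from dataclasses import dataclass

S1_HZ = 200  # S1 control rate, as stated in the text
S2_HZ = 8    # assumption: one value inside the 7-9 Hz range from the text


@dataclass
class Latent:
    """Hypothetical stand-in for the semantic representation S2 passes to S1."""
    goal: float        # toy 1-D target position
    created_at: float  # simulated time at which S2 produced it


def s2_plan(t: float) -> Latent:
    """Toy 'slow' system: chooses a high-level target each time it runs."""
    goal = 1.0 if int(t) % 2 == 0 else -1.0
    return Latent(goal=goal, created_at=t)


def s1_act(position: float, latent: Latent, gain: float = 0.05) -> float:
    """Toy 'fast' system: one small proportional step toward the latest goal."""
    return position + gain * (latent.goal - position)


def simulate(duration_s: float = 1.0):
    """Advance a simulated clock at S1_HZ and refresh the latent at S2_HZ."""
    steps = int(duration_s * S1_HZ)
    s2_period = 1.0 / S2_HZ
    next_s2_time = 0.0
    latent = None
    position = 0.0
    s2_updates = 0
    for k in range(steps):
        t = k / S1_HZ
        if t >= next_s2_time:
            # Slow path: S2 publishes a new latent only when its period elapses.
            latent = s2_plan(t)
            next_s2_time += s2_period
            s2_updates += 1
        # Fast path: S1 acts on every tick using whatever latent is newest.
        position = s1_act(position, latent)
    return position, steps, s2_updates


if __name__ == "__main__":
    position, steps, s2_updates = simulate()
    print(f"S1 steps: {steps}, S2 updates: {s2_updates}, position: {position:.4f}")
```

With these assumed rates, S1 takes 25 control steps for every S2 update (200 / 8). A real system would run the two loops concurrently and asynchronously rather than on one simulated clock; the single loop here just keeps the example deterministic.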
By pairing the generalization of a VLM with the speed of a reactive policy, Helix unifies both strengths in a single system capable of performing complex tasks with dexterity and precision.