Full Job Description
We're looking for a Research Scientist or Research Engineer to own the strategy and systems for collecting, curating, and scaling high-quality robot learning data. This role sits at the intersection of robotics, data collection, and research. Your work directly determines the diversity and quality of the demonstrations our models train on.
What You'll Do
• Design and implement teleoperation and demonstration collection systems for high-quality robot learning data
• Develop data quality metrics, curation pipelines, and filtering strategies specific to robotic interaction data
• Research methods to augment real robot data with synthetic, simulated, or cross-embodiment sources
• Identify and source external robotic datasets to expand training diversity across platforms and tasks
• Build tooling for researchers to explore, annotate, and iterate on robotic datasets
• Collaborate with pre-training and post-training teams to translate model data needs into concrete collection strategies
• Measure the downstream impact of data collection decisions on model and policy performance
What We're Looking For
• Hands-on experience with robotic data collection, teleoperation systems, or demonstration frameworks
• Understanding of what makes robot learning data useful: diversity, coverage, temporal quality, and action fidelity
• Strong software engineering skills for building reliable data collection and processing systems
• Ability to reason across hardware, pipelines, and model performance
• Experience working with real robotic hardware in a research or industrial setting
Nice to Have (But Not Required)
• Experience with sim-to-real transfer and synthetic data generation for robotics
• Familiarity with cross-embodiment datasets (e.g., Open X-Embodiment, DROID)
• Experience with VR teleoperation, motion capture, or dexterous demonstration collection
• Understanding of imitation learning and how data properties affect policy generalization
• PhD or strong research background in robotics or ML
Why This Role
• The data you collect and curate is the direct upstream dependency for the quality of every model we train
• Unique leverage: improvements to data quality compound across every training run
• Work across hardware, systems, and research in a way few roles allow
• Direct feedback loop with both robot operators and research scientists to continuously improve data quality