About the Role
You will build the base intelligence layer for robotics. We train large-scale robot foundation models from massive multimodal datasets spanning video, proprioception, action traces, language, and more. You will design and run the core large-scale training efforts that give our models fundamentally new general capabilities across embodiments, tasks, and environments. You will "live and breathe" all forms of robot data.
You'll be responsible for:
- Designing and executing large-scale pretraining runs for robot foundation models (transformer- and diffusion-based architectures)
- Defining model architectures, objectives, and training curricula across multimodal robotic data (vision, action, state, language)
- Developing scalable data mixtures and sampling strategies across petabyte-scale datasets
- Steering data collection operations in new directions and sourcing new datasets
- Running ablations to understand scaling laws, data quality effects, and architecture tradeoffs
- Collaborating closely with ML Infra and Systems to push cluster utilization, throughput, and reliability
- Turning raw robotic interaction data into generalizable model capabilities
You might thrive in this role if you:
- Have deep experience training large transformer or diffusion models at scale (for example, generative models such as language, audio, or video models)
- Have led or significantly contributed to multi-node, multi-GPU distributed training efforts
- Have worked on scaling laws, optimization dynamics, and large-model failure modes
- Have strong PyTorch fundamentals and comfort debugging at every layer of the stack
- Care about both empirical rigor and raw iteration speed
- Are excited about building general-purpose robot intelligence from first principles