Research Scientist: Pretraining

Generalist AI, Inc

$120K–$150K *
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5–7 years of experience in machine learning, specifically with transformer or diffusion models
  • Proven track record in multi-node, multi-GPU distributed training
  • Expertise in scaling laws and optimization dynamics
  • Strong PyTorch programming skills, with the ability to debug at every layer of the stack
  • A background in empirical research and iterative model development
  • Passion for advancing robotics and building general-purpose robot intelligence from first principles

Responsibilities

  • Design and execute large-scale pretraining runs for robot foundation models
  • Define architectures, objectives, and training curricula for multimodal robotic datasets
  • Develop scalable data mixtures and strategies for handling petabyte-scale datasets
  • Guide data collection efforts and source new datasets for training
  • Conduct ablations to examine scaling laws and data quality impacts
  • Collaborate with ML Infrastructure teams to enhance cluster performance
  • Transform raw robotic interaction data into generalizable model capabilities

Benefits

  • Collaborative work environment that fosters innovation
  • Opportunity to contribute to cutting-edge research in robotics
  • Access to large and diverse multimodal datasets
  • A culture that values empirical rigor and rapid iteration
  • Chance to revolutionize general-purpose robot intelligence through foundational work

Full Job Description
About the Role

You will build the base intelligence layer for robotics. We train large-scale robot foundation models from massive multimodal datasets spanning video, proprioception, action traces, language, and more. You will design and run the core large-scale training efforts that give our models fundamentally new general capabilities across embodiments, tasks, and environments. You will "live and breathe" all forms of robot data.

You'll be responsible for:
  • Designing and executing large-scale pretraining runs for robot foundation models (transformer- and diffusion-based architectures)
  • Defining model architectures, objectives, and training curricula across multimodal robotic data (vision, action, state, language)
  • Developing scalable data mixtures and sampling strategies across petabyte-scale datasets
  • Guiding data collection operations in new directions and sourcing new datasets
  • Running ablations to understand scaling laws, data quality effects, and architecture tradeoffs
  • Collaborating closely with ML Infra and Systems to push cluster utilization, throughput, and reliability
  • Turning raw robotic interaction data into generalizable model capabilities


You might thrive in this role if you:
  • Have deep experience training large transformer or diffusion models at scale (e.g., generative models for language, audio, or video)
  • Have led or significantly contributed to multi-node, multi-GPU distributed training efforts
  • Have worked on scaling laws, optimization dynamics, and large-model failure modes
  • Have strong PyTorch fundamentals and comfort debugging at every layer of the stack
  • Care about both empirical rigor and raw iteration speed
  • Are excited about building general-purpose robot intelligence from first principles
