Research Scientist / Engineer - Post-training & Robot Learning

Rhoda AI

$120K — $150K *
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of hands-on experience in robotics or autonomous systems
  • Strong understanding of robot policy learning techniques such as imitation learning
  • Practical familiarity with robot hardware and sensor modalities
  • Solid skills in machine learning with PyTorch
  • Ability to diagnose policy failures and iterate effectively
  • Comfortable navigating ambiguous and evolving research priorities
  • Staff candidates must define technical directions independently, while senior candidates lead complex projects

Responsibilities

  • Design and implement reinforcement learning (RL) training pipelines
  • Develop RL algorithms adapted for video prediction
  • Implement post-training pipelines for fine-tuning and behavioral alignment
  • Develop inverse dynamics models for translating predictions into actions
  • Build evaluation frameworks for assessing robot policies
  • Research methods for adapting models to new tasks with minimal data
  • Identify and improve upon weaknesses in deployed robot policies
  • Iterate between simulation testing and real robot evaluations
  • Collaborate with pre-training teams to identify capabilities needing improvement

Benefits

  • High ownership on a small, expert team
  • Direct impact on the practical performance of robots in the real world
  • Opportunity to work at the intersection of advanced video models and physical robotics
  • Fast feedback loop from model adjustments to real-world application

Full Job Description

We're looking for Research Scientists and Research Engineers with deep robotics or autonomous systems domain knowledge to adapt our web-pretrained video model to real robot tasks. Post-training at Rhoda means taking a causal video generation model pretrained on internet-scale data and fine-tuning it on robot-collected demonstrations to produce reliable, generalizable behavior - with as little task-specific data as possible. We hire across levels - from senior to staff.

What You'll Do

  • Design and implement RL training pipelines to improve robot policy performance beyond what imitation learning alone achieves - reward design, online data collection, and policy optimization
  • Develop and apply RL algorithms (PPO, GRPO, or similar) adapted to the video prediction setting, including reward modeling and feedback collection strategies for physical task performance
  • Design and implement broader post-training pipelines: supervised fine-tuning, preference optimization, and behavioral alignment on robot-collected demonstration data
  • Work on the inverse dynamics model that translates video predictions into executable robot actions
  • Build evaluation frameworks for post-trained policies: task success, generalization to novel objects and environments, and failure mode analysis on real hardware
  • Research methods to efficiently adapt models to new tasks with minimal demonstration data, including in-context generalization and few-shot adaptation
  • Identify failure modes and systematic weaknesses in deployed robot policies and drive targeted improvements
  • Iterate quickly between simulation and real robot evaluation to close the feedback loop
  • Collaborate with the pre-training team to surface what capabilities are missing from the base model and need to be addressed upstream

What We're Looking For

  • Hands-on experience with robot systems, robotic policy learning, or autonomous systems in an industry or research setting (robotics, self-driving, or similar physical AI domains)
  • Strong understanding of robot policy learning: imitation learning, behavior cloning, and how RL builds on top of it
  • Practical familiarity with real robot hardware, deployment constraints, and sensor modalities (vision, proprioception)
  • Solid ML skills with hands-on PyTorch experience
  • Ability to diagnose policy failures, reason about distribution shift, and iterate effectively on data and training strategies
  • Comfort with ambiguity and fast-changing research priorities
  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior candidates execute complex projects with strong fundamentals and growing scope

Nice to Have (But Not Required)

  • Hands-on experience with reinforcement learning - reward design, policy optimization, and online RL training loops - applied to real or near-real environments (robotics, games, simulated physics, or similar); this is a significant plus
  • Prior industry experience in robotics, autonomous driving, or physical AI (e.g., manipulation, mobile robotics, self-driving stacks)
  • Experience with teleoperation systems or robot demonstration collection at scale
  • Familiarity with robot middleware (ROS/ROS2) and real-time control systems
  • Experience with simulation environments for robotics (MuJoCo, Isaac Sim, Genesis)
  • Understanding of video generation models and how they connect to action prediction
  • PhD in Robotics, ML, or a related field
  • Publication record at ICRA, CoRL, RSS, NeurIPS, or related venues

Why This Role

  • Your work is what makes our robots actually perform tasks reliably in the real world - the direct connection between pre-trained capability and deployed behavior
  • Work at a rare intersection: state-of-the-art video generation models applied to real robot hardware, not simulation
  • Fast feedback loop between model changes and real robot performance
  • High ownership on a small team where robotics domain expertise is core to the mission
