Staff Research Scientist - Reinforcement Learning

Centific

$200K - $250K
US-Anywhere, Remote in United States
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
  • MS or PhD in Computer Science, Machine Learning, or a related field
  • 5+ years of hands-on reinforcement learning experience, with at least one production deployment
  • 3+ years fine-tuning LLMs with RL post-training methods
  • Strong Python and software engineering expertise for building production pipelines

Responsibilities

  • Design simulation environments and digital twins for enterprise workflows
  • Post-train LLM agents using advanced reinforcement learning techniques
  • Build data pipelines converting human-labeled traces into training data
  • Architect multi-turn agents with closed learning loops
  • Create reward functions resistant to hacking and reflective of real outcomes
  • Mentor team members while setting technical standards
  • Translate research into practical applications and contribute to publications

Benefits

  • Leadership opportunity in a cutting-edge discipline
  • Impactful research with real-world applications in critical sectors
  • Collaboration with top-tier companies and the global AI community
  • Development of trusted and compliant AI systems for enterprises

Full Job Description

What You'll Do
  • Design simulation environments and digital twins for enterprise workflows
  • Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
  • Build pipelines that convert human-labeled traces and verifiable signals into training data
  • Architect multi-turn, tool-using agents with closed learning loops
  • Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
  • Set the technical bar across the team: architecture, code review, and engineering standards
  • Mentor researchers and engineers; drive technical direction through influence
  • Translate research into production; contribute to publications


Required Qualifications

Experience & Education
  • 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
  • MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
  • 5+ years hands-on RL - environment design, reward engineering, policy optimization - with at least one production deployment


LLM Post-Training
  • 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
  • Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
  • Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)


Agent Engineering
  • Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
  • Strong Python and software engineering skills - comfortable building production pipelines, not just notebooks


RL Foundations
  • Deep expertise in MDPs, policy gradient and actor-critic methods (PPO, SAC), and temporal difference learning
  • Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)


Preferred Qualifications
  • Publications at NeurIPS, ICML, ICLR, ACL, COLM, or similar venues
  • Open-source contributions to post-training or agent frameworks (TRL, veRL, OpenRLHF, SkyRL)
  • Experience with offline RL (CQL, IQL), model-based RL / world models, or hierarchical RL
  • Background in synthetic data generation, simulation, or world models
  • Domain experience in healthcare, finance, logistics, or compliance
  • Distributed training on GPU clusters


Why Join Centific
  • Lead the frontier. Shape a new discipline at the intersection of post-training, simulation, and enterprise AI.
  • Ship your science. See your research power real systems across healthcare, finance, and safety-critical operations.
  • Collaborate with leaders. Work alongside NVIDIA, Microsoft, and the global AI community.
  • Build what matters. Create governed, compliant AI systems enterprises can actually trust.


Salary: $200k-$250k

How to Apply

Send your CV, a description of a technically complex system you personally built or led, and (if applicable) your publication list or open-source contributions to:



Subject: Senior Staff Research Scientist - RL
