About the Job
What You'll Do
- Design simulation environments and digital twins for enterprise workflows
- Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
- Build pipelines that convert human-labeled traces and verifiable signals into training data
- Architect multi-turn, tool-using agents with closed learning loops
- Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
- Set the technical bar across the team - architecture, code review, engineering standards
- Mentor researchers and engineers; drive technical direction through influence
- Translate research into production; contribute to publications
Required Qualifications
Experience & Education
- 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
- MS or PhD in Computer Science, Machine Learning, or a related field (or equivalent)
- 5+ years hands-on RL - environment design, reward engineering, policy optimization - with at least one production deployment
LLM Post-Training
- 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
- Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
- Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
Agent Engineering
- Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
- Strong Python and software engineering skills - comfortable building production pipelines, not just notebooks
RL Foundations
- Deep expertise in MDPs, policy gradient and actor-critic methods (PPO, SAC), and temporal difference learning
- Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)
Preferred Qualifications
- Publications at NeurIPS, ICML, ICLR, ACL, COLM, or similar venues
- Open-source contributions to post-training or agent frameworks (TRL, veRL, OpenRLHF, SkyRL)
- Experience with offline RL (CQL, IQL), model-based RL / world models, or hierarchical RL
- Background in synthetic data generation, simulation, or world models
- Domain experience in healthcare, finance, logistics, or compliance
- Distributed training on GPU clusters
Why Join Centific
- Lead the frontier. Shape a new discipline at the intersection of post-training, simulation, and enterprise AI.
- Ship your science. See your research power real systems across healthcare, finance, and safety-critical operations.
- Collaborate with leaders. Work alongside NVIDIA, Microsoft, and the global AI community.
- Build what matters. Create governed, compliant AI systems enterprises can actually trust.
Salary: $200k-$250k
How to Apply
Send your CV, a description of a technically complex system you personally built or led, and (if applicable) your publication list or open-source contributions to:
Subject: Senior Staff Research Scientist - RL