Staff Research Scientist - Reinforcement Learning

Centific

• $200K — $250K *

US-Anywhere (Remote in United States)

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

7+ years in ML/AI research or engineering; 3+ years at senior/staff level
MS or PhD in Computer Science, Machine Learning, or a related field
5+ years of hands-on reinforcement learning experience, with at least one production deployment
3+ years fine-tuning LLMs with RL post-training methods
Strong Python and software engineering expertise for building production pipelines

Responsibilities

Design simulation environments and digital twins for enterprise workflows
Post-train LLM agents using advanced reinforcement learning techniques
Build data pipelines converting human-labeled traces into training data
Architect multi-turn agents with closed learning loops
Create reward functions resistant to hacking and reflective of real outcomes
Mentor team members while setting technical standards
Translate research into practical applications and contribute to publications

Benefits

Leadership opportunity in a cutting-edge discipline
Impactful research with real-world applications in critical sectors
Collaboration with top-tier companies and the global AI community
Development of trusted and compliant AI systems for enterprises

Full Job Description

About Job

What You'll Do

Design simulation environments and digital twins for enterprise workflows
Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
Build pipelines that convert human-labeled traces and verifiable signals into training data
Architect multi-turn, tool-using agents with closed learning loops
Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
Set the technical bar across the team: architecture, code review, and engineering standards
Mentor researchers and engineers; drive technical direction through influence
Translate research into production; contribute to publications
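As an illustration of the verifier and trace-to-training-data items above, here is a minimal Python sketch. It assumes a hypothetical convention in which completions wrap their final answer in `<answer>...</answer>` tags and each human-labeled trace carries a single rater score; `verified_reward`, `Trace`, and `to_preference_pairs` are illustrative names, not part of any library or of Centific's stack.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed output convention (hypothetical): the final answer sits inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def verified_reward(completion: str, gold: str) -> float:
    """Binary outcome reward: 1.0 only if exactly one tagged answer matches the reference.

    Returning 0.0 for zero or multiple <answer> spans blocks one simple hack:
    listing several candidates and hoping one of them matches.
    """
    answers = ANSWER_RE.findall(completion)
    if len(answers) != 1:
        return 0.0
    return 1.0 if answers[0].strip() == gold.strip() else 0.0


@dataclass
class Trace:
    prompt: str
    completion: str
    human_score: float  # hypothetical rater score; higher is better


def to_preference_pairs(traces: List[Trace], gold: str) -> List[Tuple[str, str, str]]:
    """Convert labeled traces for one prompt into (prompt, chosen, rejected) pairs.

    Traces are ranked by the verifier first and the human score second, so a
    well-rated but incorrect trace is never preferred over a verified-correct one.
    """
    keyed = [((verified_reward(t.completion, gold), t.human_score), t) for t in traces]
    keyed.sort(key=lambda kt: kt[0], reverse=True)
    pairs = []
    for i, (key_better, better) in enumerate(keyed):
        for key_worse, worse in keyed[i + 1:]:
            if key_better > key_worse:  # exact ties carry no preference signal
                pairs.append((better.prompt, better.completion, worse.completion))
    return pairs


traces = [
    Trace("What is 6*7?", "<answer>42</answer>", 0.2),
    Trace("What is 6*7?", "<answer>41</answer>", 0.9),
    Trace("What is 6*7?", "Work shown. <answer>42</answer>", 0.8),
]
pairs = to_preference_pairs(traces, "42")  # 3 pairs; the incorrect "41" trace is only ever rejected
```

Ranking by the verifier before the human score is one possible design choice; a pipeline could instead weight, filter, or disagree-flag the two signals.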

Required Qualifications

Experience & Education

7+ years in ML/AI research or engineering; 3+ years at senior/staff level
MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
5+ years of hands-on RL (environment design, reward engineering, policy optimization), with at least one production deployment

LLM Post-Training

3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
Expert-level experience implementing RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
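For reference, the pairwise objectives named above can be written compactly. The sketch below works on scalar rewards and summed sequence log-probabilities rather than real models; `beta=0.1` is an arbitrary illustrative value, and the GRPO helper shows only the group-relative advantage step (the full method also uses a clipped policy-ratio objective and, typically, a KL penalty toward a reference model). It uses the population standard deviation; implementations differ on this detail.

```python
import math
from statistics import mean, pstdev
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss for one preference pair: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, from summed log-probs under the policy and a frozen reference."""
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: standardize the rewards of G completions sampled for one prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


print(round(bradley_terry_loss(0.0, 0.0), 4))  # 0.6931, i.e. log 2 when the model has no preference
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))    # roughly [1, -1, -1, 1]
```

Equal rewards, or a policy identical to the reference, both give the no-preference loss of log 2; the loss falls only as the margin widens in the preferred direction.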

Agent Engineering

Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
Strong Python and software engineering skills: comfortable building production pipelines, not just notebooks
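A minimal sketch of a multi-turn, tool-using loop with trajectory logging and an outcome-based trajectory check. The scripted policy stands in for an LLM, and the `CALL`/`FINAL` action format and the `add` tool are hypothetical conventions chosen for brevity, not any agent framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    role: str     # "agent" or "tool"
    content: str


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    final_answer: Optional[str] = None


# Hypothetical tool registry; a real agent might expose search, code execution, or enterprise APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}


def scripted_policy(task: str, steps: List[Step]) -> str:
    """Stand-in for an LLM: calls the tool once, then answers with the tool's output."""
    tool_outputs = [s.content for s in steps if s.role == "tool"]
    if not tool_outputs:
        return f"CALL add {task}"
    return f"FINAL {tool_outputs[-1]}"


def run_episode(task: str, policy: Callable[[str, List[Step]], str],
                max_turns: int = 5) -> Trajectory:
    """Alternate agent actions and tool observations until a FINAL action or the turn limit."""
    traj = Trajectory()
    for _ in range(max_turns):
        action = policy(task, traj.steps)
        traj.steps.append(Step("agent", action))
        kind, _, rest = action.partition(" ")
        if kind == "FINAL":
            traj.final_answer = rest
            break
        if kind == "CALL":
            name, _, arg = rest.partition(" ")
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"error: unknown tool {name}"
            traj.steps.append(Step("tool", observation))
    return traj


def evaluate(traj: Trajectory, expected: str) -> Dict[str, float]:
    """Score a whole trajectory on outcome (answer correctness) and cost (agent turns)."""
    return {
        "success": float(traj.final_answer == expected),
        "agent_turns": float(sum(s.role == "agent" for s in traj.steps)),
    }


traj = run_episode("2,3", scripted_policy)
print(evaluate(traj, "5"))  # {'success': 1.0, 'agent_turns': 2.0}
```

Logging every step, rather than only the final answer, is what lets the same trajectories feed later evaluation or training.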

RL Foundations

Deep expertise in MDPs, policy-gradient and actor-critic methods (PPO, SAC), and temporal-difference learning
Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)
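To make the RL-foundations items concrete: a toy chain MDP with a Gymnasium-style `reset`/`step` interface (plain Python, no `gymnasium` dependency), evaluated with tabular TD(0) under a sparse goal reward and under a dense, potential-based shaped reward. The chain layout, the potential `s / (n - 1)`, and the step size are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Tuple


class ChainEnv:
    """States 0..n-1 on a line; start at 0, the episode ends on reaching n-1.

    Mirrors Gymnasium's signatures: reset() -> (obs, info),
    step(action) -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, n_states: int = 5, dense: bool = False, gamma: float = 0.9):
        self.n, self.dense, self.gamma = n_states, dense, gamma
        self.state = 0

    def potential(self, s: int) -> float:
        return s / (self.n - 1)  # assumed shaping potential: progress toward the goal

    def reset(self, seed=None) -> Tuple[int, dict]:
        self.state = 0
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        s = self.state
        s_next = min(s + 1, self.n - 1) if action == 1 else max(s - 1, 0)
        terminated = s_next == self.n - 1
        reward = 1.0 if terminated else 0.0  # sparse: only reaching the goal pays
        if self.dense:
            # Potential-based shaping F = gamma * phi(s') - phi(s), with phi(terminal) = 0.
            next_phi = 0.0 if terminated else self.potential(s_next)
            reward += self.gamma * next_phi - self.potential(s)
        self.state = s_next
        return s_next, reward, terminated, False, {}


def td0_evaluate(env: ChainEnv, policy: Callable[[int], int],
                 episodes: int = 200, alpha: float = 0.5) -> Dict[int, float]:
    """Tabular TD(0) estimate of the state-value function V under a fixed policy."""
    gamma = env.gamma
    V = {s: 0.0 for s in range(env.n)}
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            s_next, r, terminated, truncated, _ = env.step(policy(s))
            target = r if terminated else r + gamma * V[s_next]  # no bootstrap past a terminal state
            V[s] += alpha * (target - V[s])
            s, done = s_next, terminated or truncated
    return V


def go_right(s: int) -> int:
    return 1


V_sparse = td0_evaluate(ChainEnv(dense=False), go_right)
V_dense = td0_evaluate(ChainEnv(dense=True), go_right)
print({s: round(v, 3) for s, v in V_sparse.items()})  # {0: 0.729, 1: 0.81, 2: 0.9, 3: 1.0, 4: 0.0}
```

Because the shaping term is potential-based (with the potential fixed at zero on the terminal state), the shaped values differ from the sparse ones by exactly the potential, so the preferred policy is unchanged. Arbitrary dense bonuses carry no such guarantee, which is one common route to reward hacking.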

Preferred Qualifications

Publications at NeurIPS, ICML, ICLR, ACL, COLM, or similar venues
Open-source contributions to post-training or agent frameworks (TRL, veRL, OpenRLHF, SkyRL)
Experience with offline RL (CQL, IQL), model-based RL / world models, or hierarchical RL
Background in synthetic data generation, simulation, or world models
Domain experience in healthcare, finance, logistics, or compliance
Distributed training on GPU clusters

Why Join Centific

Lead the frontier. Shape a new discipline at the intersection of post-training, simulation, and enterprise AI.
Ship your science. See your research power real systems across healthcare, finance, and safety-critical operations.
Collaborate with leaders. Work alongside NVIDIA, Microsoft, and the global AI community.
Build what matters. Create governed, compliant AI systems enterprises can actually trust.

Salary: $200k-$250k

How to Apply

Send your CV, a description of a technically complex system you personally built or led, and (if applicable) your publication list or open-source contributions to:

Subject: Senior Staff Research Scientist - RL

* Ladders Estimates

Similar Jobs

Maven Exploitation Specialist/ Imagery Scientist (EO Focused) Expert
$210K — $230K *
BTS Software Solutions
Springfield, VA 22153 (Fairfax County)
Today
Senior Principal, Innovation Scout
$119K — $222K *
Novartis Pharmaceuticals
Cambridge, MA 02139 (Middlesex County)
Today
Senior Research Developer (AI/ML) - Creative Innovation
$141K — $204K *
Electronic Arts Inc
Vancouver, BC V5K 5J9
Yesterday
Senior Research Data Scientist, Payments Platform
$174K — $253K *
Google
Mountain View, CA 94040 (Santa Clara County)
Yesterday
Senior Research Engineer
$174K — $252K *
Google
New York, NY 10025 (New York County)
3 days ago
Senior Research Engineer
$174K — $252K *
Google
Mountain View, CA 94040 (Santa Clara County)
3 days ago

More Jobs at Centific

Staff Research Scientist - Reinforcement Learning
$200K — $250K *
Remote
Today
Information Technology
Remote in United States
Strategic Account Executive
$160K — $180K *
Remote
Reposted Yesterday
Enterprise Technology
Remote in United States
Strategic Account Executive
$160K — $180K *
Remote
Reposted 1 week ago
Enterprise Technology
Remote in United States
SDE2-1
$80K — $110K *
Remote
2 weeks ago
Information Technology
Remote in United States
SDE2-4
$80K — $110K *
Remote
3 weeks ago
Information Technology
Remote in United States

More Information Technology Jobs

Client Partner - Banking / Financial Services / Capital Markets
$325K — $350K + $100K bonus *
Large IT Services Firm (client of TechLink Systems)
New York, NY 10001 (New York County)
1 week ago
Sr. Manager, IT Security Operations
$100K — $130K *
Swire Coca-Cola, USA
Draper, UT 84020 (Salt Lake County)
Reposted Today
Oracle PL/SQL Developer
$90K — $120K *
T & T Consulting Services, Inc.
Gloucester, MA 01930 (Essex County)
Reposted Today
Infrastructure Administrator
$70K — $95K *
Cnc Software, Inc.
Tolland, CT 06084 (Capitol County)
Today
Back-End Developer - Remote
$90K — $120K *
Creative Information Technology
Remote
Today
