Applied Reinforcement Learning Engineer

Centific

$150K - $300K *

US-Anywhere, Remote in United States

Enterprise Technology

Less than 5 years of experience

Reposted 6 days ago

Job Overview by Ladders

Qualifications

• 3+ years of hands-on Deep Reinforcement Learning (RL) experience
• Expertise in environment design, reward engineering, and policy optimization
• Experience fine-tuning LLMs using methods like RLHF and DPO
• Strong Python skills and familiarity with Gymnasium, RLlib, and Stable Baselines
• MS/PhD in Computer Science, Machine Learning, or related field (or equivalent experience)

Responsibilities

• Design and build custom RL environments simulating enterprise workflows
• Post-train LLM-based agents on domain-specific tasks
• Build end-to-end pipelines converting human-labeled traces into RL training data
• Architect multi-step reasoning agents with tool-calling and closed learning loops
• Design reward functions, verifiers, and validation frameworks for testing

Benefits

• Shape a new discipline at the intersection of RL, simulation, and enterprise AI
• See your research power real systems across various industries
• Collaborate with leaders from NVIDIA, Microsoft, and the global AI community
• Create governed, compliant AI systems that enterprises can trust

Full Job Description

Role: Applied Reinforcement Learning Engineer

Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)

About the Team

Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. Our mission is to transform data, signals, and human insight into next-generation intelligent systems that redefine enterprise intelligence.

We're building a governed RL environment platform that enables enterprises to safely iterate and improve AI agent workflows through simulation-based learning, bridging human-labeled signal creation with automated RL training for high-stakes operations.

Role Overview

As an Applied RL Engineer, you will design and build RL environments that simulate complex enterprise workflows and train intelligent agents within them. You'll work at the intersection of RL research and production systems, translating customer requirements into bespoke simulation environments and post-training pipelines that deliver measurable improvements to AI agent performance.

This role requires deep expertise in both classical RL methodologies and modern LLM-based agent architectures. You'll shape our product direction and help make RL accessible to enterprise customers who need safe, compliant ways to improve their AI systems.

Core RL Competencies

Foundational RL
• MDPs & value methods: State/action spaces, Q-learning, DQN, Double DQN, Dueling DQN
• Policy gradient methods: REINFORCE, Actor-Critic, A2C/A3C, variance reduction
• Advanced optimization: PPO, TRPO, SAC, trust regions, entropy regularization
• TD learning: TD(0), TD(λ), eligibility traces, bootstrapping methods

LLM Alignment & Post-Training
• RLHF pipelines: Reward model training, preference learning, human feedback integration
• Direct optimization: DPO, IPO, KTO, offline preference optimization
• Group-based methods: GRPO, RLOO, sample-efficient policy improvement
• Reward modeling: Bradley-Terry models, reward hacking mitigation, KL constraints

Environment Design
• Gymnasium/OpenAI Gym: Custom environments, observation/action spaces, wrapper patterns
• Reward engineering: Sparse vs. dense rewards, potential-based shaping, intrinsic motivation
• Verifier design: Programmatic reward functions, outcome verification, ground-truth evaluation
• Simulation: Sim-to-real transfer, domain randomization, multi-agent dynamics

Advanced Techniques
• Offline RL: CQL, BCQ, IQL for learning from fixed datasets without environment interaction
• Model-based RL: World models, Dreamer, MuZero, learned dynamics
• Hierarchical RL: Options framework, goal-conditioned policies, temporal abstraction
• Imitation & exploration: Behavioral cloning, GAIL, curiosity-driven exploration, UCB

Key Responsibilities
• Design and build custom RL environments (digital twins) simulating enterprise workflows: document processing, compliance, onboarding, support automation
• Post-train LLM-based agents on domain-specific tasks using PPO, GRPO, DPO, and RLHF
• Build end-to-end pipelines converting human-labeled traces into RL training data
• Architect multi-step reasoning agents with tool-calling and closed learning loops
• Design reward functions, verifiers, and validation frameworks for pre-deployment testing
• Translate cutting-edge RL research into production systems; contribute to publications

Required Qualifications
• Deep RL expertise: 3+ years of hands-on experience with environment design, reward engineering, and policy optimization
• LLM post-training: Experience fine-tuning LLMs using RLHF, DPO, PPO, or similar
• Production skills: Software engineering experience beyond research code, including scalable pipelines and training infrastructure
• Agentic AI: Experience with LLM-based agents, tool use, and multi-step reasoning
• Technical stack: Strong Python; Gymnasium, RLlib, Stable Baselines; PyTorch/JAX/TensorFlow
• Education: MS/PhD in CS, ML, or related field (or equivalent experience)

Preferred Qualifications
• Publications at NeurIPS, ICML, ICLR, ACL, or similar venues
• Enterprise workflow experience in healthcare, finance, logistics, or compliance
• Open-source contributions to CleanRL, TRL, veRL, or agent frameworks
• Experience with world models, synthetic data generation, and simulation
• Distributed training and large-scale RL experimentation

Why Join Centific
• Lead the frontier: Shape a new discipline at the intersection of RL, simulation, and enterprise AI
• Ship your science: See your research power real systems across healthcare, finance, and safety
• Collaborate with leaders: Work alongside NVIDIA, Microsoft, and the global AI community
• Build what matters: Create governed, compliant AI systems that enterprises can trust

Salary: $150K - $300K Annually

* Ladders Estimates
