Member of Technical Staff, Post-Training, RL

Mirendil

$350K–$500K *

San Francisco, CA 94112 (In-Person)

Information Technology

Less than 5 years of experience

Today


Job Overview by Ladders

Qualifications

5+ years of experience in research engineering or a related field.
Strong background in reinforcement learning (RL) and machine learning techniques.
Experience with large-scale experimentation and debugging in model training.
Proficient in designing experimental protocols and analyzing complex data sets.
Solid engineering skills with a focus on system integration and scalability.

Responsibilities

Design experiments to enhance model reliability for complex tasks.
Develop and refine post-training recipes using techniques such as RL and distillation.
Scale RL processes to handle larger models and data sets.
Create methods for effective long-horizon reasoning in tasks requiring multiple decisions.
Establish verification pipelines to ensure quality and mitigate errors in reward systems.
Explore multi-task training strategies to balance specialization against general abilities.
Work with cross-functional teams to bring research experiments to production.

Benefits

Competitive health insurance plans.
Retirement savings plans with company matching.
Flexible working hours and remote work options.
Generous paid time off for personal and family needs.
Opportunities for professional development and career advancement.

Full Job Description

The Role

We are looking for research engineers to help build the post-training stack for frontier reasoning models.

This role sits at the point where model capability, training dynamics, data, verification, and infrastructure all meet. You will design and run the experiments that turn a strong base model into a model that can solve difficult tasks reliably: choosing training objectives, shaping data mixtures, building verifiers, debugging reward signals, scaling runs, and understanding why a recipe works or fails.

Researchers here are also expected to be strong engineers. The best work combines research and engineering: forming hypotheses about training behavior, implementing them in real systems, running large-scale experiments, reading the resulting traces carefully, and turning the lessons into the next training run.

Some areas you may work on include:

Post-training recipes: Develop and iterate on RL, SFT, and distillation recipes. Understand how choices in objectives, data mixtures, hyperparameters, rollout generation, and filtering affect efficiency, stability, capability, and final model behavior.
Scaling RL: Make post-training work at larger scales: more tokens, longer trajectories, larger models, more steps, and larger compute budgets. This includes identifying the bottlenecks that appear only when an approach leaves the small-run regime.
Long-horizon reasoning: Train models on tasks where success depends on many intermediate decisions. Develop methods for assigning useful feedback across long trajectories, where sparse rewards, credit assignment, exploration, and verification all become harder.
Off-policy and asynchronous training: Work on training regimes where data is generated by older policies, different policies, or partially filtered policies. Build intuition and tooling for when off-policy data helps, when it hurts, and how to control the resulting instabilities.
Verification and reward quality: Build robust verification pipelines for tasks where correctness can be checked automatically or semi-automatically. Detect and reduce reward hacking, false positives, brittle verifiers, and other failure modes that make RL look better than it really is.
Multi-task post-training: Scale recipes across different task families and domains. Study the tradeoffs between specialization and generality, and design training mixtures that improve all capabilities together.
Experiment analysis and debugging: Develop a deep empirical understanding of training runs. Diagnose regressions, separate real improvements from noise, design better ablations, and build the probes and analyses needed to make post-training less opaque.
End-to-end execution: Work closely with systems, infrastructure, and data teams to get experiments from idea to production-scale runs. This includes making training pipelines reliable, ensuring data and verifier quality, and turning successful experiments into repeatable and scalable recipes.

If you're excited about building the infrastructure that makes frontier RL research possible at scale, we'd love to hear from you.

We offer a base salary of $350,000–$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.

* Ladders Estimates
