Member of Technical Staff - RL Algorithms

Vmax

$300K - $500K
Technical Services
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • PhD or equivalent experience in machine learning or reinforcement learning.
  • Proven track record of research excellence through publications or technical contributions.
  • Deep knowledge of reinforcement learning, representation learning, and large language models (LLMs).
  • Strong familiarity with LLM post-training methodologies.
  • Experience in designing and executing rigorous ML experiments, including failure analysis.
  • Familiarity with large-scale ML infrastructure and distributed training environments.
  • Proficiency in Python and at least one major ML framework like PyTorch or JAX.

Responsibilities

  • Develop innovative reinforcement learning algorithms for post-training language models.
  • Adapt pre-LLM reinforcement learning concepts to modern LLM and agentic settings.
  • Establish baselines and evaluation protocols for measuring performance in LLM RL.
  • Analyze common failure modes in RL-trained models, addressing issues like reward hacking and exploration failures.
  • Collaborate across teams to translate research ideas into practical training systems.
  • Own and drive a research agenda from ideation to execution and communication of findings.

Benefits

  • Based in San Francisco; a hybrid work arrangement may be considered for exceptional candidates.

Full Job Description

About the role

RL has become the de facto method for post-training LLMs. We are limited by the sample efficiency of the policy gradient algorithms in use today, and are looking for a talented researcher to weave together pre-LLM and post-LLM approaches to learning from experience.
Responsibilities
  • Develop new RL algorithms for post-training language models.
  • Adapt ideas from pre-LLM reinforcement learning, such as model-based RL, temporal abstraction, and value-based learning, to modern LLM and agentic settings.
  • Establish empirical baselines and evaluation protocols for measuring sample efficiency, robustness, generalization, and reward exploitation in LLM RL.
  • Analyze failure modes of RL-trained models, including reward hacking, mode collapse, over-optimization, exploration failures, and distribution shift.
  • Collaborate with researchers working on environments, evals, interpretability, reward modeling, and infrastructure to turn algorithmic ideas into reliable training systems.
  • Own and develop a research agenda within Vmax, from identifying promising directions to executing experiments and communicating results.
Minimum Requirements
  • PhD or equivalent experience in machine learning, reinforcement learning, or a closely related field.
  • Track record of research excellence, as demonstrated by publications, open source work, deployed AI systems, or other substantial technical contributions.
  • Deep understanding of modern machine learning, especially reinforcement learning, representation learning, and large language models.
  • Strong familiarity with LLM post-training methods.
  • Experience designing and running rigorous ML experiments, including ablations, baselines, evaluation design, and failure analysis.
  • Experience with large-scale ML infrastructure, distributed training, experiment tracking, data pipelines, and debugging unstable training runs.
  • Expertise with Python and at least one major ML framework such as PyTorch or JAX.
  • Ability to work independently on open-ended research problems and turn ambiguous ideas into concrete experimental programs.
Nice to have
  • Experience developing new RL algorithms or improving existing ones in domains such as robotics, games, simulated control, language models, or agents.
  • Experience with LLM pre-training.
  • Strong understanding of reward modeling, verifiers, process supervision, outcome supervision, or automated evaluation systems.
  • Demonstrated software engineering ability.
  • Strong communication skills, especially the ability to explain algorithmic ideas, empirical results, and research implications to both technical and non-technical audiences.
Role-specific location policy
  • This role is based in our San Francisco office; for exceptional candidates, we are willing to consider a hybrid arrangement.
Compensation

The expected salary range for this position is $300,000 - $500,000 USD.
