Member of Technical Staff - RL Infrastructure

Vmax

$300K - $500K
Technical Services
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Strong software engineering experience
  • Experience building infrastructure for LLM inference and/or RL training
  • Proficiency with GPU clusters and distributed training systems
  • Familiarity with vLLM, SGLang, and modern LLM-RL frameworks
  • Understanding of system reliability and observability
  • Ability to collaborate with ML researchers to improve workflows
  • Experience creating tools for technical users

Responsibilities

  • Build infrastructure for distributed RL training and inference across thousands of GPUs
  • Enhance reliability, debuggability, and throughput of RL experiments
  • Create user-friendly interfaces for experiment management
  • Oversee infrastructure projects from design to long-term maintenance
  • Identify and eliminate performance bottlenecks in training and data processes
  • Maintain high engineering standards for RL infrastructure

Benefits

  • Based in the San Francisco office, with hybrid arrangements considered for exceptional candidates
  • Engagement with fast-paced ML teams
  • Opportunity to work with cutting-edge RL technology
  • An environment with a high engineering bar that promotes ownership and quality
  • Support for independent technical projects and open-source contributions
Full Job Description
About the role

This role is for strong infrastructure engineers who can build the systems layer for RL at scale: distributed rollouts, training orchestration, inference, evals, data pipelines, observability, and reliability. You will create the durable platform that enables researchers and applied ML engineers to run, debug, and reproduce large-scale RL experiments.
Responsibilities
  • Build infrastructure for distributed RL training and inference across thousands of GPUs.
  • Improve the reliability, debuggability, and throughput of RL experiments.
  • Build interfaces that allow researchers and applied ML engineers to launch, inspect, compare, and reproduce experiments easily.
  • Own infrastructure projects end to end, from architecture and implementation through deployment, documentation, and long-term maintenance.
  • Identify and eliminate bottlenecks in training, rollout generation, eval execution, data movement, and cluster utilization.
  • Maintain engineering standards for RL infrastructure, including testing, observability, versioning, and reproducibility.
Minimum Requirements
  • Strong software engineering experience.
  • Experience building infrastructure for LLM inference and/or RL training.
  • Experience with GPU clusters, distributed training, model serving, or high-throughput inference systems.
  • Familiarity with vLLM, SGLang, and modern LLM-RL training frameworks.
  • Strong understanding of system reliability, observability, testing, debugging, and performance optimization.
  • Ability to work closely with ML researchers and translate messy experimental workflows into durable infrastructure.
  • Experience building tools, platforms, or services used by other technical users.
  • Strong judgment around technical tradeoffs: when to prototype, when to harden, when to simplify, and when to redesign.
  • Clear written and verbal communication, especially around system design, operational risks, and engineering tradeoffs.
Nice to have
  • Experience supporting research teams or fast-moving ML teams.
  • Experience at an organization with a high engineering bar, where reliability, ownership, and code quality were central.
  • Evidence of strong independent technical work, such as open-source projects, infrastructure projects, competitions, or substantial systems built from scratch.
  • Experience reducing operational complexity in systems that had become brittle, slow, or hard to debug.
Role specific location policy
  • This role is based in our San Francisco office. For exceptional candidates, we are willing to consider a hybrid arrangement.
Compensation

The expected salary range for this position is $300,000 - $500,000 USD.
