OpenAI

Software Engineer, RL Training Infra

OpenAI, $130K to $180K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in machine learning infrastructure or related fields
  • Strong debugging skills essential for troubleshooting complex systems
  • Ability to learn quickly across various technology layers
  • Experience in reinforcement learning or related ML infrastructure
  • Excellent communication and a strong sense of ownership

Responsibilities

  • Ensure smooth operation of large-scale RL training systems by addressing urgent technical issues
  • Troubleshoot and fix problems in training, inference, and orchestration systems
  • Enhance the reliability of training runs and improve overall system efficiency
  • Support researchers in developing complex integrations like memory or multi-agent capabilities
  • Transform recurring operational challenges into systematic solutions
  • Collaborate closely with research and infrastructure teams on model timelines
  • Adapt quickly to ambiguous situations and take ownership of projects

Benefits

  • Opportunity to work with leading-edge AI technologies
  • Collaborative and high-impact work environment
  • Access to professional development and learning opportunities
  • Ability to influence the direction of frontier model training
  • Work within a team that values speed, reliability, and innovation

Full Job Description

About the Team

The Post-Training Frontiers team creates the frontier agents OpenAI ships to the world. We do the reinforcement learning training for the agentic models we ship in Codex, ChatGPT, and the API (from o1 to 5.5).

Our role consists of (1) shepherding all integrations that should go into the final RL run and deciding what can make it in, (2) babysitting and scaling the final run, and (3) building the research and infra for horizontal integrations, such as improving function calling, factuality, multi-agent capabilities, memory, and calibrated thinking.

About the Role

This role focuses on keeping our frontier RL training runs fast, reliable, and unblocked. You will work across engineering and infrastructure problems as they emerge, from scaling and orchestration issues to inference bottlenecks, numerical problems, and hardware failures, as well as supporting large horizontal integrations in the big run, like multi-agent capabilities or memory. This is a role for a strong generalist who quickly learns anything needed for the task, has high attention to detail, debugs deeply, and is motivated by fixing the highest-impact problem in front of the team.

In this role, you will:

- Keep large-scale RL training runs moving by jumping into the most urgent engineering and infrastructure problems.

- Debug issues across training systems, inference, orchestration, scaling, and distributed infrastructure.

- Solve hard technical problems at the boundary between research and engineering: scaling experiments, improving training reliability, debugging distributed systems, reducing latency and cost, and making new capabilities robust under real workloads.

- Improve reliability and efficiency for RL training runs.

- Help researchers who are developing infra-heavy integrations, such as multi-agent capabilities or memory.

- Turn recurring operational issues into better tools, systems, processes, or abstractions.

- Work closely with research, infrastructure, and partner teams during tight model run timelines.

- Become useful quickly in messy, ambiguous areas where ownership matters more than a perfectly scoped project.

- Debug failures that cut across model behavior, training data, RL systems, evaluation infrastructure, serving systems, and agent harnesses, then turn those failures into hypotheses, fixes, and durable improvements.

You might thrive in this role if you:

- Want to train and ship our frontier models and ensure we make agents genuinely useful for developers, enterprises, researchers, and everyday users.

- Are a strong generalist engineer with experience in some layer of ML infrastructure.

- Have worked on RL, inference, scaling, training systems, orchestration, or adjacent ML infrastructure.

- Learn extremely quickly and are comfortable operating across unfamiliar layers.

- Are a strong debugger with high ownership, low ego, and excellent communication.

- Can land in a messy area with tight timelines, become useful quickly, and gradually raise the quality of the whole system.

- Are energized by fast-moving environments where reliability, speed, and judgment matter.

- Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous.

Nice to have:

- Experience supporting large-scale model training, async RL systems, or high-throughput ML infrastructure.

- Experience debugging distributed systems across GPUs, networking, orchestration, or inference stacks.

- Background in performance optimization, scaling, or production-critical infrastructure.

- Experience working directly with researchers or fast-moving model teams.

About OpenAI

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company was founded in 2015 by a group of technology leaders, including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and John Schulman. OpenAI's mission is to develop and promote friendly AI for the betterment of humanity. The company has developed a number of cutting-edge AI technologies, including GPT-3, a language processing system that can generate human-like text. OpenAI has received funding from a number of high-profile investors, including LinkedIn co-founder Reid Hoffman and venture capitalist Peter Thiel.
Size: 100 employees
Founded: 2015
