Research Scientist

Applied Compute

$120K — $160K *
Enterprise Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Experience with large-scale GPU systems and training jobs
  • Curiosity to delve deep into the training stack
  • Focus on swift implementation while ensuring reliability
  • Familiarity with open-weights models
  • Background in reinforcement learning or in integrating inference with RL training loops

Responsibilities

  • Design and optimize RL training and inference pipelines for GPU clusters
  • Build tooling for inspecting, profiling, and debugging training runs
  • Implement systems with attention to how they affect ML training (low-precision numerics, distributed training edge cases, etc.)
  • Collaborate with researchers to integrate post-training capabilities into production

Benefits

  • Unlimited PTO
  • Paid parental leave
  • Daily lunches and dinners
  • Transportation and relocation support
  • Retirement plans
  • Generous health benefits

Full Job Description
The role

As a research scientist, you will design, implement, and optimize the large-scale training infrastructure that powers our frontier reinforcement learning stack. This is systems work at the edge of what's possible, training state-of-the-art models for our enterprise partners. Frontier systems are exciting but brittle, and require both performance and correctness to train models effectively. You'll work closely with researchers to make our RL stack reliable, fast, and capable of running for days without intervention.

What you'll do
  • Design and optimize our RL training and inference pipelines across large GPU clusters
  • Build tooling and observability that lets researchers and customers inspect, profile, and debug training runs
  • Implement systems with an eye toward how they affect ML (low-precision numerics, distributed training edge cases, etc.)
  • Partner with researchers to bring frontier post-training capabilities into production deployments

What we're looking for
  • Experience programming with and managing training jobs on large-scale GPU systems
  • Fearlessness and curiosity to understand all levels of the training stack
  • Bias toward fast implementation, paired with a high bar for reliability and efficiency
  • Familiarity with open-weights models (architecture and inference)
  • Background in reinforcement learning or integration of inference with RL training loops

Strong candidates also have
  • Experience with distributed training frameworks (PyTorch, JAX, DeepSpeed)
  • Background in high-performance computing or working with large-scale clusters
  • Contributions to open-source ML infrastructure
  • Demonstrated technical creativity through published projects, OSS contributions, or side projects


Benefits & Logistics

This role is based in San Francisco. We work from our office in the Mission. We offer:
  • Competitive compensation and equity
  • Generous health benefits
  • Unlimited PTO
  • Paid parental leave
  • Daily lunches and dinners
  • Transportation and relocation support
  • Retirement plans

We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the process with you. We encourage you to apply even if you do not believe you meet every single qualification.
