Member of Technical Staff - ML Training Systems

Modal, Inc

$130K - $180K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience writing high-performance code
  • Proficient with torch and high-level training frameworks like Hugging Face, verl, and slime
  • Demonstrated experience in optimizing ML training processes
  • Familiarity with low-level operating system internals (Linux kernel, file systems, containers) is a plus
  • Willingness to work in person at the NYC or San Francisco office

Responsibilities

  • Develop and implement efficient training workflows for production machine learning models
  • Collaborate on open-source projects to enhance Modal's infrastructure
  • Optimize infrastructure for training language models
  • Identify and eliminate bottlenecks in data loading and communication processes
  • Contribute to architectural decisions affecting ML model performance

Benefits

  • Opportunity for growth within a rapidly expanding team
  • Work alongside a diverse team of experts, including open-source contributors and academic researchers
  • Access to cutting-edge technologies and tools in AI
  • Be part of a fast-paced startup environment with significant company momentum
  • Engage in collaborative projects that impact major customers in the AI space

Full Job Description

About Us:

Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. Companies like Suno, Lovable, and Substack rely on Modal to move from prototype to production without the burden of managing infrastructure.

We're a fast-growing team based out of NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B at a $1.1B valuation. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

The Role:

We are looking for strong engineers with experience training production machine learning models. If you are interested in contributing to open-source projects and evolving Modal's infrastructure to train the next generation of language models, we'd love to hear from you!

Requirements:

  • 5+ years of experience writing high-quality, high-performance code.
  • Experience working with torch and high-level training frameworks (Hugging Face, verl, slime).
  • Experience with ML training optimization (tell us a story about eliminating data loading bottlenecks, overlapping communications with compute, rewriting a trainer to handle off-policy rollouts, etc.)
  • Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
  • Ability to work in person in our NYC or San Francisco office.
