Machine Learning Engineer - Speech Model Training

DEEPREC.AI

$250K - $300K
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years experience in machine learning, specifically in speech or audio processing
  • Proficient in PyTorch or JAX with a focus on distributed training
  • Ability to navigate the full machine learning stack from signal processing to deployment
  • Strong sense of ownership with a focus on shipping solutions quickly
  • Experience optimizing large-scale models and architectures, ideally with live user feedback

Responsibilities

  • Design and build large-scale speech models from inception to production
  • Oversee full stack processes, including acoustic feature engineering and infrastructure optimization
  • Optimize distributed training processes using frameworks like PyTorch or JAX
  • Enhance real-time performance capabilities through advanced inference methods
  • Implement reinforcement learning techniques to enhance conversational AI
  • Troubleshoot complex issues within distributed systems and implement effective solutions

Benefits

  • Work on a product that directly impacts 1.5 million users daily
  • Take on direct ownership of the speech quality stack, rather than a supporting role
  • Access large and diverse multilingual datasets for model training
  • Short feedback cycles for rapid iteration and visible impact
  • Clear career progression opportunities toward senior leadership roles in the audio team

Full Job Description

$250,000 - $300,000
San Francisco, CA
Hybrid, 3x per week in office
Full time / Permanent

In this role, you won't be wrapping APIs or fine-tuning existing models. You'll be building models that span everything from raw acoustic signal processing to production inference on edge devices, at a company that actually ships to 1.5M live users.

The company builds a hardware-software AI companion used daily by over 1.5 million professionals worldwide. The next chapter is a world-class speech intelligence core, and they need engineers to architect it.

What you'd own:
  • Design and train large-scale speech models end-to-end: unified SpeechLLMs, ASR, expressive TTS, and generative audio
  • Own the full stack from acoustic feature engineering to GPU cluster optimisation
  • Run and optimise distributed training at scale with PyTorch or JAX, using tools such as FSDP and DeepSpeed
  • Drive real-time inference performance with vLLM, TensorRT-LLM, or SGLang
  • Apply RL alignment techniques to improve conversational quality
  • Debug the hard problems in distributed infrastructure and ship solutions

You likely have:
  • Proven experience training large-scale audio or speech models from the ground up
  • Deep PyTorch or JAX expertise with real distributed training experience
  • Genuine comfort traversing the entire ML stack from signal processing to production
  • A bias toward shipping: you take ownership, you iterate fast

Strong bonus: neural audio codecs, diffusion/flow-matching architectures, or LLM pretraining experience.

Why join
  • Profitable company at ~$250M run rate - you'll see the impact of your work immediately in a product used daily by professionals worldwide
  • Direct ownership of the live speech quality stack, not a supporting role in a large org
  • Hybrid San Francisco team with real access to large, diverse, multilingual audio datasets
  • Short feedback loops - improvements ship fast and metrics are visible
  • Clear path toward senior technical leadership as the audio team grows
