Machine Learning Engineer - Speech Model Training
$250,000 - $300,000
San Francisco, CA
Hybrid, 3x per week in office
Full time / Permanent
In this role you won't be wrapping APIs or fine-tuning existing models. You'll be building models from raw acoustic signal processing all the way through to production inference on edge devices, at a company that actually ships to 1.5M live users.
They build a hardware-software AI companion used daily by over 1.5 million professionals worldwide. The next chapter is a world-class speech intelligence core and they need the engineers to architect it.
What you'd own:
- Design and train large-scale speech models end-to-end: unified SpeechLLMs, ASR, expressive TTS, generative audio
- Own the full stack from acoustic feature engineering to GPU cluster optimisation
- Run and optimise distributed training at scale with PyTorch or JAX, FSDP, DeepSpeed, etc.
- Drive real-time inference performance with vLLM, TensorRT-LLM, or SGLang
- Apply RL alignment techniques to improve conversational quality
- Debug the hard problems in distributed infrastructure and ship solutions
You likely have:
- Proven experience training large-scale audio or speech models from the ground up
- Deep PyTorch or JAX expertise with real distributed training experience
- Genuine comfort traversing the entire ML stack from signal processing to production
- A bias toward shipping: you take ownership, you iterate fast
Strong bonus: neural audio codecs, diffusion/flow-matching architectures, or LLM pretraining experience.
Why join:
- Profitable company at ~$250M run rate - you'll see the impact of your work immediately in a product used daily by professionals worldwide
- Direct ownership of the live speech quality stack, not a supporting role in a large org
- Hybrid San Francisco team with real access to large, diverse, multilingual audio datasets
- Short feedback loops - improvements ship fast and metrics are visible
- Clear path toward senior technical leadership as the audio team grows