MLE SpeechLLM Evaluations

DEEPREC.AI

$250K — $350K *
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of Python development experience in production settings
  • Proven track record in building machine learning evaluation systems
  • Strong knowledge of statistical analysis and model performance metrics
  • Exceptional communication skills to present findings to diverse audiences
  • Familiarity with speech evaluation metrics is a plus

Responsibilities

  • Develop frameworks to assess speech and conversational AI quality
  • Set benchmarks for audio quality, transcription accuracy, and dialogue effectiveness
  • Build automated evaluation pipelines to streamline model training
  • Create and maintain dashboards to monitor model performance
  • Collaborate with researchers to translate AI capabilities into measurable metrics
  • Analyze performance anomalies during training phases
  • Enhance the efficiency and reliability of evaluation processes throughout the research lifecycle

Benefits

  • Key role in an innovative Speech LLM team
  • Complete ownership of evaluation strategy and technical direction
  • Engagement with diverse challenges across ML, statistics, and research
  • Robust healthcare benefits and 401(k) matching
  • Generous parental leave and unlimited paid time off
  • Funding for conferences and continuous learning opportunities
  • Opportunity to shape products for a worldwide audience

Full Job Description
Machine Learning Engineer, SpeechLLM Evaluations
$250,000 - $350,000 + bonus + equity
San Francisco, CA (hybrid, 3 days onsite)
Full-time / Permanent

This is a chance to define how the company's foundational Speech LLMs are measured, improved, and trusted. If you've ever felt that model evaluation deserves the same attention as model training, you'll have the space to prove it here.

The Opportunity

You'll join an early Speech LLM team where your work shapes research decisions, product quality, and model releases. You'll own the systems that answer one of the hardest questions in AI: how do you measure something as human as conversation, expression, and understanding?

What You'll Do

- Build evaluation frameworks for speech and conversational AI models
- Define benchmarks for transcription, audio quality, and dialogue performance
- Create automated evaluation pipelines for training checkpoints
- Own dashboards that surface model health and regressions
- Partner with researchers to translate capabilities into measurable outcomes
- Investigate unexpected performance changes during model training
- Improve evaluation speed, quality, and reliability across the research lifecycle

What You'll Bring

Essential

- Strong Python engineering experience in production environments
- Experience building ML evaluation, data, or experimentation systems
- Deep understanding of statistics, benchmarking, and model performance analysis
- Ability to explain technical findings to varied audiences

Desirable

- Experience with speech metrics such as WER, CER, PESQ, or MOS
- Familiarity with LLM-as-a-Judge evaluation methods
- Experience with ML observability tools such as Weights & Biases or MLflow

*We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.*

What's In It For You

- Foundational role within a growing Speech LLM research team
- Ownership of evaluation strategy and technical direction
- Work on problems spanning ML, statistics, software, and research
- Comprehensive healthcare, 401(k) matching, parental leave, and unlimited PTO
- Conference, learning, and career development support
- Direct influence on products used by a global customer base

Similar Jobs

More Jobs at DEEPREC.AI

  • Senior ASR Engineer
    $200K — $250K *
    San Francisco, CA 94112 (San Francisco County)
    Healthcare
    In-Person
  • MLE SpeechLLM Evaluations
    $250K — $350K *
    San Francisco, CA 94112 (San Francisco County)
    Consumer Technology
    In-Person
  • Multimodal LLM Researcher
    $300K — $400K *
    Palo Alto, CA 94303 (Santa Clara County)
    Information Technology
    In-Person
