MLE SpeechLLM Evaluations

DEEPREC.AI

$250K - $350K *

San Francisco, CA 94112 (In-Person)

Consumer Technology

Less than 5 years of experience

Today


Job Overview by Ladders

Qualifications

5-7 years of Python development experience in production settings
Proven track record in building machine learning evaluation systems
Strong knowledge of statistical analysis and model performance metrics
Exceptional communication skills to present findings to diverse audiences
Familiarity with speech evaluation metrics is a plus

Responsibilities

Develop frameworks to assess speech and conversational AI quality
Set benchmarks for audio quality, transcription accuracy, and dialogue effectiveness
Build automated evaluation pipelines to streamline model training
Create and maintain dashboards to monitor model performance
Collaborate with researchers to translate AI capabilities into measurable metrics
Analyze performance anomalies during training phases
Enhance the efficiency and reliability of evaluation processes throughout the research lifecycle

Benefits

Key role in an innovative Speech LLM team
Complete ownership of evaluation strategy and technical direction
Engagement with diverse challenges across ML, statistics, and research
Robust healthcare benefits and 401(k) matching
Generous parental leave and unlimited paid time off
Funding for conferences and continuous learning opportunities
Opportunity to shape products for a worldwide audience

Full Job Description

Machine Learning Engineer, SpeechLLM Evaluations
$250,000 - $350,000, plus bonus and equity
San Francisco, CA. Hybrid (3 days onsite)
Full-time / Permanent

This is a chance to define how their foundational Speech LLMs are measured, improved, and trusted. If you've ever felt model evaluation deserves the same attention as model training, you'll have the space to prove it here.

The Opportunity

You'll join an early Speech LLM team where your work shapes research decisions, product quality, and model releases. You'll own the systems that answer one of the hardest questions in AI: how do you measure something as human as conversation, expression, and understanding?

What You'll Do

- Build evaluation frameworks for speech and conversational AI models
- Define benchmarks for transcription, audio quality, and dialogue performance
- Create automated evaluation pipelines for training checkpoints
- Own dashboards that surface model health and regressions
- Partner with researchers to translate capabilities into measurable outcomes
- Investigate unexpected performance changes during model training
- Improve evaluation speed, quality, and reliability across the research lifecycle

What You'll Bring

Essential

- Strong Python engineering experience in production environments
- Experience building ML evaluation, data, or experimentation systems
- Deep understanding of statistics, benchmarking, and model performance analysis
- Ability to explain technical findings to varied audiences

Desirable

- Experience with speech metrics such as WER, CER, PESQ, or MOS
- Familiarity with LLM-as-a-Judge evaluation methods
- Experience with ML observability tools such as Weights & Biases or MLflow

*We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.*

What's In It For You

- Foundational role within a growing Speech LLM research team
- Ownership of evaluation strategy and technical direction
- Work on problems spanning ML, statistics, software, and research
- Comprehensive healthcare, 401(k) matching, parental leave, and unlimited PTO
- Conference, learning, and career development support
- Direct influence on products used by a global customer base

* Ladders Estimates
