AI Researcher / ML Engineer (ASR & Speech Specialist)

Lilt • $120K — $150K *

Washington, DC 20011 • In-Person

Consumer Technology

Less than 5 years of experience

3 weeks ago

Job Overview by Ladders

Qualifications

Master's or Ph.D. in Computer Science, Electrical Engineering, Computational Linguistics, Data Science or a related field.
3-5 years of experience developing Automatic Speech Recognition (ASR) systems.
Proficiency with deep learning frameworks like PyTorch and specialized speech toolkits.
Experience converting and running PyTorch models on a mobile inference runtime such as ExecuTorch, LiteRT (formerly TensorFlow Lite), or ONNX Runtime Mobile.
Strong software engineering skills in Python and understanding of complex multilingual tokenization.
Familiarity with large-scale audio datasets and data augmentation techniques.

Responsibilities

Architect, train, and evaluate advanced ASR models across multiple languages.
Design scalable algorithms for dynamic vocabulary insertion and customer-specific terminology.
Implement automated evaluations to benchmark model performance against established metrics.
Develop multilingual benchmarks for end-to-end conversational AI agents.
Collaborate with teams to build and optimize high-throughput speech processing systems.
Refine components of the speech processing pipeline, ensuring high performance.
Translate product requirements into actionable AI technical roadmaps.

Benefits

Opportunity to work on cutting-edge AI and speech technologies.
Collaborative cross-functional team environment.
Access to ongoing professional development and training resources.
Flexibility in work arrangements, promoting a healthy work-life balance.

Full Job Description

Role Summary

We are seeking a highly skilled and visionary Senior AI Researcher / Machine Learning Engineer specializing in Automatic Speech Recognition (ASR) to anchor our core speech intelligence and benchmarking initiatives. In this role, you will serve as our principal subject matter expert in AI speech data processing, responsible for architecting, training, and scaling high-performance, multilingual ASR models, as well as developing rigorous quality benchmarks for agentic conversational AI.

A critical component of this position involves developing robust domain-adaptation frameworks that allow our models to dynamically incorporate proprietary customer terminology, specialized industry jargon, and multilingual nuances. You will collaborate with the Engineering, Product, and AI Research teams to transform state-of-the-art speech research into production-ready systems powering on-device real-time streaming translation and novel frontier model benchmarks.

Key Challenge: Scaling ASR models capable of dynamic vocabulary insertion for enterprise-grade, ultra-low-latency, real-time environments, and building end-to-end agentic AI benchmarks that go beyond surface metrics.

Key Responsibilities

Model Development & Innovation: Architect, train, fine-tune, and evaluate state-of-the-art speech representations and ASR models (e.g., End-to-End Conformer, Whisper, RNN-T, and hybrid CTC/Attention architectures) across multiple global languages.
Customization & Domain Adaptation: Design and deploy highly scalable algorithms for dynamic vocabulary insertion, contextual biasing, and language model (LM) personalization to precisely capture customer-specific terminology, acronyms, and product names.
Evaluation: Implement automated evaluation frameworks to benchmark model performance, rigorously tracking Word Error Rate (WER), Character Error Rate (CER), embedding-based metrics, latency budgets (real-time factor, RTF), and compute efficiency under varying acoustic conditions.
Agentic Benchmarking: Develop pioneering multilingual benchmarks for end-to-end conversational AI agents, covering both speech-to-text and text-to-speech components and targeting the weaknesses of state-of-the-art frontier models.
Real-Time & Batch Speech Systems: Partner with core engineering teams to build, optimize, and maintain high-throughput pipelines for both ultra-low-latency real-time streaming inference and high-efficiency asynchronous (batch) multi-channel speech analysis.
Speech Pipeline Engineering: Develop and refine standard auxiliary components of the speech processing chain, including Voice Activity Detection (VAD), speaker diarization, punctuation restoration, noise/acoustic normalization, and audio pre-processing filters.
Cross-Functional Productization: Translate product requirements into technical AI roadmaps, working hand-in-hand with Product Managers to ship speech-to-text, simultaneous translation, and semantic speech analytics features.

Required Technical Qualifications

Education: Master's or Ph.D. degree in Computer Science, Electrical Engineering, Computational Linguistics, Data Science, or a related quantitative field with an emphasis on speech processing or deep learning (or equivalent proven industry track record).
Speech Domain Expertise: 3-5 years of dedicated professional experience developing ASR systems, speech-to-text translation pipelines, or advanced audio processing models.
Deep Learning Frameworks: Advanced proficiency with PyTorch or equivalent frameworks, along with extensive experience utilizing dedicated speech toolkits such as Whisper, NVIDIA NeMo, Hugging Face Transformers, Kaldi, ESPnet, or SpeechBrain.
On-Device Runtimes: Hands-on experience converting and running PyTorch models on at least one mobile inference runtime: ExecuTorch, LiteRT (formerly TensorFlow Lite), or ONNX Runtime Mobile. You have personally taken a non-trivial model through conversion, including resolving unsupported operations and dynamic-shape or decoder-loop issues.
Software & Infrastructure: Strong software engineering skills in Python, with a clear understanding of data structures, algorithm optimization, and complex multilingual text/audio tokenization schemes.
Data Pipeline Mastery: Proven experience working with large-scale audio datasets, audio augmentation techniques (e.g., SpecAugment, noise injection), and text normalization/inverse text normalization (ITN) pipelines.

Preferred & Specialization Qualifications

High-Performance and On-Device Inference: Experience optimizing models for constrained on-device and production environments using quantization (INT4/INT8/FP16), distillation, ONNX Runtime, TensorRT, or Triton Inference Server.
Research Footprint: Peer-reviewed publications at premier speech and machine learning conferences (e.g., ICASSP, INTERSPEECH, NeurIPS, ICLR, ACL), or active contributions to open-source speech communities, are a strong plus.
Hardware Acceleration: Working knowledge of mobile NPU/DSP acceleration on the Android SoC landscape (Qualcomm QNN / Hexagon, GPU, and NNAPI delegates) and the trade-offs across Snapdragon, MediaTek, and Google Tensor.
Streaming Architectures: Deep technical familiarity with streaming neural architectures (e.g., block-processing, streaming transformers, or transducer models) and real-time network transport constraints (WebSockets, gRPC).
Multilingual Engineering: Professional exposure to building zero-shot multilingual speech systems or managing cross-lingual acoustic phonology data.

Core Competencies & Soft Skills

Analytical Problem Solving: Ability to break down ambiguous business or product requirements into deterministic, actionable machine learning experimentation frameworks.
Collaborative Communication: Strong ability to explain intricate machine learning concepts to non-technical stakeholders across product, design, and executive leadership.
Ownership Mindset: Comfortable working in a fast-paced environment, taking accountability from initial algorithmic hypothesis and exploratory research through to final production monitoring.

About Lilt

Industry

Business Services

Founded

2015

* Ladders Estimates
