Data Machine Learning Engineer

techire ai

$100K — $150K *
US-Anywhere
Remote in United States
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in building ML data pipelines at scale
  • Hands-on expertise with speech or audio data
  • Strong understanding of various speech representations
  • Familiarity with multi-channel audio data processing including diarisation and alignment
  • Experience with multilingual data is a plus

Responsibilities

  • Own end-to-end data pipelines from audio ingestion to training-ready datasets
  • Build quality assurance systems to identify annotation errors before training
  • Maintain training infrastructure to optimize GPU usage
  • Develop and refine tools for various speech representations
  • Handle complex audio pipeline tasks like two-channel alignment

Benefits

  • Remote-friendly work environment
  • Competitive base salary
  • Stock options available
Full Job Description

Want to own the data infrastructure behind some of the most naturalistic voice models in production?

You'll be joining a well-funded speech AI startup - just closed their Series A - with strong enterprise traction and revenue that more than doubled last quarter. They're building ultra-realistic voice technology that handles natural laughter, breathing, seamless language switching, and accurate pronunciation across languages and accents. Their models are powering hundreds of millions of conversations monthly.

Before training a single model, they built their own corpus - full-duplex, studio-quality conversational speech annotated by PhD linguists. As their MLE, you'll own the pipelines that turn that raw material into clean, training-ready data.

What you'll do
  • Own end-to-end data pipelines from raw audio ingestion through to versioned, training-ready datasets
  • Build quality systems that catch annotation errors and alignment issues before they reach a training run
  • Maintain the training infrastructure that keeps GPUs fed - dataloaders, streaming datasets, multi-modal batching
  • Build and iterate on tooling across speech representations including neural codecs, semantic tokens and mel features
  • Handle full- and half-duplex pipeline work including two-channel alignment and overlap handling

What you'll bring
  • Strong engineering fundamentals with experience building ML data pipelines at scale
  • Hands-on experience with speech or audio data
  • Solid understanding of speech representations and the tradeoffs between them
  • Experience with multi-channel audio data including diarisation and alignment

Nice to have
  • Experience with multilingual data pipelines
  • Large-scale training infrastructure experience - FSDP, DeepSpeed, Ray
  • Annotation tooling and human-in-the-loop systems

Remote-friendly. Competitive base plus stock.
