Scope of the Role:
Innodata produces the voice and audio datasets behind the world's leading speech AI. We're hiring an Audio Engineer to own the technical heart of that work: the signal chain, the post-processing recipe, the technical specifications, and the consistent acoustic quality - the "sound signature" - that defines an Innodata dataset.
You'll ensure every hour of audio we deliver, across dozens of languages and recording conditions, meets a precise and consistent technical bar. You'll design the recipe, build the validation, and continuously push the quality and efficiency of how we capture and process audio.
What You'll Own:
- Own the end-to-end audio signal chain and post-processing pipeline for all collection programs.
- Define and document technical specifications: sample rates, bit depth, formats, loudness (LUFS) targets, noise floors, channel configurations.
- Design and maintain the "Innodata sound signature" - a consistent, spec-compliant acoustic profile across studio, remote, real-world, and telephonic captures.
- Build technical QA: automated and manual checks that validate audio against spec before delivery.
- Specify and validate recording setups for vendors and remote contributors (signal-chain testing in a small in-house studio).
- Partner with the Solutions Architect to translate customer acoustic requirements into achievable technical recipes.
- Drive tooling: help select and configure recording/QA/processing tools; automate where possible.
- Troubleshoot acoustic and signal issues across diverse capture environments.
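To give a flavor of the automated spec checks mentioned above, here is a minimal sketch in Python using only the standard library. The `AudioSpec` class, the `check_wav` function, and the target values are hypothetical placeholders for illustration, not an actual Innodata specification. The sketch covers only sample rate, bit depth, and channel count for PCM WAV files; loudness (LUFS) and noise-floor checks would need additional tooling.

```python
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpec:
    """Hypothetical delivery spec; the values are placeholders, not real targets."""
    sample_rate_hz: int = 48_000
    bit_depth: int = 24
    channels: int = 1


def check_wav(path: str, spec: AudioSpec) -> list[str]:
    """Return a list of spec violations for a PCM WAV file (empty if compliant)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        depth = wav.getsampwidth() * 8  # bytes per sample -> bits per sample
        channels = wav.getnchannels()
    if rate != spec.sample_rate_hz:
        problems.append(f"sample rate is {rate} Hz, expected {spec.sample_rate_hz} Hz")
    if depth != spec.bit_depth:
        problems.append(f"bit depth is {depth}, expected {spec.bit_depth}")
    if channels != spec.channels:
        problems.append(f"channel count is {channels}, expected {spec.channels}")
    return problems
```

A check like this could run over every file in a delivery batch, with any non-empty result blocking the file until it is fixed or re-recorded.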
You'll Thrive in This Role If You Have:
- Strong audio engineering background: signal chain, recording, post-processing, mastering, and acoustic QA.
- Deep fluency in audio technical specs (sample rate, bit depth, LUFS, formats, codecs) and the ability to define and enforce them.
- Experience producing consistent audio quality across varied recording conditions and locations.
- Comfort with audio tooling and automation (scripting for batch processing/QA is a strong plus).
- Precision and process orientation - you care about consistency at scale, not just one great recording.
- Experience with speech/voice data for AI/ML (TTS or ASR datasets).
- Familiarity with multilingual recording and remote/distributed capture.
- Knowledge of speech quality metrics and how acoustic choices affect downstream model performance.
- Scripting (Python) for audio processing pipelines (e.g., ffmpeg, sox, pydub, librosa).
The expected salary range for this position is $120,000 - $160,000 USD per year, based on experience, skills, and qualifications.