About the Role
We're hiring a Member of Technical Staff (Real-Time Audio) to join our Product Engineering team. Hark's voice agent holds real-time, full-duplex conversations with people in homes, cars, and noisy rooms. That experience is only as good as the audio underneath it.
This role owns the real-time audio that makes conversations feel natural (echo cancellation, noise suppression, and voice activity detection) as production code in our live client. This is not a research role and not a DSP theory role. We're looking for someone who can do both: understand the signal processing and ship the code.
Responsibilities
- Own audio quality on the client: echo, self-interruption, dropouts, and clipping
- Build and tune the browser audio pipeline with the Web Audio API, AudioWorklet, and getUserMedia constraints
- Work the WebRTC audio path end to end: AEC, noise suppression, and VAD
- Ship DSP to the client as C++/Rust compiled to WebAssembly, and as TypeScript in the audio pipeline
- Tune endpointing, interruption, and turn-taking so the agent listens like a person
- Reduce conversational latency and artifacts across the streaming pipeline
- Work in our React/TypeScript client where audio meets the UI
- Manage features end-to-end from prototyping through production
- Collaborate with designers, platform engineers, and our speech team
Requirements
- 5+ years of software engineering experience
- Shipped real-time audio into a product used by real users
- Hands-on experience with WebRTC, AEC (echo cancellation), noise suppression, and VAD
- Strong DSP fundamentals: adaptive filtering, STFT, resampling, and gain control
- C/C++ or Rust for production DSP, and experience shipping it to the browser via WebAssembly
- Working knowledge of the browser audio stack: Web Audio API, AudioWorklet, and MediaStream constraints
- Comfort with latency, buffering, and sample rates in a streaming audio pipeline
- Ability to own features end-to-end and work comfortably in a shared production codebase
Bonus Qualifications
- Experience working at a voice, speech, or video-conferencing company
- ML for audio: noise suppression, VAD, or source separation (e.g. RNNoise, DeepFilterNet, Silero VAD), and on-device inference (ONNX Runtime, Core ML)
- Familiarity with WebRTC internals (the Audio Processing Module, AEC3, Opus) and voice-agent frameworks (LiveKit, Pipecat)
- TypeScript and React, and comfort working across the product frontend
- Experience with target-speaker isolation, diarization, or barge-in and turn-detection systems for conversational AI
Compensation
The US base salary range for this full-time position is $170,000 to $400,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.