Systems Engineer: Real-Time Engine

Nuance Labs

$120K — $150K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in real-time streaming systems design.
  • Proficiency in Python and async programming; familiarity with asyncio is essential.
  • Strong background in systems programming, ideally with experience in Rust, C++, or Go.
  • Expertise in concurrency and state machine design, especially in async environments.
  • Intuitive understanding of latency management across systems, including CPU and GPU interactions.
  • Ability to thrive in a fast-paced environment requiring rapid design and implementation.

Responsibilities

  • Build and manage the server-side real-time engine, ensuring smooth session lifecycle and state management.
  • Integrate GPU-backed model inference into the real-time interaction loop.
  • Create performance tools for latency analysis and system profiling.
  • Collaborate across teams to define system behavior within APIs and underlying architecture.
  • Develop and uphold coherent timing and scheduling layers in the interaction loop.

Benefits

  • Opportunity to influence system architecture as an early engineer in a small team.
  • Opportunity to work at a well-funded startup backed by notable investors.
  • Collaboration in a dynamic, in-person environment based in Seattle.

Full Job Description
About the Role

We're building the engine that powers our AI avatar: a real-time interactive loop that continuously senses the user (audio and video), orchestrates inference across multiple models, manages state, and renders a coherent audio-visual response within tight latency budgets.

Traditional real-time systems are hard because the timing requirements are strict. This system is harder: its components are neural networks with variable latency and non-deterministic outputs, and there is no way to pause the user while the models think. You're building a system that has to feel instantaneous while running inference that isn't. This is the runtime that makes a human-AI conversation feel alive.

You'll own this runtime and collaborate closely with our research team on how models are invoked, how conversational context is assembled, and how response quality is balanced against latency. You'll have direct influence over architecture decisions as an early engineer at a small, well-funded team.

What You'll Do
  • Build and own the server-side real-time engine: session lifecycle, state management, and the architecture of the interaction loop, including the timing and scheduling layer that keeps the loop coherent
  • Integrate GPU-backed model inference into the real-time loop, wiring model outputs into the engine's state and render pipeline
  • Develop performance tooling for latency breakdowns (TTFO, steady-state), tracing, profiling, and regression detection
  • Collaborate with product and research to define how the system behaves at its boundaries - APIs, event streams, and the invariants the engine guarantees to the rest of the stack

Required Skills
  • Real-time streaming systems experience. You've built systems that operate on a continuous real-time loop with hard per-tick latency budgets, where output must never stall.
  • Strong Python and async programming. You need to be productive immediately in Python - asyncio should be second nature. The key skill is writing prototype code with clean enough architecture that it survives a language port.
  • Systems programming background. The production system will be written in Rust. You don't need to know Rust today, but you should have experience in at least one systems language (Rust, C++, Go) and be motivated to adopt Rust.
  • Concurrency and state machine design. Experience designing concurrent systems: async runtimes, thread models, lock contention, schedulers. Specifically, experience managing multiple in-flight async processes with cancellation, priority switching, and preemption.
  • Strong intuition for latency. Profiling, tail behavior, and tradeoffs across throughput vs. responsiveness. Ability to reason about end-to-end pipelines across CPU and GPU boundaries.
  • Comfort building from scratch under time pressure. This is a "design the architecture and ship it" role, not a "maintain existing infrastructure" role. You're comfortable with ambiguity and rapid iteration.

Bonus Points
  • Experience with real-time media systems: WebRTC, RTP/RTCP, jitter buffers, A/V sync
  • Experience with real-time tick-loop architectures (e.g., game engines, simulation runtimes, audio DSP pipelines, robotics)
  • Experience with GPU inference serving and optimization: Triton, TensorRT, vLLM, CUDA profiling
  • Building LLM agent orchestration systems
  • Familiarity with streaming generation systems: incremental decoding and mid-stream control, lock-free data structure design


  • $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels including Synthesia's former CPO.
  • In-person collaboration, 5 days a week at Seattle HQ
