Role Overview
We're hiring a Software Engineer to own the serving infrastructure that connects Rime's inference engines to the world. This role sits at the intersection of ML systems and cloud infrastructure - you'll work directly on model inference and cloud infrastructure to build, harden, and scale the systems that stream voice at real-time latency. As Rime moves toward its next-generation architecture, you'll be a core architect of how our models get served.
What You'll Own
- Architecture and implementation of Rime's TTS serving infrastructure, from GPU-backed inference engines to the API surface.
- Model optimization, from single-node to disaggregated fleet serving.
- Compatibility across NVIDIA hardware generations, from Hopper to Blackwell and beyond, for on-prem and cloud deployments.
- Continuous integration and deployment workflows for the model serving pipeline.
- Site reliability: on-call rotation, monitoring, alerting, and observability across the serving stack.
- Resource provisioning and cost management across our GPU fleet.
What We're Looking For
- Hands-on experience with real-time multinode ML serving infrastructure.
- ML serving framework experience: NVIDIA Dynamo/Triton, vLLM, SGLang, or equivalent.
- Experience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).
- Strong cloud infrastructure fundamentals: Linux internals, networking, containerization (Docker, Kubernetes).
- IaC experience - Terraform, Packer, or comparable. You should have opinions about how to do this right.
- On-call is part of the job. You treat production reliability as a shared responsibility.
Nice to Have
- Experience with multinode training (DDP, FSDP, etc.).
- Experience with gRPC or other bidirectional binary streaming protocols.
- Experience with audio streaming and related technologies (WebRTC, WebSockets, etc.).
- Experience with a multilingual monorepo, where you choose the best language for the job on merit rather than personal familiarity.
- Experience with multi-cloud infrastructure (AWS, GCP, OCI, etc.).
- Comfort with configuration management tooling (Ansible, Chef, Puppet, or similar).
- SRE, DevOps, or platform engineering background at a startup.
- Experience at an early-stage company.
Why Join Rime
- Build the serving infrastructure behind a category-defining voice AI company from the ground up.
- You will bring experience that no one else at the company currently has, and you can help us set the vision.
- Direct collaboration with the inference, platform, and ML teams - no handoff culture.
- The systems you build determine what experiences our customers can deploy at scale.
- Meaningful equity upside at an early stage.
- High ownership, high standards, low bureaucracy.
- SF / Bay Area.