Software Engineer, ML Serving

Rime Labs

$130K — $180K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Hands-on experience with real-time multinode ML serving infrastructure.
  • Proficient in ML serving frameworks like NVIDIA Dynamo/Triton or equivalent.
  • Strong understanding of distributed model serving techniques.
  • Solid background in cloud infrastructure including Linux, Docker, and Kubernetes.
  • Experience with IaC tools like Terraform or Packer.

Responsibilities

  • Architect and implement TTS serving infrastructure supporting GPU-backed inference.
  • Optimize models for both single-node and distributed fleet serving.
  • Ensure compatibility with various NVIDIA hardware for cloud and on-prem solutions.
  • Develop CI/CD workflows for the model serving pipeline.
  • Maintain site reliability through monitoring and observability practices.
  • Manage resources and costs across the GPU fleet.

Benefits

  • Opportunity to build infrastructure for a leading voice AI company.
  • Collaborate directly with teams without a handoff culture.
  • Significant impact on customer experience through the systems you design.
  • Ownership and influence over company direction and technology vision.
  • Equity options offered at an early-stage company.
  • Flexible work environment with minimal bureaucracy.
  • Located in the SF / Bay Area, a hub for tech innovation.

Full Job Description

Role Overview

We're hiring a Software Engineer to own the serving infrastructure that connects Rime's inference engines to the world. This role sits at the intersection of ML systems and cloud infrastructure - you'll work directly on model inference and cloud infrastructure to build, harden, and scale the systems that stream voice at real-time latency. As Rime moves toward its next-generation architecture, you'll be a core architect of how our models get served.

What You'll Own
  • Architecture and implementation of Rime's TTS serving infrastructure, from GPU-backed inference engines to the API surface.
  • Model optimization from single-node to disaggregated fleet serving.
  • Compatibility across NVIDIA hardware generations, from Hopper to Blackwell and beyond, for on-prem and cloud deployments.
  • Continuous integration and deployment workflows for the model serving pipeline.
  • Site reliability: on-call rotation, monitoring, alerting, and observability across the serving stack.
  • Resource provisioning and cost management across our GPU fleet.


What We're Looking For
  • Hands-on experience with real-time multinode ML serving infrastructure and ML serving frameworks: NVIDIA Dynamo/Triton, vLLM, SGLang, or equivalent.
  • Experience with distributed or disaggregated model serving (Tensor Parallel, Pipeline Parallel, or equivalent).
  • Strong cloud infrastructure fundamentals: Linux internals, networking, containerization (Docker, Kubernetes).
  • IaC experience - Terraform, Packer, or comparable. You should have opinions about how to do this right.
  • On-call is part of the job. You treat production reliability as a shared responsibility.


Nice to Have
  • Experience with multinode training (DDP, FSDP, etc.).
  • Experience with gRPC or other bidirectional binary streaming protocols.
  • Experience with audio streaming and related technologies (WebRTC, WebSockets, etc.).
  • Experience with a multi-language monorepo where you pick the best language for the job on merit rather than personal familiarity.
  • Experience with multi-cloud infrastructures (AWS, GCP, OCI, etc.).
  • Comfort with configuration management tooling (Ansible, Chef, Puppet, or similar).
  • SRE, DevOps, or platform engineering background at a startup.
  • Experience at an early-stage company.


Why Join Rime
  • Build the serving infrastructure behind a category-defining voice AI company from the ground up.
  • You will bring experience that no one else at the company currently has, and you can help us set the vision.
  • Direct collaboration with the inference, platform, and ML teams - no handoff culture.
  • The systems you build determine what experiences our customers can deploy at scale.
  • Meaningful equity upside at an early stage.
  • High ownership, high standards, low bureaucracy.
  • SF / Bay Area.
