Senior Engineer, Inference Control Plane

DigitalOcean

$139K — $174K *
Enterprise Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in multi-tenant platforms or distributed backend systems
  • Strong operational experience with high-scale distributed services
  • Deep knowledge of SRE principles, including observability and reliability engineering
  • 1+ years hands-on experience with Go / Golang
  • 1+ years experience with Kubernetes
  • Understanding of cloud-native architectures and microservices
  • Proficiency in performance debugging and reliability analysis in production environments
  • Experience tracking inference metrics such as Time To First Token (TTFT) and GPU utilization

Responsibilities

  • Design and build scalable, multi-tenant services for AI inference
  • Develop and operate high-scale distributed systems focusing on reliability and performance
  • Improve platform resiliency through better observability and automation
  • Collaborate with cross-functional teams to create production-grade systems and APIs
  • Elevate engineering standards through disciplined practices and incident management
  • Contribute to architectural decisions around traffic management and service orchestration
  • Participate in on-call rotations and lead efforts to improve service health and reduce incidents

Benefits

  • Hybrid work model allows for flexible work arrangements
  • Opportunity to work on cutting-edge AI inference technology
  • Collaborative environment with cross-functional team interactions
  • Potential for professional growth in a rapidly evolving field
  • Focus on innovation and improving operational practices
Full Job Description
We are seeking a Senior Engineer to help design, implement, and optimize our Serverless Inference infrastructure and APIs. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.
What You'll Do:
  • Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
  • Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
  • Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
  • Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
  • Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
  • Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
  • Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.
What You'll Bring:

Required
  • 5+ years of experience building and operating multi-tenant platforms or distributed backend systems
  • Strong experience operating high-scale distributed services in production environments
  • Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
  • 1+ years of hands-on experience with Go / Golang in production systems
  • 1+ years of experience with Kubernetes
  • Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
  • Experience debugging performance, scalability, and reliability issues in production systems
  • Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.

Bonus
  • AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with serving stacks such as vLLM, Triton, TensorRT-LLM, or similar technologies
  • Experience with API gateways, traffic routing, or service mesh technologies
  • Experience building systems for inference optimization, rate limiting, routing, or workload orchestration
Compensation Range:
  • $139,000 - $174,000

*This is a hybrid role


