Senior Engineer, Inference Control Plane

DigitalOcean

$139K - $174K *

Seattle, WA 98115

In-Person

Enterprise Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience in multi-tenant platforms or distributed backend systems
Strong operational experience with high-scale distributed services
Deep knowledge of SRE principles, including observability and reliability engineering
1+ years of hands-on experience with Go / Golang
1+ years of experience with Kubernetes
Understanding of cloud-native architectures and microservices
Proficiency in performance debugging and reliability analysis in production environments
Experience tracking inference metrics such as Time To First Token (TTFT) and GPU utilization

Responsibilities

Design and build scalable, multi-tenant services for AI inference
Develop and operate high-scale distributed systems focusing on reliability and performance
Improve platform resiliency through better observability and automation
Collaborate with cross-functional teams to create production-grade systems and APIs
Elevate engineering standards through disciplined practices and incident management
Contribute to architectural decisions around traffic management and service orchestration
Participate in on-call rotations and lead efforts to improve service health and reduce incidents

Benefits

Hybrid work model allows for flexible work arrangements
Opportunity to work on cutting-edge AI inference technology
Collaborative environment with cross-functional team interactions
Potential for professional growth in a rapidly evolving field
Focus on innovation and improving operational practices

Full Job Description

We are seeking a Senior Engineer to implement our Serverless Inference infrastructure and APIs and to contribute to their design and optimization. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance, to support the next-generation inference needs of AI-native enterprises.
What You'll Do:

Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.

What You'll Bring:

Required

5+ years of experience building and operating multi-tenant platforms or distributed backend systems
Strong experience operating high-scale distributed services in production environments
Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
1+ years of hands-on experience with Go / Golang in production systems
1+ years of experience with Kubernetes
Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
Experience debugging performance, scalability, and reliability issues in production systems
Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.

Bonus

AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with engines like vLLM or Triton.
Experience with API gateways, traffic routing, or service mesh technologies
Familiarity with LLM serving stacks such as vLLM, TensorRT-LLM, or similar technologies
Experience building systems for inference optimization, rate limiting, routing, or workload orchestration

Compensation Range:

$139,000 - $174,000

This is a hybrid role.

* Ladders Estimates
