We are seeking a Senior Engineer to implement and contribute to the design and optimization of our Serverless Inference infrastructure and APIs. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.
What You'll Do:
- Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
- Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
- Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
- Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
- Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
- Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
- Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.
What You'll Bring:
Required
- 5+ years of experience building and operating multi-tenant platforms or distributed backend systems
- Strong experience operating high-scale distributed services in production environments
- Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
- 1+ years of hands-on experience with Go / Golang in production systems
- 1+ years of experience with Kubernetes
- Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
- Experience debugging performance, scalability, and reliability issues in production systems
- Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.
Bonus
- AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with engines like vLLM or Triton.
- Experience with API gateways, traffic routing, or service mesh technologies
- Familiarity with LLM serving stacks such as vLLM, TensorRT-LLM, or similar technologies
- Experience building systems for inference optimization, rate limiting, routing, or workload orchestration
Compensation Range:
*This is a hybrid role