The role
At Applied Compute, our applied researchers work directly with enterprises to design, deploy, and continuously improve AI agents that solve real operational problems. As an AI Platform Engineer, you'll build the infrastructure that makes this possible.
You'll own the foundational systems that power Applied Compute's post-training and agent infrastructure: large-scale evaluation pipelines, model serving systems, training orchestration, secure execution environments, and the deployment platform that brings continuously improving AI systems into customer environments. Your work will enable researchers to rapidly build, evaluate, and deploy production AI systems while meeting the security, reliability, and compliance requirements of large enterprises.
What you'll do
- Build orchestration systems for post-training, evaluation, data generation, and continuous improvement workflows
- Build large-scale evaluation infrastructure that measures model and agent performance across customer deployments and research workflows
- Design and operate model serving systems that deliver low-latency, reliable inference for production AI applications
- Architect the data infrastructure that powers training, evaluation, observability, and model improvement across customer environments
- Develop secure execution environments for agents, evaluations, and training workloads using microVMs, containers, and modern sandboxing technologies
- Design authentication, authorization, audit logging, and security controls that enable AI systems to operate safely within enterprise environments
- Build deployment and provisioning systems that allow continuously improving models and agents to run inside customer VPCs and cloud environments
- Improve reliability, scalability, observability, and operational efficiency across serving, evaluation, and training infrastructure
- Partner closely with applied researchers to build the infrastructure that turns production data into better models, evaluations, and AI systems
What we're looking for
- 5+ years of experience building distributed systems, infrastructure platforms, ML infrastructure, or large-scale backend services
- Strong systems engineering fundamentals, including distributed systems, networking, operating systems, and cloud infrastructure
- Experience designing and operating production systems with high reliability, scalability, and availability requirements
- Experience building or operating orchestration systems, data pipelines, model serving infrastructure, or other large-scale platform services
- Familiarity with containers, Kubernetes, infrastructure-as-code, and modern deployment workflows
- Strong understanding of security fundamentals, including isolation, identity, secrets management, and auditing
- Ability to reason about performance, scalability, fault tolerance, and operational tradeoffs in complex distributed systems
- Excitement about partnering closely with applied researchers to build infrastructure for evaluation, post-training, and production AI systems
Strong candidates also have
- Experience with sandboxing or isolation technologies such as Firecracker, gVisor, or Kata Containers
- Experience with workflow orchestration systems such as Temporal or similar platforms
- Experience building platforms deployed into customer-controlled cloud environments
- Experience with ML infrastructure, including model serving, distributed training, evaluation systems, or GPU scheduling
- Experience building developer platforms, internal tooling, or systems that accelerate the productivity of technical teams
Benefits & Logistics
This role is based in San Francisco. We work from our office in the Mission. We offer:
- Competitive compensation and equity
- Generous health benefits
- Unlimited PTO
- Paid parental leave
- Daily lunches and dinners
- Transportation and relocation support
- Retirement plans
We sponsor visas. While we can't guarantee success for every candidate or role, we're committed to working through the process with you if you're the right fit. We encourage you to apply even if you don't believe you meet every qualification.