AI Platform Engineer

Applied Compute

• $130K — $180K *

San Francisco, CA 94112, In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience in distributed systems, infrastructure platforms, or backend services
Strong systems engineering fundamentals including networking and cloud infrastructure
Experience with high reliability and availability production systems
Familiarity with containers, Kubernetes, and infrastructure-as-code
Understanding of security fundamentals like identity and secrets management
Ability to assess performance and fault tolerance in distributed systems
Excitement for collaborating with applied researchers on AI infrastructure

Responsibilities

Build orchestration systems for continuous improvement workflows
Develop large-scale evaluation infrastructure for measuring model performance
Design model serving systems for reliable AI application inference
Architect data infrastructure for training and model improvement
Create secure execution environments for agent workloads
Implement security controls for safe operation within enterprise environments
Build deployment systems for continuously improving AI models in customer environments

Benefits

Competitive compensation and equity
Generous health benefits
Unlimited PTO
Paid parental leave
Daily lunches and dinners
Transportation and relocation support
Retirement plans

Full Job Description

The role

At Applied Compute, our applied researchers work directly with enterprises to design, deploy, and continuously improve AI agents that solve real operational problems. As an AI Platform Engineer, you'll build the infrastructure that makes this possible.

You'll own the foundational systems that power Applied Compute's post-training and agent infrastructure: large-scale evaluation pipelines, model serving systems, training orchestration, secure execution environments, and the deployment platform that brings continuously improving AI systems into customer environments. Your work will enable researchers to rapidly build, evaluate, and deploy production AI systems while meeting the security, reliability, and compliance requirements of large enterprises.

What you'll do

Build orchestration systems for post-training, evaluation, data generation, and continuous improvement workflows
Build large-scale evaluation infrastructure that measures model and agent performance across customer deployments and research workflows
Design and operate model serving systems that deliver low-latency, reliable inference for production AI applications
Architect the data infrastructure that powers training, evaluation, observability, and model improvement across customer environments
Develop secure execution environments for agents, evaluations, and training workloads using microVMs, containers, and modern sandboxing technologies
Design authentication, authorization, audit logging, and security controls that enable AI systems to operate safely within enterprise environments
Build deployment and provisioning systems that allow continuously improving models and agents to run inside customer VPCs and cloud environments
Improve reliability, scalability, observability, and operational efficiency across serving, evaluation, and training infrastructure
Partner closely with applied researchers to build the infrastructure that turns production data into better models, evaluations, and AI systems

What we're looking for

5+ years of experience building distributed systems, infrastructure platforms, ML infrastructure, or large-scale backend services
Strong systems engineering fundamentals, including distributed systems, networking, operating systems, and cloud infrastructure
Experience designing and operating production systems with high reliability, scalability, and availability requirements
Experience building or operating orchestration systems, data pipelines, model serving infrastructure, or other large-scale platform services
Familiarity with containers, Kubernetes, infrastructure-as-code, and modern deployment workflows
Strong understanding of security fundamentals, including isolation, identity, secrets management, and auditing
Ability to reason about performance, scalability, fault tolerance, and operational tradeoffs in complex distributed systems
Excitement about partnering closely with applied researchers to build infrastructure for evaluation, post-training, and production AI systems

Strong candidates also have

Experience with sandboxing or isolation technologies such as Firecracker, gVisor, or Kata Containers
Experience with workflow orchestration systems such as Temporal or similar platforms
Experience building platforms deployed into customer-controlled cloud environments
Experience with ML infrastructure, including model serving, distributed training, evaluation systems, or GPU scheduling
Experience building developer platforms, internal tooling, or systems that accelerate the productivity of technical teams

Benefits & Logistics

This role is based in San Francisco. We work from our office in the Mission. We offer:

Competitive compensation and equity
Generous health benefits
Unlimited PTO
Paid parental leave
Daily lunches and dinners
Transportation and relocation support
Retirement plans

We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the process with you. We encourage you to apply even if you do not believe you meet every single qualification.

* Ladders Estimates
