The Role
We are looking for an Infrastructure Engineer to own the compute and cloud foundation that frontier AI research runs on. The systems you build determine how fast we can train models, how reliably experiments run, and how efficiently we scale. Some example areas you might work on:
- Sandboxing and secure execution - design isolated environments where agents and untrusted code can run, use tools, and reach external services
- Kubernetes and multi-cluster compute - operate CPU and GPU clusters as one platform with scheduling, autoscaling, and multi-tenant isolation
- Training and inference infrastructure - understand the resource and scheduling demands of research, training, and inference workloads and build the platform capabilities those workloads need
- Infrastructure for long-running agents - build the state management systems that handle checkpointing, recovery, and resumption across failures
- Networking - build the networking layer across clouds, clusters, and hosts: routing, peering, load balancing, and network isolation
If you're excited about building the infrastructure backbone of a frontier AI research lab - where your systems directly determine research velocity - we'd love to hear from you.
We offer a base salary of $350,000-$500,000 and a meaningful equity grant, depending on experience and background, along with competitive benefits.