What you'll do
We're seeking a Cluster Infrastructure Engineer to join our founding team and own the GPU compute fabric that trains our foundation models: optimizing the machines we have today, automating how we manage them, and laying the groundwork to scale as we grow.
- Manage and automate our GPU training clusters, including provisioning, bootstrapping, and lifecycle management.
- Automate bare-metal bring-up so new machines come online quickly and reliably as we add capacity.
- Build software abstractions that present a clean, unified interface to our training and simulation workloads.
- Work at the hardware/software boundary, where speed and reliability are critical, continuously raising the bar for automation and uptime.
- Run day-to-day operations: diagnose and resolve issues quickly when systems are under pressure.
- Design our infrastructure to scale smoothly as we grow from a smaller cluster of machines toward a larger fleet.
What we're looking for
- 6+ years of experience operating GPU compute on Kubernetes (or a similar orchestration platform), with the judgment to scale it as demand grows.
- Strong programming and scripting skills in Python, Go, or similar.
- Familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation.
- Comfort with bare-metal Linux environments, GPU hardware, and networking.
- A bias toward automation, reliability, and operating critical systems well.
Why join us
At Atoms, you'll work on one of the defining challenges of our time - bringing automation into the physical world to drive real, lasting impact. We exist to uncover valuable unknown truths and turn them into progress, which means constantly pushing beyond what's known and building what doesn't yet exist. The work is ambitious and often challenging, but it's grounded in a shared sense of purpose and a team committed to seeing it through together. Our work only matters if it serves others, and we know that meaningful progress depends on the trust of the people we serve and the strength of our team - so we invest in both, creating an environment where you can do your best work and grow.
What else you need to know
This role is based in our San Francisco office. Atoms is a company driven by invention and continuous change - we are constantly reimagining our industries, building new products, and refining how we operate. We do our best work together. That's why all of our office-based teams work onsite, five days a week.
The base salary range for this role is $224,000 - $284,000 per year. Actual compensation will be determined on an individual basis and may vary depending on experience, skills, and qualifications.
Base salary is just one part of your total rewards package. You may also be eligible for equity awards and an annual performance-based bonus.
Benefits Summary (USA Full-Time Exempt Employees):
- Medical, Dental, Vision, Disability, and Life Insurance
- Flexible Spending Account / Health Savings Account Options
- 401(k)
- Equity
- Sick Time, Unlimited Flexible Time Off, and Paid Holidays
- Paid Parental Leave
- Pre-Tax Commuter Benefit Plan
- Team lunch in our SoMa office every Tuesday and Thursday
Benefits are subject to change at the company's discretion.
Atoms accepts applications on an ongoing basis.
Ready to join us as we serve those who serve others?