CloudKitchens

Staff Cluster Infrastructure Engineer

CloudKitchens
$224K - $284K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 6+ years operating GPU compute on Kubernetes or similar orchestration
  • Strong programming and scripting skills in Python, Go, or comparable languages
  • Familiarity with Infrastructure-as-Code tools (Terraform, CloudFormation)
  • Comfortable with bare-metal Linux environments and GPU hardware
  • Bias toward automation and reliability in critical systems

Responsibilities

  • Manage GPU training clusters and automate their lifecycle
  • Automate bare-metal infrastructure to ensure quick and reliable machine onboarding
  • Build software abstractions for a unified interface to workloads
  • Enhance automation and uptime at the hardware/software interface
  • Diagnose and resolve operational issues swiftly
  • Design infrastructure to scale with company growth

Benefits

  • Medical, Dental, Vision, Disability, and Life Insurance
  • Flexible Spending Account / Health Savings Account options
  • 401(k) plan
  • Equity awards
  • Unlimited Flexible Time Off and Paid Holidays
  • Paid Parental Leave
  • Pre-Tax Commuter Benefit Plan
  • Team lunches in the SoMa office twice a week

Full Job Description

What you'll do

We're seeking a Cluster Infrastructure Engineer to join our founding team and own the GPU compute fabric that trains our foundation models: optimizing the machines we have today, automating how we manage them, and laying the groundwork to scale as we grow.

  • Manage and automate our GPU training clusters, including provisioning, bootstrapping, and lifecycle management.
  • Automate bare-metal bring-up so new machines come online quickly and reliably as we add capacity.
  • Build software abstractions that present a clean, unified interface to our training and simulation workloads.
  • Work at the hardware/software boundary, where speed and reliability are critical, continuously raising the bar for automation and uptime.
  • Run day-to-day operations: diagnose and resolve issues quickly when systems are under pressure.
  • Design our infrastructure to scale smoothly as we grow from a smaller cluster of machines toward a larger fleet.

What we're looking for

  • 6+ years of experience operating GPU compute on Kubernetes (or similar orchestration), with the judgment to scale it as demand grows.
  • Strong programming and scripting skills in Python, Go, or similar.
  • Familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation.
  • Comfort with bare-metal Linux environments, GPU hardware, and networking.
  • A bias toward automation, reliability, and operating critical systems well.

Why join us

At Atoms, you'll work on one of the defining challenges of our time - bringing automation into the physical world to drive real, lasting impact. We exist to uncover valuable unknown truths and turn them into progress, which means constantly pushing beyond what's known and building what doesn't yet exist. The work is ambitious and often challenging, but it's grounded in a shared sense of purpose and a team committed to seeing it through together. Our work only matters if it serves others, and we know that meaningful progress depends on the trust of the people we serve and the strength of our team - so we invest in both, creating an environment where you can do your best work and grow.

What else you need to know

This role is based in our San Francisco office. Atoms is a company driven by invention and continuous change - we are constantly reimagining our industries, building new products, and refining how we operate. We do our best work together. That's why all of our office-based teams work onsite, five days a week.

The base salary range for this role is $224,000 - $284,000 per year.

Actual compensation will be determined on an individual basis and may vary depending on experience, skills, and qualifications.

Base salary is just one part of your total rewards package. You may also be eligible for equity awards and an annual performance-based bonus.

Benefits Summary (USA Full-Time Exempt Employees):
  • Medical, Dental, Vision, Disability, and Life Insurance
  • Flexible Spending Account / Health Savings Account Options
  • 401(k)
  • Equity
  • Sick Time, Unlimited Flexible Time Off, and Paid Holidays
  • Paid Parental Leave
  • Pre-Tax Commuter Benefit Plan
  • Team lunch in our SoMa office every Tuesday and Thursday

Benefits are subject to change at the company's discretion.
Atoms accepts applications on an ongoing basis.

Ready to join us as we serve those who serve others?

About CloudKitchens

CloudKitchens is a technology company that provides a platform for restaurants to operate delivery-only kitchens. The company's platform allows restaurants to expand their delivery reach without the need for additional physical locations, while also providing real-time data and analytics to optimize operations. CloudKitchens was founded in 2016 by Travis Kalanick, the co-founder of Uber, and is headquartered in Los Angeles, California.
Size: 1,000 employees
Founded: 2016
