Gem.com

Head of Global Compute Supply & Platform Strategy

Gem.com | $250K - $450K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of engineering leadership in large-scale distributed systems or infrastructure.
  • Proven track record in compute platform strategy at a frontier AI lab, hyperscaler, or major autonomy program.
  • Deep technical understanding of high-performance cluster topology and large-scale data systems.
  • Operational experience with 10k+ accelerator environments in high-performance production settings.
  • Experience orchestrating capital or infrastructure for training runs at >100B-parameter or >100k-GPU-day scale (preferred).
  • Familiarity with the capacity and latency demands of edge-to-cloud inference (preferred).

Responsibilities

  • Architect and lead a multi-year compute strategy including capacity planning and vendor partnerships.
  • Provide strategic leadership to the infrastructure, distributed systems, and datacenter operations teams.
  • Oversee cluster architectural efficiency to achieve over 50% Model FLOPs Utilization (MFU) on training runs.
  • Negotiate and manage large-scale capital deployments for compute infrastructure, collaborating with Finance.
  • Champion a platform strategy for world-model training, simulations, and real-time inference using a single, elastic fleet.
  • Act as the primary interface with commercial partners like NVIDIA and AMD.

Benefits

  • Comprehensive health, dental, and vision insurance.
  • Flexible working hours and remote working options.
  • Opportunity for professional development and training.
  • Work in a cutting-edge robotics and AI environment.
  • Join a diverse and collaborative executive team.

Full Job Description
The Role

Compute is the ultimate physical and financial prerequisite for the robotics foundation models we are building. This role owns Luma's global compute footprint end to end, bridging macro capacity strategy, multi-million-dollar capital allocation, and top-tier systems architecture. You will design our scaling roadmap from the silicon up, ensuring our research and robotics teams have the uninterrupted runway they need to ship frontier world models. As a member of the executive team, you will be the single person responsible for turning capital into capability.

What You'll Do

  • Architect Multi-Year Compute Strategy: Lead capacity planning, global vendor and cloud partnerships, on-prem vs. cloud mix, and accelerator supply chain roadmaps (H/B-series GPUs, custom silicon evaluation).
  • Direct the Platform Org: Provide strategic leadership to our infrastructure, distributed systems, and datacenter operations teams, scaling the organization to support next-generation compute demands.
  • Maximize Fleet Utilization: Oversee the architectural efficiency of our cluster configurations to deliver >50% Model FLOPs Utilization (MFU) on flagship training runs.
  • Command a Megawatt Budget: Negotiate, secure, and operate our largest-scale capital deployments for compute infrastructure, partnering directly with Finance to optimize unit economics and risk management.
  • Unify Global Capacity: Champion the platform strategy that enables world-model training, heavy simulation rollouts, and real-time on-robot inference to seamlessly share a single, elastic fleet.
  • Act as Principal Executive Interface: Serve as the primary commercial and strategic bridge to NVIDIA, AMD, hyperscalers, and frontier silicon vendors.


Qualifications

  • 10+ years of engineering leadership experience in large-scale distributed systems, infrastructure, or technical supply chain, with a proven track record of leading compute platform strategy at a frontier AI lab, hyperscaler, or major autonomy program.
  • Deep technical & commercial fluency in high-performance cluster topology, high-speed interconnects (InfiniBand/RoCE), large-scale data systems, and the economics of distributed training architectures.
  • Direct operational oversight of 10k+ accelerator environments in high-performance production settings.


Preferred Qualifications

  • Scale Credentials: Experience orchestrating capital or infrastructure for training runs at the >100B-parameter or >100k-GPU-day scale.
  • Robotics/Autonomy Context: Familiarity with the unique capacity and latency demands of edge-to-cloud inference and real-time autonomous systems.


Compensation

The base pay range for this role is $250,000 - $450,000 per year.

About Gem.com

Founded
2013
