OpenAI

CPU/Storage/PoP-WAN Program Manager

$130K - $180K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years in technical program management or infrastructure deployment
  • Experience with large-scale compute, storage, and networking systems
  • Knowledge of servers, storage arrays, routers, and structured cabling
  • Proven ability to manage cross-functional programs and external vendors
  • Strong understanding of deployment lifecycles from planning to handoff
  • Ability to connect physical execution with systems architecture
  • Exceptional communication skills, especially for leadership updates
  • Ability to thrive in fast-paced environments with shifting priorities

Responsibilities

  • Lead activation programs for CPU/GPU clusters globally
  • Drive readiness for production clusters from contracted capacity
  • Manage deployment programs for PoPs, WAN expansion, and interconnections
  • Create integrated schedules for procurement and installation phases
  • Coordinate hardware readiness across all deployment components
  • Collaborate with engineering on dependencies for cluster activation
  • Oversee testing and performance validation of storage systems
  • Manage risk identification and mitigation throughout deployment processes

Benefits

  • Opportunity to play a key role in cutting-edge AI infrastructure
  • Work in a collaborative and technically rich environment
  • Potential for travel to various sites as needed
  • Engagement with senior leadership and cross-functional teams
  • Contribute to the scalable deployment of crucial technologies

Full Job Description
About the Role

We are seeking a highly technical Program Manager to lead execution across CPU, Storage, PoP, and WAN infrastructure programs that directly unlock OpenAI's next-generation compute capacity.

In this role, you will own complex cross-functional programs spanning compute cluster activation, storage deployment, PoP bring-up, and backbone expansion. You will coordinate hardware readiness, site readiness, network pathing, storage availability, vendor execution, and engineering dependencies required to turn contracted infrastructure into live training and inference capacity.

This role requires strong technical fluency across hardware systems, network infrastructure, storage architecture, and deployment execution. You should be comfortable operating from rack-level implementation details through executive-level capacity planning discussions.

This role is based in San Francisco, CA, with travel as needed.

Key Responsibilities
  • Lead end-to-end execution of CPU / GPU cluster activation programs across OpenAI's global infrastructure footprint
  • Drive readiness to convert contracted compute capacity into schedulable production clusters
  • Own deployment programs for new PoPs, backbone nodes, WAN expansion, and interconnection initiatives
  • Build integrated schedules spanning procurement, logistics, installation, storage readiness, network turn-up, testing, and production handoff
  • Coordinate BOM readiness, server delivery, racks, optics, cabling, storage hardware, and vendor milestones
  • Partner with engineering teams to align compute, storage, and networking dependencies before cluster activation
  • Manage deployment of storage systems supporting training and inference workloads, including readiness, validation, performance checks, and scaling plans
  • Coordinate backbone capacity expansion, cross-connects, inter-region pathing, and cloud interconnect readiness with Azure and third-party providers
  • Lead physical deployment execution including rack-and-stack, hardware bring-up, L1 validation, and site acceptance criteria
  • Build repeatable deployment playbooks, dashboards, governance cadences, and operating mechanisms for scale
  • Identify risks early across supply chain, site readiness, technical constraints, and vendor execution, then drive mitigation plans
  • Communicate milestones, escalations, and capacity forecasts to senior leadership

Qualifications
  • 8+ years of experience in technical program management, infrastructure deployment, network deployment, or data center operations
  • Strong experience delivering programs involving compute, storage, networking, or large-scale infrastructure systems
  • Working knowledge of servers, clusters, storage arrays, routers, switches, optics, and structured cabling
  • Experience owning cross-functional programs across engineering, operations, supply chain, and external vendors
  • Strong understanding of deployment lifecycles from planning and procurement through production handoff
  • Ability to reason across physical infrastructure execution and logical systems architecture dependencies
  • Proven ability to build integrated schedules and drive accountability across multiple stakeholders
  • Strong executive communication skills with experience managing critical escalations and leadership updates
  • Comfortable operating in fast-moving environments with aggressive timelines and evolving priorities
  • Highly analytical with strong problem-solving and execution instincts

Preferred Skills
  • Experience at a hyperscaler, cloud provider, AI infrastructure company, or global network operator
  • Experience deploying GPU clusters, HPC systems, or large training environments
  • Familiarity with distributed storage systems and high-performance data infrastructure
  • Experience with PoP deployments, WAN backbone expansion, or global network buildouts
  • Experience working across first-party, colo, and cloud environments
  • Experience building repeatable infrastructure deployment systems in high-growth environments

About OpenAI

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company was founded in 2015 by a group of technology leaders, including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and John Schulman. OpenAI's stated mission is to ensure that artificial general intelligence benefits all of humanity. The company has developed a number of AI technologies, including GPT-3, a large language model that can generate human-like text. OpenAI has received funding from a number of high-profile investors, including LinkedIn co-founder Reid Hoffman and venture capitalist Peter Thiel.
Size: 100 employees
Founded: 2015
