OpenAI

System Software Engineer, First-Party Hardware

OpenAI: $130K - $180K *
Telecommunications & Hardware
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years of hands-on experience in low-level system software or embedded software.
  • Strong programming skills in C, C++, Rust, or similar systems languages.
  • Experience with Linux-based hardware platforms and firmware.
  • Strong knowledge of hardware/software interfaces like I2C, PCIe, and GPIO.
  • Demonstrated ability to debug live hardware and software systems.
  • Experience with hardware bring-up and deployment of high-performance compute platforms.
  • Ability to reason across software, firmware, hardware, and manufacturing boundaries.

Responsibilities

  • Design, develop, and maintain low-level firmware for AI hardware.
  • Own integration and acceptance of partner software releases.
  • Build and maintain automation for testing systems in the lab.
  • Define and debug hardware management protocols across multiple interfaces.
  • Build telemetry and recovery paths for hardware failures.
  • Develop validation and test automation for system readiness.
  • Debug complex production issues across hardware and software.

Benefits

  • Relocation assistance available.
  • Hybrid work model: 3 days onsite in San Francisco.

Full Job Description

About the Role

We're seeking a System Software Engineer to join our First-Party Hardware team. In this role, you will design, build, integrate, and validate low-level system software for the manageability and health of OpenAI's first-party AI hardware systems.

You will work across BMC, Linux, firmware interfaces, automation infra, boot and recovery, hardware diagnostics, telemetry, host and platform drivers, network software interfaces, and manufacturing and fleet readiness. A major part of this role is owning the acceptance path for partner-delivered system software: defining requirements, reviewing code and artifacts, reproducing builds, building tests, pushing fixes, and producing the evidence needed for launch decisions.

This role is hands-on and high-ownership. You will write and review low-level software, debug issues across hardware and software boundaries, build infrastructure and automation to test and manage devices in the lab, guide partner deliverables, build validation evidence, and help carry platforms from bring-up through production deployment.

Location: San Francisco, CA (Hybrid: 3 days/week onsite)

Relocation assistance available.

In this role, you will:
  • Design, develop, and maintain low-level firmware and system software for first-party AI hardware manageability, including BMC software, Redfish services, gNMI telemetry, firmware update and recovery flows, BIOS/UEFI interactions, platform drivers, and hardware diagnostics.
  • Own integration and acceptance of partner and vendor software releases, including requirements, code and artifact review, reproducible builds, CI, regression monitoring, version tracking, acceptance criteria, and launch-readiness evidence.
  • Build and maintain automation and CI infrastructure for testing and managing systems in our lab.
  • Define and debug hardware management protocols across accelerators, host systems, management controllers, firmware, and platform services, including interfaces such as I2C, SMBus, PMBus, PCIe, Ethernet, GPIO, UART, and JTAG.
  • Build system health monitoring, telemetry, remote diagnostics, and recovery paths that make hardware failures diagnosable in the lab, at manufacturing partners, and in production data centers.
  • Develop validation and test automation for board bring-up, rack bring-up, qualification, manufacturing readiness, deployment readiness, and long-term reliability.
  • Convert engineering releases into manufacturing-ready software recipes: images, versions, logs, limits, remediation mapping, provisioning hooks, secure artifact handling, and traceable data export.
  • Debug complex production issues spanning hardware signals, BMC firmware, BIOS/UEFI, kernel drivers, platform services, network topology, PCIe behavior, power, thermals, boot, provisioning, and manufacturing test.
  • Partner with hardware, firmware, security, networking, infrastructure, manufacturing, operations, and external engineering teams to define software contracts, unblock bring-up, and drive issues to closure.
  • Produce durable architecture notes, runbooks, validation records, and decision documents that help OpenAI and partner teams reproduce, operate, and improve the platform.

You might thrive in this role if you:
  • Have 7+ years of hands-on experience, or exceptional accomplishments demonstrating equivalent expertise, in low-level system software, embedded software, firmware, BMC software, platform software, device drivers, or hardware diagnostics.
  • Have strong programming skills in C, C++, Rust, or similar systems languages, with experience building reliable software for real hardware.
  • Have experience with Linux-based hardware platforms, embedded Linux, OpenBMC, Redfish, BMCWeb, IPMI boundaries, BIOS/UEFI, bootloaders, firmware update systems, kernel drivers, RTOS, or fleet management software.
  • Have strong knowledge of hardware/software interfaces such as I2C, SMBus, PMBus, SPI, PCIe, Ethernet, USB, UART, GPIO, JTAG, power controllers, board-level debug tools, or protocol analyzers.
  • Have a demonstrated ability to debug live hardware using logs, packet captures, firmware traces, bus captures, lab hosts, BMC journals, Linux tooling, and carefully controlled experiments.
  • Have experience with hardware bring-up, manufacturing or qualification testing, system diagnostics, release validation, or deployment of high-performance compute, accelerator, server, networking, storage, or embedded platforms.
  • Can reason across software, firmware, hardware, manufacturing, and operations boundaries, and can turn ambiguous problems into clear requirements, designs, tests, and decisions.
  • Have a proven track record of working with external vendors, manufacturing partners, or partner engineering teams to define deliverables, review technical work, and drive issues to closure.
  • Are familiar with platform security topics such as secure boot, firmware signing, device provisioning, attestation, certificate handling, trusted update flows, or access-control design (a plus).

To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.

About OpenAI

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company was founded in 2015 by a group of technology leaders, including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and John Schulman. OpenAI's mission is to develop and promote friendly AI for the betterment of humanity. The company has developed a number of cutting-edge AI technologies, including GPT-3, a language processing system that can generate human-like text. OpenAI has received funding from a number of high-profile investors, including LinkedIn co-founder Reid Hoffman and venture capitalist Peter Thiel.
Size: 100 employees
Founded: 2015
