OpenAI

Networking Operating System Firmware Engineer

OpenAI, $130K to $180K *
Information Technology
Less than 5 years of experience

Job Overview by Ladders

Qualifications

  • Proven experience with SONiC or comparable NOS stacks like FBOSS, Cumulus Linux, or Arista EOS.
  • Strong software engineering skills including testing, observability, and maintainable coding in languages such as C/C++, Python, Go, or Rust.
  • Experience with Linux kernel internals and network device drivers.
  • Experience with switch ASIC SDKs and SAI implementations from vendors like Broadcom, Marvell, or NVIDIA.
  • Understanding of networking fundamentals such as L2/L3 forwarding, BGP, and telemetry.
  • Experience in platform bring-up and board-level debugging across various hardware flows like thermal and power.
  • Familiarity with CI/CD pipelines and large-scale automation.

Responsibilities

  • Design, develop, and maintain custom NOS images for AI fabrics using open-source components.
  • Integrate and build Linux kernel components and device drivers for switch platforms.
  • Bring up new switch platforms focusing on hardware management and board-specific logic.
  • Extend and customize NOS services for routing and telemetry functionalities.
  • Implement and debug route, neighbor, next-hop, and ECMP programming flows from control-plane intent to hardware state.
  • Build mechanisms for control-plane and hardware programming verification.
  • Collaborate with hardware teams to validate ASIC configurations and performance baselines.
  • Integrate monitoring and diagnostics into fleet-wide automated workflows.

Benefits

  • Hybrid work model with 3 days in the office.
  • Relocation assistance for new employees.

Full Job Description
Role summary

We are seeking a Networking Operating System Firmware Engineer to help bootstrap and scale the switching layer of our AI supercomputers. In this role, you will build and maintain custom NOS images from scratch, using open-source components from SONiC, SAI, FRR, and related networking stacks, while working across the Linux kernel, switch ASIC SAI/SDKs, platform drivers, control-plane services, and orchestration layers.

This is a software engineering role that requires a deep understanding of networking, NOS internals, switch hardware, and production systems. You will design, implement, test, and debug production NOS software across platform drivers, routing and control-plane state, ASIC programming, observability, and fleet integration. The engineer in this role should be able to work through ambiguous, open-ended technical problems and drive feature development across software, hardware, and vendor boundaries.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:
  • Design, develop, and maintain custom NOS images for large-scale AI fabrics, using open-source components from SONiC, FRR, and related networking stacks.
  • Integrate, build, and configure Linux kernel components, device drivers, switch ASIC SDKs, and SAI layers.
  • Bring up new switch platforms, including thermal and fan control, power monitoring, transceiver management, watchdogs, OSFP CMIS, LEDs, CPLDs, and board-specific platform logic.
  • Extend and customize NOS services for routing, telemetry, control-plane state, and distributed automation.
  • Implement and debug route, neighbor, next-hop, and ECMP programming flows from control-plane intent through ASIC hardware state.
  • Build software mechanisms that distinguish control-plane acceptance, SAI/SDK acceptance, and explicit hardware programming acknowledgement.
  • Work with hardware teams to validate ASIC configurations, link bring-up, SerDes tuning, buffer profiles, and performance baselines.
  • Evaluate switch silicon SDK releases, track vendor deliverables, and validate platform requirements with vendors and ASIC partners.
  • Debug complex issues spanning kernel drivers, platform monitoring, NOS services, routing agents, orchestration services, hardware signals, ASIC state, and network topology.
  • Integrate switches into fleet-wide monitoring, remote diagnostics, telemetry pipelines, and automated lifecycle workflows.
  • Develop robust CI/build pipelines for reproducible NOS builds and controlled rollout across the fleet.
  • Support factory bring-up and qualification all the way through mass deployment.
  • Collaborate on networking protocols and technologies that improve performance and reliability at AI factory scale.

You might thrive in this role if you have:
  • Proven experience working with SONiC or comparable NOS stacks such as FBOSS, Cumulus Linux, Arista EOS, Junos PFE-level integration, or equivalent platform software.
  • Strong software engineering fundamentals: clear interfaces, data models, state-machine design, error handling, testing, observability, performance debugging, and maintainable C/C++, Python, Go, or Rust code.
  • Experience with Linux kernel internals, network device drivers, platform drivers, hwmon, I2C/SMBus, CPLDs, or board-level platform software.
  • Experience integrating or debugging Broadcom, Marvell, NVIDIA, Intel, or comparable switch ASIC SDKs and SAI implementations.
  • Understanding of L2/L3 forwarding, ECMP, RoCE, BGP, QoS, PFC, buffer tuning, and telemetry.
  • Experience with platform bring-up and board-level debugging across thermal, fan, power, transceiver, LED, watchdog, CPLD, or OSFP CMIS flows.
  • Experience with OpenConfig gNMI interfaces, YANG data models, or structured telemetry is helpful.
  • Familiarity with CI/CD pipelines, distributed config and state management, reproducible builds, and large-scale automation.
  • Ability to independently drive ambiguous NOS or platform feature development from problem definition through implementation, validation, rollout, and debugging across software, hardware, and vendor boundaries.
  • Familiarity with Rust or Go is a plus.

To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.

About OpenAI

OpenAI is an artificial intelligence research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company was founded in 2015 by a group of technology leaders, including Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and John Schulman. OpenAI's mission is to develop and promote friendly AI for the betterment of humanity. The company has developed a number of cutting-edge AI technologies, including GPT-3, a language processing system that can generate human-like text. OpenAI has received funding from a number of high-profile investors, including LinkedIn co-founder Reid Hoffman and venture capitalist Peter Thiel.
Size: 100 employees
Founded: 2015
