Principal Software Engineer, Rack-Scale System Software - CSP Engagements

NVIDIA Corporation • $272K — $431K *

US-Anywhere (Remote in United States)

Enterprise Technology

11 - 15 years of experience

Today


Job Overview by Ladders

Qualifications

15+ years of experience in system software, platform firmware, or large-scale distributed systems engineering
BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Deep understanding of rack-scale system software challenges, including multi-component coordination and health monitoring
Experience with fabric management software and system-level orchestration frameworks
Understanding of error handling and recovery design patterns in distributed systems
Experience with health monitoring and telemetry systems for fleet-level observability
Strong communication skills to mentor customer engineering teams

Responsibilities

Drive rack-scale SW/FW architecture alignment across CSP engagements
Lead technical work streams with CSP engineering teams on rack-scale system software
Capture and synthesize CSP engineering feedback on system software
Collaborate with multi-functional teams to ensure customer operational requirements are reflected in SW/FW development
Identify and address cross-CSP patterns in rack-scale SW/FW issues
Collaborate on left-shift strategy for SW/FW integration work
Make critical technical decisions on SW/FW trade-offs and mitigate execution risks

Benefits

Eligibility for equity participation
Comprehensive benefits package
Support for ongoing professional development
Flexible working arrangements
Inclusive workplace culture

Full Job Description

We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for rack-scale system SW/FW, working with CSP engineering teams to ensure they can deploy, monitor, and operate these systems reliably at fleet scale. In this role, you will collaborate with NVIDIA's cross-functional rack-scale system SW/FW engineering teams, providing dedicated CSP-facing technical leadership. Your focus is on the system-level software that manages, monitors, and recovers the rack as a whole: fabric management, GPU/NVSwitch error handling and recovery, health telemetry APIs, firmware update orchestration, and SW-driven serviceability. You will drive work streams with CSP engineering teams to build a shared understanding of the architecture, incorporate their operational feedback, and ensure integration readiness.

What you'll be doing:

Drive rack-scale SW/FW architecture alignment across CSP engagements - including fabric management software, link health monitoring, GPU/NVSwitch error handling, SW/FW serviceability features (e.g., hot-plug support, component isolation, firmware-driven recovery), and multi-component firmware orchestration
Drive technical work streams with CSP engineering teams on rack-scale system software, ensuring they deeply understand fabric management, NVSwitch behavior, error handling and recovery policies, health telemetry APIs, and SW/FW-controlled recovery operations
Capture and synthesize CSP engineering feedback on rack-scale system software (health monitoring APIs, SW-driven serviceability workflows, firmware update orchestration, and error recovery behavior) and champion that feedback in NVIDIA's architecture decisions
Collaborate with multi-functional teams to ensure customer operational requirements are reflected in system software and firmware development
Identify cross-CSP patterns in rack-scale SW/FW issues, error handling behavior, and system configuration practices - drive documentation, tooling, and test strategy improvements as a result
Collaborate with execution teams on left-shift strategy - ensuring customer-side SW/FW integration work is identified early and completed ahead of hardware availability
Make critical technical decisions on rack-scale system SW/FW trade-offs and mitigate execution risks through early engagement with CSP engineering teams

What we need to see:

15+ years of experience in system software, platform firmware, or large-scale distributed systems engineering. BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Deep understanding of rack-scale system software challenges: multi-component coordination, error propagation, health monitoring, and serviceability/reliability
Experience with fabric management software, cluster management, or system-level orchestration frameworks. Familiarity with firmware architectures and update lifecycle management (multi-component update sequencing, rollback, recovery)
Understanding of error handling and recovery design patterns in distributed systems - fault isolation, retry policies, graceful degradation
Experience with health monitoring and telemetry systems: health scoring, event correlation, API design for fleet-level observability
Understanding of GPU or accelerator system software (drivers, device management, power management) is a strong plus
Customer obsession - genuine passion for understanding how CSPs operate sophisticated systems at fleet scale and simplifying their experience
Proven success providing technical leadership across organizational boundaries and influencing system software design without direct authority. Strong communication - ability to translate complex system software architecture into actionable mentorship for customer engineering teams

Ways to stand out from the crowd:

Experience with NVIDIA NVSwitch, NVOS, or GPU fabric management software
Background in system software for large-scale clusters at a hyperscaler (cluster management, fleet orchestration, health platforms)
Experience crafting error handling and recovery frameworks for multi-component systems (hundreds or thousands of coordinating devices)
Familiarity with GPU or accelerator fleet operations - driver lifecycle, firmware rollout strategies, health-based scheduling
Understanding of how system software decisions impact serviceability, availability, and operational cost at fleet scale

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 30, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

About NVIDIA Corporation

Nvidia, a global leader in graphics, gaming, and AI technology, offers career and internship opportunities for those passionate about driving innovation in the tech industry. You'll find a company committed to growth, teamwork, and leadership in the computer science and machine learning domains.

About Nvidia

A Pioneer in Technology and Innovation

Nvidia has cemented its reputation as a powerhouse in developing advanced graphics processing units (GPUs) and has significantly contributed to the gaming industry's evolution. Moreover, its foray into AI and machine learning has opened new frontiers in technology, making Nvidia a beacon of innovation and a desirable workplace for ambitious tech professionals.

Job Opportunities

Diverse Positions in a Dynamic Field

Nvidia is continuously on the lookout for talented individuals across various domains, including hardware and software engineering, product design, marketing, and sales. Employment opportunities at Nvidia are vast, catering to a wide range of expertise and career aspirations.

Employment in Hardware and Graphics

For those fascinated by the intricacies of hardware and graphics technology, Nvidia offers positions that sit at the forefront of gaming and computing advancements.

Growth in Machine Learning and AI

Nvidia's leadership in AI and machine learning has created numerous vacancies for specialists eager to contribute to groundbreaking projects.

Recruitment in Computer Science

With the constant demand for innovation, Nvidia's recruitment efforts focus on computer science experts capable of pushing the boundaries of what's possible.

Internship Program

Opening Doors to Future Innovators

Nvidia's internship program is designed to nurture the next generation of technology leaders, offering hands-on experience in a culture that celebrates creativity and teamwork.

Benefits and Culture

Interns at Nvidia enjoy a plethora of benefits, from competitive stipends to mentorship opportunities, all within an environment that values growth and learning.

Opportunities for Students

Whether you're an undergraduate, a master's student, or a Ph.D. candidate, Nvidia's internships provide a real-world glimpse into the tech industry, offering valuable experience in various technology fields.

Pathways to Full-Time Employment

Many interns have transitioned into full-time positions, marking the start of successful careers at Nvidia. The internship program is more than a stepping stone into the company; it’s an investment in the professional development of interns. The goal is to ensure that interns are well-equipped for future challenges.

Nvidia Careers: More Than Just a Job

Nvidia offers more than just a job to its employees; it provides a front-row seat on the journey into the future of technology. Nvidia stands as a pillar of innovation with its vast opportunities in hardware, graphics, gaming, machine learning, and computer science. Nvidia careers serve as a launching pad for talented workers who aim to redefine the technological landscape. Whether through full-time positions or internships, joining Nvidia means contributing to a legacy of breakthroughs and becoming part of a global community dedicated to pushing the boundaries of what's possible.

Learn more about NVIDIA Corporation

Size: 22,473 employees
Market Cap: $350.4 billion
Industry: Manufacturing & Automotive
Net Income: $4.3 billion
Founded: 1993
5 Year Trend: +31.3%
Revenue: $16.6 billion
NASDAQ: NVDA

* Ladders Estimates

Similar Jobs

Principal Software Engineer, E2E Performance and Goodput - CSP Engagements
$272K — $431K *
NVIDIA Corporation
Austin, TX 78745 (Travis County)
Today
Principal Software Engineer, E2E Performance and Goodput - CSP Engagements
$272K — $431K *
NVIDIA Corporation
Remote
Today
Principal Software Engineer, E2E Performance and Goodput - CSP Engagements
$272K — $431K *
NVIDIA Corporation
Santa Clara, CA 95051 (Santa Clara County)
Today
Principal Software Engineer
$204K — $337K *
MasterCard
New York, NY 10025 (New York County)
Reposted Today
Principal Software Engineer, Rack-Scale System Software - CSP Engagements
$272K — $431K *
NVIDIA Corporation
Austin, TX 78745 (Travis County)
Today
Principal Software Engineer, Rack-Scale System Software - CSP Engagements
$272K — $431K *
NVIDIA Corporation
Santa Clara, CA 95051 (Santa Clara County)
Today


More Jobs at NVIDIA Corporation

CUDA Libraries and Frameworks Product Marketing Manager
$124K — $230K *
Santa Clara, CA 95051 (Santa Clara County)
Today
Consumer Technology
In-Person
Principal Software Engineer, Rack-Scale System Software - CSP Engagements
$272K — $431K *
Austin, TX 78745 (Travis County)
Today
Enterprise Technology
In-Person
Principal Software Engineer, Rack-Scale System Software - CSP Engagements
$272K — $431K *
Remote
Today
Enterprise Technology
Remote in United States
Principal Software Engineer, Rack-Scale System Software - CSP Engagements
$272K — $431K *
Santa Clara, CA 95051 (Santa Clara County)
Today
Enterprise Technology
In-Person
Senior Solutions Architect, AI Infrastructure
$184K — $356K *
Santa Clara, CA 95051 (Santa Clara County)
Reposted Today
Enterprise Technology
In-Person

More Enterprise Technology Jobs

Cybersecurity Sales Specialist
$200K — $300K *
Hewlett Packard Enterprise Development LP
Mississauga, ON L4T 0A1
Reposted Today
Engagement Director 3
$100K — $125K *
Genpact Limited
New York, NY 10025 (New York County)
Today
Sales - Modernization & Transformation - GrowthX
$150K — $226K *
DXC Technology
Chicago, IL 60629 (Cook County)
Today
Principal Product Manager, Adobe Express
$148K — $282K *
Adobe Inc.
San Francisco, CA 94112 (San Francisco County)
Today
AI Solutions and Adoption Lead
$147K — $230K *
HP Development Company, L.P.
Washington, DC 20011 (District Of Columbia County)
Reposted Today