NVIDIA Corporation

Director, Site Reliability and Software Engineering - DGX Cloud

NVIDIA Corporation
$320K - $500K+*
Enterprise Technology
11 - 15 years of experience
Job Overview by Ladders

Qualifications

  • 12+ years in engineering management.
  • 5+ years of leadership experience.
  • Bachelor's or Master's in Computer Science or equivalent.
  • Experience with large-scale distributed systems and cluster solutions.
  • Strong Unix/Linux proficiency.
  • Demonstrated ability in mentoring and coaching teams.
  • Experience with Containers and virtualization environments.

Responsibilities

  • Manage a team of Software and Site Reliability engineers, overseeing development and planning.
  • Define team strategy and roadmap for the DGX Cloud Computing environment.
  • Drive leadership in technical projects within a fast-paced setting.
  • Ensure planning, tracking, and the success of technical projects.
  • Collaborate with product management for top-tier product development.
  • Contribute technically to DGX Cloud Computing projects.
  • Provide operational and financial clarity to internal stakeholders.

Benefits

  • Eligibility for equity in the company.
  • Access to various corporate benefits programs.

Full Job Description
What you'll be doing:

As a Site Reliability and Software Engineering leader in the DGXC Cloud Reliability organization, you will manage the software, automation, and operations of the multi-colo, distributed NVIDIA GPU cloud clusters and contribute to product strategy. You will lead all aspects of cluster automation and operational-excellence planning, and you will grow your team. You thrive in a fast-paced, iterative engineering environment and have experience delivering scalable distributed systems. Most importantly, you have a track record of earning the respect of past teams and cross-functional partners as both a technical leader and a manager, and you can work through influence rather than direct authority when needed. The NVIDIA GPU Cloud Computing team works with customers across the entire company, so the ability to work across many levels of technical and organizational leadership is critical. Operating with scale and speed, our world-class software engineers are just getting started, and as a leader you will guide the way in solving reliability for both our internally critical and our externally visible systems.
  • Manage a team of Software and Site Reliability engineers, including program development, task planning and code reviews.
  • Define team strategy and roadmap, and drive adoption of scalable SDLC practices, test infrastructure, and other modern practices in NVIDIA's DGX Cloud Computing environment.
  • Drive technical projects and provide leadership in an innovative and fast-paced environment.
  • Be responsible for the overall planning, tracking and success of technical projects.
  • Work closely with project and product management teams to ensure best-in-class product development.
  • Contribute hands-on to technical projects for DGX Cloud Computing Services.
  • Interact with key internal stakeholders to provide operational and financial clarity on technical spend.
  • Drive decision-making, visibility, and operational rigor across business analytics initiatives such as budget and project & portfolio reporting. Lead efforts related to executive reporting, dashboards, and operational CTO metrics, focusing on continuous improvement to maximize the quality of decision-making and executive visibility.


What we need to see:
  • 12+ years of overall experience in engineering management and 5+ years of leadership experience.
  • Bachelor's or Master's degree in Computer Science, or equivalent experience.
  • Experience designing and implementing large-scale distributed systems.
  • Experience with containers, virtualization environments, and cluster solutions.
  • Experience managing technical support / DevOps teams.
  • Ability to set high bars for technical excellence and deliver projects on tight deadlines.
  • Strong knowledge of Unix/Linux.
  • Experience implementing tools, processes, internal instrumentation, and methodologies, and resolving blockers.
  • Demonstrated people-management and leadership skills, with a proven track record of mentoring and coaching team members.
  • Ability to quickly learn and evaluate new technologies.
  • Ability to influence and establish relationships with other software and IT functional groups such as development, server, storage and security teams.


Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 320,000 USD - 488,750 USD for Level 5, and 384,000 USD - 575,000 USD for Level 6.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 9, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

About NVIDIA Corporation

Nvidia, a global leader in graphics, gaming, and AI technology, offers careers and internship opportunities to those passionate about driving innovation in the tech industry. You'll find a company committed to growth, teamwork, and leadership in the computer science and machine learning domains.

About Nvidia

A Pioneer in Technology and Innovation

Nvidia has cemented its reputation as a powerhouse in developing advanced graphics processing units (GPUs) and has significantly contributed to the gaming industry's evolution. Moreover, its foray into AI and machine learning has opened new frontiers in technology, making Nvidia a beacon of innovation and a desirable workplace for ambitious tech professionals.

Job Opportunities

Diverse Positions in a Dynamic Field

Nvidia is continuously on the lookout for talented individuals across various domains, including hardware and software engineering, product design, marketing, and sales. Employment opportunities at Nvidia are vast, catering to a wide range of expertise and career aspirations.

Employment in Hardware and Graphics

For those fascinated by the intricacies of hardware and graphics technology, Nvidia offers positions that sit at the forefront of gaming and computing advancements.

Growth in Machine Learning and AI

Nvidia's leadership in AI and machine learning has created numerous vacancies for specialists eager to contribute to groundbreaking projects.

Recruitment in Computer Science

With the constant demand for innovation, Nvidia's recruitment efforts focus on computer science experts capable of pushing the boundaries of what's possible.

Internship Program

Opening Doors to Future Innovators

Nvidia's internship program is designed to nurture the next generation of technology leaders, offering hands-on experience in a culture that celebrates creativity and teamwork.

Benefits and Culture

Interns at Nvidia enjoy a plethora of benefits, from competitive stipends to mentorship opportunities, all within an environment that values growth and learning.

Opportunities for Students

Whether you're an undergraduate, a master's student, or a Ph.D. candidate, Nvidia's internships provide a real-world glimpse into the tech industry, offering valuable experience in various technology fields.

Pathways to Full-Time Employment

Many interns have transitioned into full-time positions, marking the start of successful careers at Nvidia. The internship program is more than a stepping stone into the company; it’s an investment in the professional development of interns. The goal is to ensure that interns are well-equipped for future challenges.

Nvidia Careers: More Than Just a Job

Nvidia offers more than just a job to its employees; it provides a front-row seat on the journey into the future of technology. Nvidia stands as a pillar of innovation with its vast opportunities in hardware, graphics, gaming, machine learning, and computer science. Nvidia careers serve as a launching pad for talented workers who aim to redefine the technological landscape. Whether through full-time positions or internships, joining Nvidia means contributing to a legacy of breakthroughs and becoming part of a global community dedicated to pushing the boundaries of what's possible.
Learn more about NVIDIA Corporation
Size: 22,473 employees
Market Cap: $350.4 billion
Net Income: $4.3 billion
Founded: 1993
5 Year Trend: +31.3%
Revenue: $16.6 billion
NASDAQ
