NVIDIA Corporation

Senior Systems Software Engineer, Kubernetes Node Lifecycle - DGX Cloud

NVIDIA Corporation | $184K - $356K *
Enterprise Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8 years of experience in systems software, cloud infrastructure, or Kubernetes node engineering.
  • Bachelor's or Master's degree in Engineering or equivalent experience.
  • Deep expertise in Cluster API (CAPI) development and machine lifecycle management.
  • Extensive experience with OS image build pipelines and Kubernetes node packaging.
  • Experience with bring-your-own-node models and large-scale nodepool lifecycle management.
  • Strong understanding of kubelet configuration and Kubernetes node registration.
  • Proficiency in Golang and/or Python, with cloud provider experience.

Responsibilities

  • Direct the building and refinement of CAPI providers for NVIDIA Kubernetes Engine (NKE) for consistent node provisioning.
  • Develop and maintain bring-your-own-node workflows for integrating diverse NVIDIA hardware into NKE clusters.
  • Coordinate OS image processes, ensuring compliance and performance for NVIDIA workloads.
  • Create and sustain hardening pipelines incorporating security benchmarks and remediation.
  • Develop automated test suites for validating node images prior to production deployment.
  • Manage large-scale nodepool lifecycle, including upgrades and seamless replacements.
  • Analyze and resolve node-layer faults in production NKE clusters to optimize performance.

Benefits

  • Equity participation in NVIDIA's success.
  • Access to cutting-edge AI tools in recruiting processes.
  • Work in a collaborative and innovative environment.
  • Opportunities for professional development and advancement.
  • Dynamic workplace culture that fosters growth and creativity.

Full Job Description

We are looking for a Senior Systems Software Engineer with strong experience in Kubernetes node engineering, OS image packaging, and cloud infrastructure. The ideal candidate will possess deep hyperscaler-level knowledge across the entire node lifecycle. This covers CAPI providers, bring-your-own-node onboarding, OS image build pipelines, packaging, and nodepool management. They must have the technical depth needed to maintain cluster reliability at frontier AI scale. In this vital role, you will manage the node layer within NVIDIA Kubernetes Engine (NKE). Your work will ensure it scales to fulfill DGX Cloud's two main goals: supporting internal researchers and enabling NCPs. Are you prepared to innovate?

What you'll be doing:

  • Direct the building and refinement of CAPI providers for NVIDIA Kubernetes Engine, maintaining steady, consistent, and scalable node provisioning across DGX Cloud and NCP environments.
  • Develop and maintain bring-your-own-node workflows that allow customers to integrate different NVIDIA hardware into NKE clusters while ensuring high operational consistency.
  • Coordinate OS image generation, packaging, deployment, and update processes for NKE nodes. Ensure images are fine-tuned for NVIDIA GPU workloads and satisfy enterprise- and cloud-grade security and compliance criteria.
  • Develop and sustain node image hardening pipelines, incorporating CIS benchmarks, automated CVE remediation, and promotion gates connected to security posture.
  • Develop and maintain automated test suites for node images. These tests verify accuracy across Kubernetes versions and NVIDIA hardware configurations. This process occurs prior to production deployment and facilitates continuous validation through modern CI/CD pipelines.
  • Handle nodepool lifecycle at scale, including provisioning, upgrades, drain and cordon workflows, and seamless node replacement across very large clusters with diverse NVIDIA hardware.
  • Examine, resolve, and determine underlying causes of node-layer faults in production NKE clusters, such as those involving image configuration, driver packaging, kubelet operation, and hardware activation, and review and optimize the node layer in real-world high-scale scenarios.
  • Partner with upstream communities including Cluster API, Kubernetes, and CNCF projects to establish node provisioning and lifecycle standards in accordance with NKE requirements. Communicate your progress and findings at internal and external gatherings such as KubeCon and GTC.

What we need to see:

  • 8 years of experience with a background in systems software, cloud infrastructure, or Kubernetes node engineering.
  • Bachelor's or Master's degree in Engineering (Electrical, Computer Engineering, Computer Science) or equivalent experience.
  • Deep expertise in Cluster API (CAPI), including provider development and full machine lifecycle from provisioning to deletion.
  • Extensive experience with OS image build pipelines, node image packaging, and delivery systems for Kubernetes nodes (for example image-builder, containerd, cloud-init, packer).
  • Practical experience with bring-your-own-node models and integrating diverse hardware into live Kubernetes environments, including large-scale nodepool lifecycle management and upgrades.
  • Strong understanding of kubelet configuration, node bootstrap, and the Kubernetes node registration lifecycle.
  • Experience with node image security, including vulnerability scanning, patch automation, and compliance gating as part of image build pipelines.
  • Proficiency in Golang and/or Python, and hands-on experience with at least one major public cloud provider (GCP, AWS, Azure, OCI or equivalent).

Ways to stand out from the crowd:

  • Direct experience building or maintaining node image pipelines for a hyperscaler Kubernetes distribution (GKE, EKS, AKS, OKE, or equivalent).
  • Experience with supply chain security and hardening for node images, including image signing, provenance attestation, SBOM generation, CIS benchmark consistency, and automated CVE remediation.
  • Experience with automated node provisioning and optimal sizing at scale (for example Karpenter, GKE NAP or similar) and how these interact with GPU workload scheduling.
  • Strong operational experience working with immutable OS image distributions (such as Flatcar, Bottlerocket, Azure Linux) and debugging node-layer failures in large Kubernetes clusters.
  • Proven background of upstream contributions to Cluster API, Kubernetes or related CNCF projects, combined with excellent communication and interpersonal abilities.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 14, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

About NVIDIA Corporation

Nvidia, a global leader in graphics, gaming, and AI technology, offers careers and internship opportunities for those passionate about driving innovation in the tech industry. You'll find a company committed to growth, teamwork, and leadership in computer science and machine learning.

About Nvidia

A Pioneer in Technology and Innovation

Nvidia has cemented its reputation as a powerhouse in developing advanced graphics processing units (GPUs) and has significantly contributed to the gaming industry's evolution. Moreover, its foray into AI and machine learning has opened new frontiers in technology, making Nvidia a beacon of innovation and a desirable workplace for ambitious tech professionals.

Job Opportunities

Diverse Positions in a Dynamic Field

Nvidia is continuously on the lookout for talented individuals across various domains, including hardware and software engineering, product design, marketing, and sales. Employment opportunities at Nvidia are vast, catering to a wide range of expertise and career aspirations.

Employment in Hardware and Graphics

For those fascinated by the intricacies of hardware and graphics technology, Nvidia offers positions that sit at the forefront of gaming and computing advancements.

Growth in Machine Learning and AI

Nvidia's leadership in AI and machine learning has created numerous vacancies for specialists eager to contribute to groundbreaking projects.

Recruitment in Computer Science

With the constant demand for innovation, Nvidia's recruitment efforts focus on computer science experts capable of pushing the boundaries of what's possible.

Internship Program

Opening Doors to Future Innovators

Nvidia's internship program is designed to nurture the next generation of technology leaders, offering hands-on experience in a culture that celebrates creativity and teamwork.

Benefits and Culture

Interns at Nvidia enjoy a plethora of benefits, from competitive stipends to mentorship opportunities, all within an environment that values growth and learning.

Opportunities for Students

Whether you're an undergraduate, a master's student, or a Ph.D. candidate, Nvidia's internships provide a real-world glimpse into the tech industry, offering valuable experience in various technology fields.

Pathways to Full-Time Employment

Many interns have transitioned into full-time positions, marking the start of successful careers at Nvidia. The internship program is more than a stepping stone into the company; it’s an investment in the professional development of interns. The goal is to ensure that interns are well-equipped for future challenges.

Nvidia Careers: More Than Just a Job

Nvidia offers more than just a job to its employees; it provides a front-row seat on the journey into the future of technology. Nvidia stands as a pillar of innovation with its vast opportunities in hardware, graphics, gaming, machine learning, and computer science. Nvidia careers serve as a launching pad for talented workers who aim to redefine the technological landscape. Whether through full-time positions or internships, joining Nvidia means contributing to a legacy of breakthroughs and becoming part of a global community dedicated to pushing the boundaries of what's possible.

Learn more about NVIDIA Corporation

  • Size: 22,473 employees
  • Market Cap: $350.4 billion
  • Industry:
  • Net Income: $4.3 billion
  • Founded: 1993
  • 5 Year Trend: +31.3%
  • Revenue: $16.6 billion
  • NASDAQ
