Senior Systems Software Engineer, Observability and Telemetry Platform

NVIDIA Corporation • $184K — $356K *

Santa Clara, CA 95051 • In-Person

Information Technology

8 - 10 years of experience

Reposted Today

Job Overview by Ladders

Qualifications

BS degree in Computer Science or a related field, or equivalent experience
8+ years in Infrastructure automation and distributed systems design
5+ years delivering foundational infrastructure and observability platforms
Proficiency in Python, Go, Perl, or Ruby
Deep knowledge of Linux, Networking, and Containers

Responsibilities

Design and maintain the observability and telemetry platform focusing on performance and real-time monitoring
Enhance the entire lifecycle of services from design to operational refinement
Consult on system design, develop tools/platforms, and conduct launch reviews prior to service deployment
Monitor services post-launch for availability, latency, and overall health
Implement automation to sustainably scale systems and improve reliability
Conduct blameless postmortems and sustainable incident response
Participate in on-call rotations for production system support

Benefits

Eligible for equity
Comprehensive benefits package
Opportunities for mentorship and support in career growth
Collaborative and inclusive work environment
Focus on intellectual curiosity and problem-solving culture

Full Job Description

The Senior Systems Software Engineer (SRE) role at NVIDIA is rooted in an engineering discipline: designing, building and maintaining large-scale production systems with high efficiency and availability by combining software and systems engineering practices. This is a highly specialized discipline that demands knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open source cloud-enabling technologies like Kubernetes and OpenStack. Senior Systems Software Engineers (SRE) at NVIDIA ensure that our internal and external-facing GPU cloud services run with the maximum reliability and uptime promised to users, while enabling developers to make changes to existing systems through careful preparation and planning, keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on eliminating manual work through automation, performance tuning and growing the efficiency of production systems.

Senior Systems Software Engineers (SRE) are responsible for the big picture of how our systems relate to each other, and we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and exciting, dynamic day-to-day work. The Senior Systems Software Engineer (SRE) culture of diversity, intellectual curiosity, problem solving and willingness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on relevant projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.

What you'll be doing:

Design, implement and support the operational and reliability aspects of a large-scale observability and telemetry collection platform, with a focus on performance at scale, real-time monitoring, logging and alerting
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews
Maintain services once they are live by measuring and monitoring availability, latency and overall system health
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Practice sustainable incident response and blameless postmortems
Be part of an on-call rotation to support production systems

What we need to see:

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
8+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
5+ years of experience delivering foundational infrastructure and observability platforms
Experience in one or more of the following: Python, Go, Perl or Ruby
In-depth knowledge of Linux, networking and containers

Ways to stand out from the crowd:

Interest in crafting, analyzing and fixing large-scale distributed systems
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. Ability to debug and optimize code and automate routine tasks
Experience using or running large private and public cloud systems based on Kubernetes, OpenStack and Docker. Experience running Grafana, OpenTelemetry, Prometheus and similar observability-focused tools

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 28, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

About NVIDIA Corporation

Nvidia, a global leader in graphics, gaming, and AI technology, offers careers and internship opportunities for those passionate about driving innovation in the tech industry. You'll find a company committed to growth, teamwork, and leadership in the computer science and machine learning domains.

About Nvidia

A Pioneer in Technology and Innovation

Nvidia has cemented its reputation as a powerhouse in developing advanced graphics processing units (GPUs) and has significantly contributed to the gaming industry's evolution. Moreover, its foray into AI and machine learning has opened new frontiers in technology, making Nvidia a beacon of innovation and a desirable workplace for ambitious tech professionals.

Job Opportunities

Diverse Positions in a Dynamic Field

Nvidia is continuously on the lookout for talented individuals across various domains, including hardware and software engineering, product design, marketing, and sales. Employment opportunities at Nvidia are vast, catering to a wide range of expertise and career aspirations.

Employment in Hardware and Graphics

For those fascinated by the intricacies of hardware and graphics technology, Nvidia offers positions that sit at the forefront of gaming and computing advancements.

Growth in Machine Learning and AI

Nvidia's leadership in AI and machine learning has created numerous vacancies for specialists eager to contribute to groundbreaking projects.

Recruitment in Computer Science

With the constant demand for innovation, Nvidia's recruitment efforts focus on computer science experts capable of pushing the boundaries of what's possible.

Internship Program

Opening Doors to Future Innovators

Nvidia's internship program is designed to nurture the next generation of technology leaders, offering hands-on experience in a culture that celebrates creativity and teamwork.

Benefits and Culture

Interns at Nvidia enjoy a plethora of benefits, from competitive stipends to mentorship opportunities, all within an environment that values growth and learning.

Opportunities for Students

Whether you're an undergraduate, a master's student, or a Ph.D. candidate, Nvidia's internships provide a real-world glimpse into the tech industry, offering valuable experience in various technology fields.

Pathways to Full-Time Employment

Many interns have transitioned into full-time positions, marking the start of successful careers at Nvidia. The internship program is more than a stepping stone into the company; it’s an investment in the professional development of interns. The goal is to ensure that interns are well-equipped for future challenges.

Nvidia Careers: More Than Just a Job

Nvidia offers more than just a job to its employees; it provides a front-row seat on the journey into the future of technology. Nvidia stands as a pillar of innovation with its vast opportunities in hardware, graphics, gaming, machine learning, and computer science. Nvidia careers serve as a launching pad for talented workers who aim to redefine the technological landscape. Whether through full-time positions or internships, joining Nvidia means contributing to a legacy of breakthroughs and becoming part of a global community dedicated to pushing the boundaries of what's possible.

Learn more about NVIDIA Corporation

Size

22,473 employees

Market Cap

$350.4 billion

Industry

Manufacturing & Automotive

Net Income

$4.3 billion

Founded

1993

5 Year Trend

+31.3%

Revenue

$16.6 billion

NASDAQ

NVDA

* Ladders Estimates
