Senior Deep Learning Framework Communications Engineer

NVIDIA Corporation • $152K — $287K *

Austin, TX 78745 • In-Person

Information Technology

5 - 7 years of experience

More than 3 months ago

Job Overview by Ladders

Qualifications

B.S., M.S., or PhD in Computer Science or related field.
5+ years of software engineering experience in HPC/AI.
Development or integration experience with Deep Learning frameworks like PyTorch and JAX.
Proficiency in Python, C++, or CUDA for rapid prototyping.
Experience with performance benchmarking on AI clusters.
Understanding of HPC/AI communication concepts.
Adaptable and passionate about learning new tools.

Responsibilities

Integrate new communication library features into AI frameworks.
Analyze AI workloads to identify multi-GPU communication needs.
Enhance AI compilers for optimized communication and automatic fusion.
Conduct performance characterization of AI workloads on multi-GPU systems.
Design fault-tolerant solutions for large-scale AI workloads.
Create custom kernels to optimize performance on NVIDIA platforms.
Influence future direction of communication libraries like NCCL and NVSHMEM.

Benefits

Equity opportunities.
Comprehensive healthcare benefits.
Flexible working hours and remote work options.
Access to professional development resources.
Diversity and inclusion programs.

Full Job Description

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

We are looking for a motivated Deep Learning engineer to bring advanced communication technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, JAX, and others. You will work with the team that created communication libraries such as NCCL and NVSHMEM, and technologies such as GPUDirect, for scaling Deep Learning and HPC applications. Your customers will have diverse multi-GPU demands, ranging from training at scales of up to 100K GPUs to inference at microsecond-level latency. Communication performance between GPUs has a direct impact on AI applications, and your work in AI toolkits will make these workloads easier to scale for the community. This is an outstanding opportunity for someone with an AI background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

What you will be doing:

Integrate new communication library features into AI frameworks, from proof of concept (PoC) to performance analysis to production.
Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities. Collaborate hands-on with teams working on the latest AI models.
Improve AI compilers to hide communication latency or to perform automatic fusion.
Conduct in-depth AI workload performance characterization on multi-GPU clusters.
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
Author custom communication kernels or fused compute-communication kernels to showcase ultimate performance on NVIDIA platforms.
Influence the roadmap of the NCCL and NVSHMEM communication libraries.
Collaborate with a very dynamic team across multiple time zones.

What we need to see:

B.S., M.S., or PhD in Computer Science or a related field (or equivalent experience), with 5+ years of software engineering and HPC/AI experience
Development or integration experience with Deep Learning frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, and SGLang
Rapid prototyping and development with Python, C++, CUDA, or related DSLs (Triton, CuTe)
Solid grasp of AI models, parallelism strategies, and/or compiler technologies (e.g., torch.compile)
Experience conducting performance benchmarking on AI clusters. Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems)
Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and time zones

Ways to stand out from the crowd:

Experience with parallel programming on at least one communication runtime (NCCL, NVSHMEM, MPI). Good understanding of computer system architecture, HW-SW interactions, and operating system principles (i.e., systems software fundamentals)
Expertise in one or more of these areas: Training, Distributed Inference, MoE, Reinforcement Learning, kernel authoring (in CUDA, Triton, CuTe, etc.). Experience programming compute-communication overlap in distributed runtimes
Experience with AI compiler pattern matching and lowering. Solid understanding of memory hierarchy, consistency model, and tensor layout

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until January 26, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About NVIDIA Corporation

Nvidia, a global leader in graphics, gaming, and AI technology, offers career and internship opportunities for those passionate about driving innovation in the tech industry. Here you'll find a company committed to growth, teamwork, and leadership in computer science and machine learning.

About Nvidia

A Pioneer in Technology and Innovation

Nvidia has cemented its reputation as a powerhouse in developing advanced graphics processing units (GPUs) and has significantly contributed to the gaming industry's evolution. Moreover, its foray into AI and machine learning has opened new frontiers in technology, making Nvidia a beacon of innovation and a desirable workplace for ambitious tech professionals.

Job Opportunities

Diverse Positions in a Dynamic Field

Nvidia is continuously on the lookout for talented individuals across various domains, including hardware and software engineering, product design, marketing, and sales. Employment opportunities at Nvidia are vast, catering to a wide range of expertise and career aspirations.

Employment in Hardware and Graphics

For those fascinated by the intricacies of hardware and graphics technology, Nvidia offers positions that sit at the forefront of gaming and computing advancements.

Growth in Machine Learning and AI

Nvidia's leadership in AI and machine learning has created numerous vacancies for specialists eager to contribute to groundbreaking projects.

Recruitment in Computer Science

With the constant demand for innovation, Nvidia's recruitment efforts focus on computer science experts capable of pushing the boundaries of what's possible.

Internship Program

Opening Doors to Future Innovators

Nvidia's internship program is designed to nurture the next generation of technology leaders, offering hands-on experience in a culture that celebrates creativity and teamwork.

Benefits and Culture

Interns at Nvidia enjoy a plethora of benefits, from competitive stipends to mentorship opportunities, all within an environment that values growth and learning.

Opportunities for Students

Whether you're an undergraduate, a master's student, or a Ph.D. candidate, Nvidia's internships provide a real-world glimpse into the tech industry, offering valuable experience in various technology fields.

Pathways to Full-Time Employment

Many interns have transitioned into full-time positions, marking the start of successful careers at Nvidia. The internship program is more than a stepping stone into the company; it’s an investment in the professional development of interns. The goal is to ensure that interns are well-equipped for future challenges.

Nvidia Careers: More Than Just a Job

Nvidia offers more than just a job to its employees; it provides a front-row seat on the journey into the future of technology. Nvidia stands as a pillar of innovation with its vast opportunities in hardware, graphics, gaming, machine learning, and computer science. Nvidia careers serve as a launching pad for talented workers who aim to redefine the technological landscape. Whether through full-time positions or internships, joining Nvidia means contributing to a legacy of breakthroughs and becoming part of a global community dedicated to pushing the boundaries of what's possible.

Size

22,473 employees

Market Cap

$350.4 billion

Industry

Manufacturing & Automotive

Net Income

$4.3 billion

Founded

1993

5 Year Trend

+31.3%

Revenue

$16.6 billion

NASDAQ

NVDA

* Ladders Estimates

Similar Jobs

AI Engineer - Developer Productivity
$120K — $276K *
Hewlett Packard Enterprise Development LP
Spring, TX 77379 (Harris County)
Today
Artificial Intelligence (AI) Engineer
$142K — $158K *
General Dynamics
Remote
Today
Senior Machine Learning Engineer - Generative AI & Full-Stack Applications
$83K — $222K *
CVS Health
Remote
Reposted Today
Data Scientist - Gen AI ML - Tampa/Irving/ Mississauga
$56K — $196K *
Photon
Irving, TX 75061 (Dallas County)
Reposted Yesterday

More Jobs at NVIDIA Corporation

Manager, Solutions Architecture - Retail
$224K — $431K *
Santa Clara, CA 95051 (Santa Clara County)
Today
Retail & Consumer Goods
In-Person
Developer Relations Manager, AI Platform and Tools - Conversational and Generative Media
$152K — $287K *
Santa Clara, CA 95051 (Santa Clara County)
Today
Information Technology
In-Person
Developer Relations Manager, AI Platform and Tools - Conversational and Generative Media
$152K — $287K *
Remote
Today
Information Technology
Remote in Santa Clara, CA
Software Engineer, TensorRT Specialized Platforms - New College Grad 2025
$124K — $195K *
Santa Clara, CA 95051 (Santa Clara County)
Reposted Today
Information Technology
In-Person
Principal Architect, System Software - Orbital Data Center
$272K — $431K *
Santa Clara, CA 95051 (Santa Clara County)
Today
Aerospace & Defense
In-Person

More Information Technology Jobs

Business Development Director
$300K — $345K + $120K bonus *
Tier1 IT Services Firm
Kansas City, MO 64116 (Clay County)
6 days ago
Client Partner / Business Development - Banking
$250K — $320K + $70K bonus *
IT Services Firm (client of TechLink Systems)
New York, NY 10001 (New York County)
6 days ago
Customer Support
Confidential Company
Austin, TX 78701 (Travis County)
2 weeks ago
Sr Assoc, Cyber Sec ThreatMgmt - Detection Engineer
$88K — $151K *
Northern Trust
Naperville, IL 60540 (Dupage County)
Today
Global Director – Vulnerability Management & Security Configuration
$164K — $288K *
Northern Trust
Chicago, IL 60629 (Cook County)
Today