Research Scientist / Engineer - Performance Optimization

Gem.com • $187K — $395K *

San Francisco, CA 94112 • In-Person

Enterprise Technology

Less than 5 years of experience

1 month ago

Job Overview by Ladders

Qualifications

Expert-level proficiency in Triton and CUDA programming for GPU optimization
Strong skills in PyTorch development and custom operations
Experience with profiling tools such as NVIDIA Nsight and torch profiler
Deep understanding of transformer architectures and attention mechanisms
Preferred experience with compilers and exporters such as torch.compile, TensorRT, and ONNX
Preferred experience optimizing inference workloads for latency and throughput
Preferred knowledge of warp-level intrinsics and advanced CUDA optimization

Responsibilities

Profile and optimize GPU/CPU/Accelerator code for efficiency
Write high-performance code using PyTorch, Triton, and CUDA
Develop fused kernels and leverage modern hardware features
Optimize model implementations for scalable, multi-node deployment
Build performance monitoring tools and automation
Research and implement optimization techniques for transformer models

Benefits

Collaborative work environment with research and engineering teams
Opportunity to work on cutting-edge AI technology
Focus on maximizing performance and efficiency of AI models
Engagement in the latest optimization techniques
Hands-on experience with modern hardware platforms

Full Job Description

About the Role

The Performance Optimization team at Luma is dedicated to maximizing the efficiency and performance of our AI models. Working closely with both research and engineering teams, this group ensures that our cutting-edge multimodal models can be trained efficiently and deployed at scale while maintaining the highest quality standards.

Responsibilities

Profile and optimize GPU/CPU/Accelerator code for maximum utilization and minimal latency
Write high-performance PyTorch, Triton, and CUDA code, deferring to custom PyTorch operations if necessary
Develop fused kernels and leverage tensor cores and other modern hardware features for optimal utilization across different hardware platforms
Optimize model architectures and implementations for distributed multi-node production deployment
Build performance monitoring and analysis tools and automation
Research and implement cutting-edge optimization techniques for transformer models

Experience

Expert-level proficiency in Triton/CUDA programming and GPU optimization
Strong PyTorch skills
Experience with PyTorch kernel development and custom operations
Proficiency with profiling tools (NVIDIA Nsight, torch profiler, custom tooling)
Deep understanding of transformer architectures and attention mechanisms
(Preferred) Experience with compilers/exporters such as torch.compile, TensorRT, ONNX, XLA
(Preferred) Experience optimizing inference workloads for latency and throughput
(Preferred) Experience with Triton compiler and kernel fusion techniques
(Preferred) Knowledge of warp-level intrinsics and advanced CUDA optimization

Your applications are reviewed by real people.

Compensation

The base pay range for this role is $187,500 - $395,000 per year.

About Gem.com

Industry

Enterprise Technology

Founded

2013

* Ladders Estimates
