Senior Inference Engineer

DEEPREC.AI

• $130K — $180K *

Palo Alto, CA 94303

In-Person

Consumer Technology

5 - 7 years of experience

Today


Job Overview by Ladders

Qualifications

5+ years of engineering experience in inference acceleration and model deployment.
Proven expertise in inference optimization, including quantization and attention acceleration.
Deep knowledge of GPU programming with CUDA and NCCL.
Familiarity with video generation models and large language models (LLMs).
Strong cross-discipline communication skills for collaboration.

Responsibilities

Lead and implement advanced inference acceleration techniques for efficient model serving.
Engineer and optimize GPU parallelism strategies for maximal efficiency and scalability.
Develop and optimize high-performance computing kernels and distributed workloads.
Collaborate with teams to bring video generation and large language models into production.
Contribute to improvements in model training speed and resource utilization.
Drive code reviews and mentor engineers on best practices in GPU programming.

Benefits

Equity in a fast-growing company driving innovation in generative AI.
Comprehensive health benefits and monthly stipends.
Company retreats promoting team collaboration.
A collaborative culture emphasizing teamwork and collective success.

Full Job Description

Senior Inference Engineer
AI Video Generation Company (Stealth) | Palo Alto, CA | Hybrid

About the Role

We are seeking a Senior Inference Engineer to accelerate the performance of our AI-driven video generation products. In this highly technical role, you will operate at the intersection of cutting-edge inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale.

You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what's possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of our video and language models.

What You'll Do

Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art video generation and large language models into production.
Improve Training Efficiency (Bonus): Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.
Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

What We're Looking For

Experience: 5 years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.
Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
GPU and Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
AI Domain Knowledge: Familiarity with video generation models and large language models (LLMs).
Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.

Nice to Have

Experience with high-throughput video or real-time streaming model deployment.
Familiarity with distributed training and optimization toolkits.
Contributions to open source projects in AI infrastructure or deep learning compilers.
Startup or rapid prototyping experience.

What We Offer

Competitive salary commensurate with AI industry benchmarks.
Equity in a fast-growing company shaping the future of generative AI.
Comprehensive health benefits, monthly stipends, and company retreats.
A collaborative, in-office culture focused on building and shipping together.

* Ladders Estimates



