Member of Technical Staff, Inference

Inferact

• $200K — $400K *

San Francisco, CA 94112

In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree or equivalent experience in computer science, engineering, or similar.
Deep understanding of transformer architectures and their variants.
Strong programming skills in Python with experience in PyTorch internals.
Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
Ability to read and implement model architectures and inference techniques from research papers.
Demonstrated ability to contribute performant and maintainable code and debug complex ML codebases.

Responsibilities

Optimize model execution for diverse hardware and architectures.
Develop and innovate on the inference engine for advanced AI models.
Implement and manage model architectures based on the latest research.
Collaborate with cross-functional teams to enhance AI inference capabilities.
Debug and maintain high-performance ML codebases.

Benefits

Generous health, dental, and vision benefits.
401(k) company match.

Full Job Description

About the Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models keep growing larger, and architectures keep shifting toward mixture-of-experts, multimodal, and agentic designs. Every such breakthrough demands innovation in the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.

Skills and Qualifications

Minimum qualifications:

Bachelor's degree or equivalent experience in computer science, engineering, or similar.
Deep understanding of transformer architectures and their variants.
Strong programming skills in Python with experience in PyTorch internals.
Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
Ability to read and implement model architectures and inference techniques from research papers.
Demonstrated ability to contribute performant and maintainable code and to debug complex ML codebases.

Preferred qualifications:

Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.
Familiarity with RL frameworks and algorithms for LLMs.
Experience with multimodal inference (audio/image/video/text).
Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

Implemented core features in vLLM or other inference engine projects.
Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).
Written widely shared technical blogs or side projects on vLLM or LLM inference.

Logistics

Location: This role is based in San Francisco, California. We will consider remote work within the US for exceptional candidates.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers generous health, dental, and vision benefits as well as 401(k) company match.

* Ladders Estimates

Similar Jobs

ML Perception Software Engineer
$125K — $222K *
Applied Intuition
Sunnyvale, CA 94087 (Santa Clara County)
Today
Software Engineer - Prediction and Behavior ML
$125K — $222K *
Applied Intuition
Sunnyvale, CA 94087 (Santa Clara County)
Today
Senior Applied AI Engineer
$250K — $400K *
Noon
San Francisco, CA 94112 (San Francisco County)
Today
Machine Learning Engineer
$160K — $225K *
MAI
Mountain View, CA 94040 (Santa Clara County)
Today
Software Engineer - AI Engineering
$126K — $250K *
Applied Intuition
Sunnyvale, CA 94087 (Santa Clara County)
Today
Research Engineer
$150K — $250K *
Helm.ai
Remote
Today

More Jobs at Inferact

Member of Technical Staff, Performance and Scale
$200K — $400K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person
Member of Technical Staff, Inference
$200K — $400K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
5 days ago
Data Center Operations Technician
$62K — $112K *
Amazon
Boardman, OR 97818 (Morrow County)
Reposted Today
Full Stack Software Developer
$69K — $158K *
TeleTech
Norfolk, VA 23503 (Norfolk City County)
Today
Senior Full-Stack WebApp Engineer
$120K — $150K *
Level
Bellevue, WA 98006 (King County)
Today
Software Engineer II
$123K — $165K *
The Walt Disney Company
Seattle, WA 98115 (King County)
Reposted Today
