Member of Technical Staff, Inference

Mirendil

$350K – $500K *

San Francisco, CA 94112, In-Person

Consumer Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Extensive experience with inference systems and performance optimization.
Strong understanding of GPU and accelerator hardware architecture.
Knowledge of distributed inference frameworks (e.g., vLLM, TensorRT-LLM).
Familiarity with optimization techniques like speculative decoding and quantization.
Proven ability in building observability infrastructure for complex systems.

Responsibilities

Design and build high-throughput inference serving systems for large models.
Optimize performance across GPU and accelerator hardware.
Extend and enable distributed inference frameworks for varied workloads.
Implement inference-time optimizations for enhanced model efficiency.
Establish reliability infrastructure to measure key performance metrics.
Collaborate with teams to integrate new model architectures into production.

Benefits

Meaningful equity grant based on experience and background.
Competitive benefits package.

Full Job Description

The Role

We are looking for an engineer to own the inference systems that power our models in production and research. You'll work across the full inference stack, from serving infrastructure down to hardware-level optimization. Example areas you might work on include, but are not limited to:

Design and build high-throughput, low-latency inference serving systems for frontier models, optimizing for both research iteration and production deployment
Optimize inference performance across GPU and accelerator hardware, maximizing FLOPs utilization, memory bandwidth, and compute efficiency for large-scale models
Enable and extend distributed inference frameworks (e.g. vLLM, SGLang, TensorRT-LLM) to support novel architectures, long-context workloads, and agentic inference patterns
Implement and validate inference-time optimizations: speculative decoding, quantization, KV cache management, and batching strategies
Build observability and reliability infrastructure so the team can measure latency, throughput, and cost across every serving configuration
Partner directly with teams to bring new model architectures and post-training techniques into production quickly

If you're excited about pushing the performance limits of frontier model inference, we'd love to hear from you.

We offer a base salary of $350,000-$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.

* Ladders Estimates
