Member of Technical Staff, Inference

Mirendil

$350K-$500K
Consumer Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Extensive experience with inference systems and performance optimization.
  • Strong understanding of GPU and accelerator hardware architecture.
  • Knowledge of distributed inference frameworks (e.g., vLLM, TensorRT-LLM).
  • Familiarity with optimization techniques like speculative decoding and quantization.
  • Proven ability in building observability infrastructure for complex systems.

Responsibilities

  • Design and build high-throughput inference serving systems for large models.
  • Optimize performance across GPU and accelerator hardware.
  • Extend and enable distributed inference frameworks for varied workloads.
  • Implement inference-time optimizations for enhanced model efficiency.
  • Establish reliability infrastructure to measure key performance metrics.
  • Collaborate with teams to integrate new model architectures into production.

Benefits

  • Meaningful equity grant based on experience and background.
  • Competitive benefits package.

Full Job Description

The Role

We are looking for an engineer to own the inference systems that power our models in production and research. You'll work across the full inference stack, from serving infrastructure down to hardware-level optimization. Example areas you might work on include, but are not limited to:

  • Design and build high-throughput, low-latency inference serving systems for frontier models, optimizing for both research iteration and production deployment
  • Optimize inference performance across GPU and accelerator hardware: maximizing FLOPs utilization, memory bandwidth, and compute efficiency for large-scale models
  • Enable and extend distributed inference frameworks (e.g. vLLM, SGLang, TensorRT-LLM) to support novel architectures, long-context workloads, and agentic inference patterns
  • Implement and validate inference-time optimizations: speculative decoding, quantization, KV cache management, and batching strategies
  • Build observability and reliability infrastructure so the team can measure latency, throughput, and cost across every serving configuration
  • Partner directly with teams to bring new model architectures and post-training techniques into production quickly


If you're excited about pushing the performance limits of frontier model inference, we'd love to hear from you.

We offer a base salary of $350,000-$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.
