Full Job Description
You will join a small, high-leverage team building production infrastructure for Generative AI at DoorDash. The primary focus is our open-weights model platform, which covers real-time GPU serving, high-throughput batch inference, and model fine-tuning. You'll work across model serving and inference engines, fine-tuning and training pipelines, GPU autoscaling and utilization, batch pipelines, backend services, and observability. This role is ideal for an engineer who enjoys pushing the cost/performance frontier of GPU inference and fine-tuning in a fast-moving technical area where product needs, model capabilities, and vendor ecosystems are evolving quickly.
You're excited about this opportunity because you will...
• Build the infrastructure that helps DoorDash teams move GenAI ideas from prototype to production, increasing the velocity of business impact from AI across the company.
• Work on our open-weights serving stack - real-time GPU endpoints, high-throughput batch inference, and fine-tuning (SFT/DPO/LoRA) - alongside the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.
• Design scalable, high-performance systems for model serving, batch inference, GPU autoscaling, and fine-tuning that power real customer and internal automation use cases.
• Push the cost and latency frontier of GPU inference - turning batch jobs that took days into hours and cutting inference cost by multiples - while giving product teams a clean choice across open-weight and closed-source models with reliability, fallback, observability, and cost controls built in.
• Build platforms that support rapid experimentation while meeting production standards for latency, scale, monitoring, SLOs, playbooks, and operational excellence.
• Partner closely with ML engineers, product engineers, data scientists, and platform teams across DoorDash, Wolt, and Deliveroo to turn emerging GenAI capabilities into durable platform primitives.
• Shape the future of DoorDash's centralized GenAI platform - including emerging directions such as reinforcement learning (RLHF/RLVR), agent optimization, and other post-training and agentic techniques - enabling the next generation of AI-powered products, agents, automation, and personalization.
We're excited about you because...
• B.S., M.S., or Ph.D. in Computer Science or equivalent.
• 3+ years of industry experience in software engineering.
• Strong backend engineering fundamentals, especially in Python and distributed systems.
• Experience building production services, APIs, data pipelines, or ML infrastructure at scale.
• Experience operating systems in production, including observability, debugging, reliability, incident response, and performance/cost optimization.
• Hands-on experience with LLM inference and/or fine-tuning of open-weight models in production - serving (latency, throughput, batching, autoscaling, GPU utilization) and/or fine-tuning (SFT/DPO/LoRA).
• Ability to work across ambiguous, fast-moving technical areas and turn customer use cases into reusable platform capabilities.
• Proficiency in using AI coding tools (e.g., Claude Code, Codex, Cursor) across the full software development lifecycle, including design, code generation, testing, monitoring, and release.
Nice to Haves
• Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production
• Experience with distributed/multi-node fine-tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation and evaluation
• GPU performance work - multi-node/distributed inference, KV-cache/memory optimization, quantization (FP8/INT8/AWQ/GPTQ), or cold-start/throughput tuning
• Experience with Kubernetes, cloud infrastructure (AWS/GCP), GPUs, serverless/elastic GPU platforms (e.g., Modal), or high-throughput batch systems
• Experience with LLM gateways, model routing, vendor abstraction, or cost attribution
• Experience building developer platforms, internal platforms, or self-serve infrastructure
• Experience building and deploying AI agents or MCP servers in production
• Experience with eval systems, LLM observability, tracing, RAG, search, or vector databases
Compensation
The successful candidate's starting pay will fall within the pay range listed below and is determined based on job-related factors including, but not limited to, skills, experience, qualifications, work location, and market conditions. Base salary is localized according to an employee's work location. Ranges are market-dependent and may be modified in the future.
In addition to base salary, the compensation for this role includes opportunities for equity grants. Talk to your recruiter for more information.
DoorDash cares about you and your overall well-being. That's why we offer a comprehensive benefits package to all regular employees, which includes a 401(k) plan with employer matching, 16 weeks of paid parental leave, wellness benefits, commuter benefits match, paid time off and paid sick leave in compliance with applicable laws (e.g. Colorado Healthy Families and Workplaces Act). DoorDash also offers medical, dental, and vision benefits, 11 paid holidays, disability and basic life insurance, family-forming assistance, and a mental health program, among others.
To learn more about our benefits, visit our careers page.
See below for paid time off details:
• For salaried roles: flexible paid time off/vacation, plus 80 hours of paid sick time per year.
• For hourly roles: vacation accrued at about 1 hour for every 25.97 hours worked (e.g. about 6.7 hours/month if working 40 hours/week; about 3.4 hours/month if working 20 hours/week), and paid sick time accrued at 1 hour for every 30 hours worked (e.g. about 5.8 hours/month if working 40 hours/week; about 2.9 hours/month if working 20 hours/week).
The national base pay ranges for this position within the United States, including Illinois and Colorado, are:
• I4: $137,100-$201,600 USD
• I5: $167,800-$246,800 USD
• I6: $203,500-$299,300 USD