Software Engineer, Machine Learning Infrastructure - Generative AI

DoorDash

• $137K — $299K *

Sunnyvale, CA 94087

In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

B.S., M.S., or PhD in Computer Science or equivalent
3+ years of software engineering experience
Strong backend engineering skills in Python and distributed systems
Experience with production services, APIs, and ML infrastructure at scale
Hands-on experience with LLM inference and fine-tuning of open-weight models
Ability to navigate fast-moving technical areas and translate use cases into platform capabilities
Proficiency with AI coding tools throughout the software development lifecycle

Responsibilities

Build infrastructure to transition GenAI prototypes to production
Develop real-time GPU endpoints and high-throughput batch inference systems
Design high-performance systems for model serving and fine-tuning
Enhance GPU inference efficiency and reduce operational costs
Create platforms for rapid experimentation while ensuring production standards
Collaborate with cross-functional teams to implement GenAI solutions
Shape the future of DoorDash's GenAI platform with innovative AI capabilities

Benefits

401(k) plan with employer matching
16 weeks of paid parental leave
Comprehensive wellness benefits
Paid time off and sick leave compliant with local laws
Medical, dental, and vision benefits
11 paid holidays
Disability and basic life insurance
Mental health program support

Full Job Description

You will join a small, high-leverage team building production infrastructure for Generative AI at DoorDash, with a primary focus on our open-weights model platform spanning inference and fine-tuning: real-time GPU serving, high-throughput batch inference, and model fine-tuning. You'll work across model serving and inference engines, fine-tuning and training pipelines, GPU autoscaling and utilization, batch pipelines, backend services, and observability.

This role is ideal for an engineer who enjoys pushing the cost/performance frontier of GPU inference and fine-tuning in a fast-moving technical area where product needs, model capabilities, vendor ecosystems, and cost/performance tradeoffs are evolving quickly.

You're excited about this opportunity because you will...

• Build the infrastructure that helps DoorDash teams move GenAI ideas from prototype to production, increasing the velocity of business impact from AI across the company.
• Work on our open-weights serving stack - real-time GPU endpoints, high-throughput batch inference, and fine-tuning (SFT/DPO/LoRA) - alongside the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.
• Design scalable, high-performance systems for model serving, batch inference, GPU autoscaling, and fine-tuning that power real customer and internal automation use cases.
• Push the cost and latency frontier of GPU inference - turning batch jobs that took days into hours and cutting inference cost by multiples - while giving product teams a clean choice across open-weight and closed-source models with reliability, fallback, observability, and cost controls built in.
• Build platforms that support rapid experimentation while meeting production standards for latency, scale, monitoring, SLOs, playbooks, and operational excellence.
• Partner closely with ML engineers, product engineers, data scientists, and platform teams across DoorDash, Wolt, and Deliveroo to turn emerging GenAI capabilities into durable platform primitives.
• Shape the future of DoorDash's centralized GenAI platform - including emerging directions such as reinforcement learning (RLHF/RLVR), agent optimization, and other post-training and agentic techniques - enabling the next generation of AI-powered products, agents, automation, and personalization.

We're excited about you because...

• B.S., M.S., or PhD in Computer Science or equivalent.
• 3+ years of industry experience in software engineering.
• Strong backend engineering fundamentals, especially in Python and distributed systems.
• Experience building production services, APIs, data pipelines, or ML infrastructure at scale.
• Experience operating systems in production, including observability, debugging, reliability, incident response, and performance/cost optimization.
• Hands-on experience with LLM inference and/or fine-tuning of open-weight models in production - serving (latency, throughput, batching, autoscaling, GPU utilization) and/or fine-tuning (SFT/DPO/LoRA).
• Ability to work across ambiguous, fast-moving technical areas and turn customer use cases into reusable platform capabilities.
• Proficiency in using AI coding tools (e.g., Claude Code, Codex, Cursor) across the full software development lifecycle, including designing, generating code, testing, monitoring, and releasing software.

Nice To Haves

• Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production
• Experience with distributed/multi-node fine-tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation and evaluation
• GPU performance work - multi-node/distributed inference, KV-cache/memory optimization, quantization (FP8/INT8/AWQ/GPTQ), or cold-start/throughput tuning
• Experience with Kubernetes, cloud infrastructure (AWS/GCP), GPUs, serverless/elastic GPU platforms (e.g., Modal), or high-throughput batch systems
• Experience with LLM gateways, model routing, vendor abstraction, or cost attribution
• Experience building developer platforms, internal platforms, or self-serve infrastructure
• Experience building and deploying AI agents or MCP servers in production
• Experience with eval systems, LLM observability, tracing, RAG, search, or vector databases

Compensation

The successful candidate's starting pay will fall within the pay range listed below and is determined based on job-related factors including, but not limited to, skills, experience, qualifications, work location, and market conditions. Base salary is localized according to an employee's work location. Ranges are market-dependent and may be modified in the future. In addition to base salary, the compensation for this role includes opportunities for equity grants. Talk to your recruiter for more information.

DoorDash cares about you and your overall well-being. That's why we offer a comprehensive benefits package to all regular employees, which includes a 401(k) plan with employer matching, 16 weeks of paid parental leave, wellness benefits, commuter benefits match, paid time off and paid sick leave in compliance with applicable laws (e.g. Colorado Healthy Families and Workplaces Act). DoorDash also offers medical, dental, and vision benefits, 11 paid holidays, disability and basic life insurance, family-forming assistance, and a mental health program, among others. To learn more about our benefits, visit our careers page.

See below for paid time off details:

• For salaried roles: flexible paid time off/vacation, plus 80 hours of paid sick time per year.
• For hourly roles: vacation accrued at about 1 hour for every 25.97 hours worked (e.g. about 6.7 hours/month if working 40 hours/week; about 3.4 hours/month if working 20 hours/week), and paid sick time accrued at 1 hour for every 30 hours worked (e.g. about 5.8 hours/month if working 40 hours/week; about 2.9 hours/month if working 20 hours/week).

The national base pay ranges for this position within the United States, including Illinois and Colorado, are:

• I4: $137,100-$201,600 USD
• I5: $167,800-$246,800 USD
• I6: $203,500-$299,300 USD

* Ladders Estimates
