The Opportunity
The Agent Platform team gives Netflix engineers the infrastructure to go from zero to a production-grade AI agent without reinventing the wheel. We own the foundational building blocks the whole company builds on: the Model Gateway (unified access to external LLMs like Claude, GPT, and Gemini), the Assistance API for conversational use cases, the MCP Gateway that connects agents to Netflix's internal systems and knowledge, our Agent SDK (built on Strands and Claude), and an end-to-end evaluation stack via Braintrust.
The work has moved well beyond chat completions. Teams across Netflix are now building agents — systems that plan, call tools, observe results, and iterate — and they depend on us for the infrastructure to do that reliably and to know whether their agents are actually any good. We're a small team with outsized leverage: what we ship becomes the foundation for AI across all of Netflix.
What you will do:
Design, build, and operate the Agent SDK and MCP Gateway that Netflix engineers use to build, deploy, and run AI agents in production.
Build agents and agent infrastructure across the full lifecycle — plan/act/observe loops, tool and MCP integrations, deployment, and day-2 operations.
Make evaluation a first-class part of the platform: build the tracing, eval suites, and quality signals that let teams measure agents, catch regressions, and iterate to make them better.
Own reliability, observability, and guardrails for non-deterministic systems running at very high scale.
Lead cross-functional initiatives with ML scientists, data scientists, product managers, and other AI Platform teams.
Rapidly iterate with users to improve the developer experience while establishing durable foundational capabilities.
Desired Background:
8+ years of software engineering experience with a track record of delivering quality results.
Hands-on experience building, deploying, operating, AND evaluating LLM agents in production, not just chat-completion apps or prototypes.
Experience with one or more agent frameworks/SDKs (Strands, OpenAI Agents SDK, Anthropic Claude Agent SDK, LangGraph, pydantic-ai, CrewAI, Google ADK) and with tool/function calling and MCP.
Experience with LLM/agent evaluation and observability: building eval suites, tracing, and quality measurement, then iterating on results (Braintrust, LangSmith, W&B, or equivalent).
Strong experience building SDKs and APIs for internal or external developers.
Strong fundamentals in building and operating scalable, observable, fault-tolerant distributed systems.
Proficiency in Python (and Python packaging tooling) plus one of Java, Go, C/C++, Rust, or Zig. Familiarity with our stack (Temporal, FastAPI, PostgreSQL, Kubernetes) is a plus.
Experience with large-scale build, release, CI/CD, and observability methods.
Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $466,000.00 - $750,000.00. This compensation range will vary based on location.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.