Role Overview
We are looking for a Senior Manager of Site Reliability Engineering to lead one of the most consequential infrastructure organizations at Netflix. This role owns two intersecting mandates: setting the reliability standards that the entire engineering organization builds to, and leading the SRE team supporting our streaming architecture.
Netflix’s infrastructure is undergoing a fundamental shift. It is quickly evolving towards an ecosystem of millions of agents, with AI agents increasingly embedded in how we detect, diagnose, and remediate incidents; how we plan capacity; and how we evolve our reliability posture over time.
Core Responsibilities
Strategic Leadership: Build and scale a world-class SRE function, defining the operating model for how SREs partner with product and infrastructure teams.
Reliability Governance: Establish and socialize company-wide standards (SLIs/SLOs, Error Budgets) and publish transparent reliability scorecards to drive engineering accountability.
Resilience Operations: Standardize chaos engineering, fault injection, and proactive risk modeling (dependency mapping, traffic simulation) across the Netflix stack.
Cross-Functional Partnership: Collaborate with CDN, Playback, and Ads teams to eliminate systemic failures and translate technical reliability data into actionable business-risk assessments for executives.
Qualifications
Experience: 12+ years in software/infrastructure, with 5+ years in senior SRE leadership.
Technical Mastery: Deep fluency in cloud-native systems at scale (AWS/GCP, Containers, Service Mesh) and modern observability (Metrics, Tracing, Logging).
AI/ML Fluency: Practical experience building or implementing AIOps, anomaly detection, or agentic infrastructure systems.
Organizational Influence: Proven ability to drive technical adoption across complex, decentralized organizations through influence rather than mandate.
Communication: Ability to navigate the "human-machine" boundary of automation and clearly articulate technical trade-offs to non-technical stakeholders.
Preferred (Nice to Have)
Experience in streaming media, ad-tech, or high-scale gaming backends.
Hands-on design of LLM-based autonomous agents in production.
Familiarity with Netflix’s OSS ecosystem (Spinnaker, Atlas, Mantis) or Chaos Monkey.
Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $695,000.00 - $1,600,000.00. This compensation range will vary based on location.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.
Netflix is a unique culture and environment. Learn more here.
Job is open for no less than 7 days and will be removed when the position is filled.