Netflix

Senior Manager, Critical Operations & Reliability Engineering

Netflix
$500K+*
US-Anywhere
Remote in United States
Consumer Technology
11 - 15 years of experience
Job Overview by Ladders

Qualifications

  • 12+ years in software/infrastructure, with 5+ years in senior SRE leadership.
  • Deep fluency in cloud-native environments (AWS/GCP, Containers, Service Mesh).
  • Experience with AI/ML systems, especially in AIOps and anomaly detection.
  • Proven ability to influence technical adoption in decentralized organizations.
  • Strong communication skills to articulate technical concepts to non-technical stakeholders.

Responsibilities

  • Build and scale a world-class Site Reliability Engineering function.
  • Establish and promote company-wide reliability standards and scorecards.
  • Standardize chaos engineering and proactive risk modeling across Netflix's infrastructure.
  • Collaborate with cross-functional teams to address systemic failures and communicate risks to executives.

Benefits

  • Comprehensive health plans and mental health support.
  • 401(k) retirement plan with employer match.
  • Stock option program for equity compensation.
  • Disability programs and health savings accounts.
  • Family-forming benefits, plus life and serious injury benefits.
  • Generous paid leave policies, including 35 days of paid time off for full-time hourly employees.

Full Job Description

Role Overview:

We are looking for a Senior Manager of Site Reliability Engineering to lead one of the most consequential infrastructure organizations at Netflix. This role owns two intersecting mandates: setting the reliability standards that the entire engineering organization builds to, and leading the SRE team supporting our streaming architecture.

Netflix’s infrastructure is undergoing a fundamental shift toward a millions-of-agents ecosystem, with AI agents increasingly embedded in how we detect, diagnose, and remediate incidents; how we plan capacity; and how we evolve our reliability posture over time.

Core Responsibilities

  • Strategic Leadership: Build and scale a world-class SRE function, defining the operating model for how SREs partner with product and infrastructure teams.

  • Reliability Governance: Establish and socialize company-wide standards (SLIs/SLOs, Error Budgets) and publish transparent reliability scorecards to drive engineering accountability.

  • Resilience Operations: Standardize chaos engineering, fault injection, and proactive risk modeling (dependency mapping, traffic simulation) across the Netflix stack.

  • Cross-Functional Partnership: Collaborate with CDN, Playback, and Ads teams to eliminate systemic failures and translate technical reliability data into actionable business risk for executives.

Qualifications

  • Experience: 12+ years in software/infrastructure, with 5+ years in senior SRE leadership.

  • Technical Mastery: Deep fluency in cloud-native systems at scale (AWS/GCP, Containers, Service Mesh) and modern observability (Metrics, Tracing, Logging).

  • AI/ML Fluency: Practical experience building or implementing AIOps, anomaly detection, or agentic infrastructure systems.

  • Organizational Influence: Proven ability to drive technical adoption across complex, decentralized organizations through influence rather than mandate.

  • Communication: Ability to navigate the "human-machine" boundary of automation and clearly articulate technical trade-offs to non-technical stakeholders.

Preferred (Nice to Have)

  • Experience in streaming media, ad-tech, or high-scale gaming backends.

  • Hands-on design of LLM-based autonomous agents in production.

  • Familiarity with Netflix’s OSS ecosystem (Spinnaker, Atlas, Mantis) or Chaos Monkey.

Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience. The range for this role is $695,000.00 - $1,600,000.00. This compensation range will vary based on location.

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.

Netflix has a unique culture and environment. Learn more here.

Job is open for no less than 7 days and will be removed when the position is filled.

About Netflix

Netflix, Inc. is an American media company founded on August 29, 1997, by Reed Hastings and Marc Randolph in Scotts Valley, California, and currently based in Los Gatos, California, with production offices and stages at the Los Angeles-based Hollywood studios (formerly Warner Brothers studios) and the Albuquerque Studios (formerly ABQ studios). It operates an eponymous over-the-top subscription video on-demand service, which showcases acquired and original programming as well as third-party content licensed from other production companies and distributors. Netflix is also the first streaming media company to be a member of the Motion Picture Association.
Size: 11,300 employees
Market Cap: $127.6 billion
Net Income: $2.7 billion
Founded: 1997
5 Year Trend: +27.5%
Revenue: $24.9 billion
NASDAQ
