Manager Site Reliability Operations

Mercury Insurance • $118K — $230K *

US-Anywhere • Remote in Brea, CA

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field, or equivalent experience.
7+ years in IT operations, Site Reliability Engineering, or DevOps roles in a 24x7 environment.
3+ years in a leadership or management role overseeing technical teams.
Familiarity with CI/CD pipelines and cloud/container platforms (e.g., Kubernetes, AWS) is preferred.
Strong understanding of observability practices and incident management.

Responsibilities

Lead the Site Reliability Operations team and set priorities and success metrics.
Partner with cross-functional teams to embed CI/CD best practices into operations.
Oversee service reliability monitoring and incident management processes.
Drive root cause analysis for high-severity incidents and standardize post-incident reviews.
Define and report operational and reliability metrics to Technology Operations leadership.
Champion automation and reduce manual effort through 'operations as code'.
Develop and mentor team members while fostering a collaborative culture.

Benefits

Competitive compensation
Flexibility to work from anywhere in the United States for most positions
Paid time off, sick time, and volunteer hours
Incentive bonus programs
Comprehensive medical, dental, vision, and life insurance
401(k) retirement savings plan with company match
Education assistance and career development opportunities
Health and wellbeing resources, including mental wellbeing support

Full Job Description

Overview

Position Summary:

The Site Reliability Operations (SRO) Manager leads the team responsible for end-to-end observability, real-time monitoring, and operational response across Mercury’s production and non-production platforms. This role centers on proactive detection of issues, live support during releases, and structured incident and problem management to minimize customer impact and drive long-term stability.

The SRO Manager ensures that services are well-instrumented (metrics, logs, traces, and dashboards), that alerts are actionable and tuned, and that root cause analysis (RCA) and follow-through on corrective actions are consistently executed. The SRO Manager partners closely with application development, DevOps COE, Site Reliability Engineering (SRE), and Infrastructure teams to build release and runtime practices that are observable by design, provide real-time operational support during deployments, and use data-driven insights and automation to continuously improve system resilience, change success rates, and time to recovery.

Geo-Salary Information

An in-person interview may be required during the hiring process

State-specific pay scales for this role are as follows:

$118,664 to $230,619 (NJ, NY, WA, HI, AK, MD, CT, RI, MA)

$107,876 to $209,653 (NV, OR, AZ, CO, WY, TX, ND, MN, MO, IL, WI, FL, GA, MI, OH, VA, PA, DE, VT, NH, ME)

$97,089 to $188,688 (UT, ID, MT, NM, SD, NE, KS, OK, IA, AR, LA, MS, AL, TN, KY, IN, SC, NC, WV)

In CA: Typical hiring range is $157,177.00 to $218,302.00

The expected base salary for this position will vary depending on a number of factors, including relevant experience, skills and location.

Responsibilities

Essential Job Functions:

Lead the Site Reliability Operations team, including the Network Operations Center (NOC), responsible for observability, real-time monitoring, incident response, and operational excellence for key enterprise services; set direction, priorities, and success metrics for the team.
Partner with Product Management, Engineering, SRE, and the rest of the infrastructure team to embed CI/CD and release best practices into operations, including automated build/test/deploy, health checks, rollbacks, release monitoring via the NOC, and change-management guardrails.
Oversee service reliability monitoring and incident management: ensure appropriate observability (metrics, logs, traces, dashboards), well-tuned alerting thresholds, escalation paths, and effective communications to stakeholders and leadership during incidents.
Own and mature the Problem Management function for the team: drive root cause analysis (RCA) of recurring or high-severity incidents, standardize post-incident reviews, and ensure corrective actions and follow-ups are implemented and verified.
Define, track, and report operational and reliability metrics (e.g., availability, MTTR, incident volume, change failure rate, deployment frequency, problem resolution time); provide regular insights and recommendations to Technology Operations leadership.
Champion automation and “operations as code” (infrastructure as code, configuration as code, automated runbooks), working with engineering teams to reduce manual toil and improve consistency, speed, and safety of operations and releases.
Recruit, develop, coach, and evaluate team members; provide performance feedback, make salary and promotion recommendations, and foster a high-performing, collaborative culture aligned with Mercury’s core values.
Provide leadership coverage for 24x7 mission-critical support through the NOC and on-call rotations; ensure sustainable on-call practices, high-quality runbooks, and continuous improvement of tooling and processes.

Qualifications

Education:

Minimum:

Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field, or an equivalent combination of education and work experience.

Preferred: Advanced coursework, certifications, or experience in Site Reliability Engineering, DevOps, cloud platforms, or ITIL.

Experience:

Minimum:

7+ years of experience in IT operations, SRE, DevOps, or related roles supporting mission-critical systems.
3+ years of experience in a lead or management role overseeing technical teams in a 24x7 environment.

Preferred:

Experience leading teams that support services deployed via modern CI/CD pipelines and running on cloud and/or container platforms (e.g., Kubernetes/OpenShift, AWS).
Experience integrating operations functions with DevOps/SRE teams, including shared ownership of reliability goals and metrics.

Knowledge and Skills:

Strong understanding of CI/CD pipelines (build, test, security scanning, deployment, rollback) and how they support reliable operations.
Solid knowledge of observability practices and tools (metrics, logs, traces, dashboards, alerts) and how to design actionable monitoring and alerting for production systems.
Deep familiarity with incident and problem management processes, including root cause analysis methods and post-incident review facilitation.
Working knowledge of DevOps/SRE concepts such as SLOs/SLIs, error budgets, resilience patterns, automation to reduce toil, and blameless culture.
Demonstrated ability to lead and influence cross-functional teams, build relationships, and collaborate effectively with engineering, InfoSec, infrastructure, and business stakeholders.
Excellent communication skills, both written and verbal; able to clearly communicate technical issues, risks, and recommendations to technical and non-technical audiences, including senior leadership.
Strong analytical and problem-solving skills; able to analyze operational data and trends to identify risks, drive decisions, and prioritize improvements.
Self-motivated, adaptable, and able to operate with minimal supervision in a fast-changing environment.
Ability to work extended hours, nights, or weekends as needed to support critical releases or resolve major incidents.

Perks and Benefits

We offer many great benefits, including:

Competitive compensation
Flexibility to work from anywhere in the United States for most positions
Paid time off (vacation time, sick time, 9 paid Company holidays, volunteer hours)
Incentive bonus programs (potential for holiday bonus, referral bonus, and performance-based bonus)
Medical, dental, vision, life, and pet insurance
401(k) retirement savings plan with company match
Engaging work environment
Promotional opportunities
Education assistance
Professional and personal development opportunities
Company recognition program
Health and wellbeing resources, including free mental wellbeing therapy/coaching sessions, child and eldercare resources, and more

Pay Range: USD $118,664.00 - USD $230,619.00 per year

About Mercury Insurance

Mercury Insurance Group is a multiple-line insurance organization offering personal automobile, homeowners, renters and business insurance. Founded in 1961 and headquartered in Los Angeles, Mercury has assets in excess of $4 billion, employs 4,500 people and has more than 8,000 independent agents in 11 states. Mercury has been named one of America's Most Trustworthy Companies by Forbes magazine, and has been recognized as one of the Best Places to Work in Los Angeles for eight years running. The company has also been named one of America's Best Midsize Employers by Forbes.

Size: 4,300 employees
Market Cap: $1.8 billion
Industry: Finance & Insurance
Net Income: $374.6 million
Founded: 1962
5 Year Trend: +4.3%
Revenue: $3.7 billion
NASDAQ: MCY

* Ladders Estimates
