Senior Observability & SRE Leader

Marsh McLennan • $150K — $180K *

Toronto, ON M3C 0E3 • In-Person

Information Technology

11 - 15 years of experience

Today

Job Overview by Ladders

Qualifications

15+ years in technology with 8+ years in leadership roles in observability or SRE.
Proven record of transforming reactive monitoring into proactive SRE functions at enterprise scale.
Expertise in observability technologies including Prometheus, Jaeger, Splunk, and cloud platforms.
Experience in defining and implementing SLO/SLI/Error Budget frameworks.
Hands-on experience in building AIOps/ML-driven anomaly detection capabilities.
Strong background in chaos engineering and reliability design practices.
Ability to manage large observability budgets while optimizing costs.
Exceptional communication skills for engaging with executive stakeholders.

Responsibilities

Define and execute a transformative observability and SRE strategy.
Architect a unified observability platform across multiple infrastructures.
Consolidate and optimize existing tooling while reducing alert noise by 80% or more.
Implement AIOps capabilities for anomaly detection and proactive issue prediction.
Design self-healing automation to resolve recurring incidents autonomously.
Adopt DevOps practices and infrastructure as code across teams.
Recruit and develop a high-performing team of observability and SRE professionals.

Benefits

Hybrid work flexibility with in-person collaboration opportunities.
Commitment to professional development and internal knowledge sharing.
Opportunity to establish an Observability & SRE Centre of Excellence.
Access to a large-scale environment with a Fortune 500 impact.

Full Job Description

Marsh is seeking a visionary, transformational leader to reimagine and rebuild our Observability and Site Reliability Engineering function from the ground up. This is not a role for someone who wants to maintain the status quo. We need a leader who will fundamentally shift this function to a predictive, data-driven engineering discipline that prevents outages before they happen, embeds reliability into every system from design through production, and treats observability data as a strategic asset - not just an operational tool.

This is a career-defining opportunity to build a world-class observability and SRE organization at Fortune 500 scale.

Job Responsibilities

STRATEGIC VISION & PLATFORM TRANSFORMATION

Define and execute an observability and SRE strategy that shifts the organization from reactive operations to predictive reliability engineering.
Architect and deliver a unified, full-stack observability platform covering metrics, traces, logs, real-user monitoring (RUM), synthetic monitoring, and business-level KPIs - across on-prem, multi-cloud (AWS/Azure), containers, and SaaS integrations.
Rationalize and consolidate the current fragmented tooling landscape into a cohesive, cost-optimized platform. Eliminate redundant tools, reduce alert noise by 80%+, and establish a single pane of glass for system health.
Drive adoption of OpenTelemetry as the standard instrumentation framework, ensuring vendor-agnostic telemetry collection and future portability.

PREDICTIVE & PROACTIVE RELIABILITY

Build and operationalize AIOps and ML-driven capabilities to detect anomalies, predict failures, and surface emerging risks before they impact customers. Move beyond threshold-based alerting to intelligent, context-aware detection.
Establish automated correlation engines that link infrastructure signals, application traces, deployment events, and change records to dramatically reduce diagnostic time and identify root cause automatically.
Design and implement self-healing automation that detects, diagnoses, and remediates common failure patterns without human intervention - targeting 40%+ of recurring incidents for autonomous resolution.
Introduce chaos engineering and reliability testing programs (GameDays, fault injection, load testing) to proactively discover weaknesses before production incidents reveal them.

SITE RELIABILITY ENGINEERING CULTURE

Transform the existing operations-centric team into a modern SRE organization with embedded reliability engineers across product and platform squads, operating under a "you build it, you own it" model.
Define and implement SLO/SLI/Error Budget frameworks across critical services, creating a shared language between engineering, product, and business stakeholders for reliability decisions.
Drive the adoption of DevOps practices, CI/CD pipelines, and infrastructure as code, using tools such as Terraform or CloudFormation.
Champion reliability-first design principles - ensuring observability, graceful degradation, circuit breaking, and failure isolation are architected into every system from day one, not bolted on after launch.

INCIDENT PREVENTION & RAPID RECOVERY

Partner with Major Incident Management and Problem Management to build closed-loop feedback systems - every incident produces a reliability improvement, not just a postmortem document.
Drive MTTR toward minutes (not hours) through automated diagnostics, pre-built remediation playbooks, and intelligent correlation that tells responders what is wrong, not just that something is wrong.
Establish "Incidents Prevented" as a primary success metric alongside traditional MTTR/MTTD measures.

BUSINESS-ALIGNED OBSERVABILITY

Elevate observability from infrastructure metrics to business outcomes. Build real-time dashboards that connect system health to revenue impact, customer experience scores, and SLA compliance.
Integrate observability insights into ITSM (ServiceNow), data platforms, and executive reporting - making reliability data a first-class input to business and technology decision-making.

ENGINEERING & OPERATIONAL EXCELLENCE

Own the total cost of ownership of the observability platform. Optimize spend through data tiering, intelligent sampling, retention policies, and vendor negotiations. Deliver more insight per dollar.
Manage strategic vendor relationships (Datadog, Splunk, LogicMonitor, cloud-native tooling) with a focus on maximizing value extraction, not just license management.
Build a platform engineering mindset: observability capabilities are delivered as self-service products to engineering teams - instrumentation libraries, dashboard templates, alerting-as-code, SLO toolkits.

TEAM BUILDING & LEADERSHIP

Recruit, develop, and retain a world-class team of site reliability engineers, observability platform engineers, data and performance engineers, and reliability analysts.
Establish an Observability & SRE Centre of Excellence that drives standards, best practices, and enablement across the global enterprise.
Foster a learning culture through internal tech talks, blameless postmortems, chaos engineering programs, and industry engagement.

REQUIRED EXPERIENCE & EXPERTISE

15+ years in technology with 8+ years in progressively senior observability, SRE, or platform reliability leadership roles.
Demonstrated track record of transforming reactive monitoring organizations into proactive, engineering-driven SRE functions at enterprise scale (10,000+ employees, 1,000+ applications).
Deep expertise across the full observability stack: metrics (Prometheus, Datadog, CloudWatch), distributed tracing (Jaeger, OpenTelemetry, Datadog APM), log aggregation (Splunk, ELK, Datadog Logs), synthetic monitoring, and RUM.
Hands-on experience defining and operationalizing SLO/SLI/Error Budget frameworks that drive engineering prioritization and business alignment.
Proven experience building AIOps / ML-driven anomaly detection and automated remediation capabilities - not just evaluating vendor demos, but delivering production systems that prevent real incidents.
Strong background in chaos engineering, resilience testing, and reliability-by-design practices (circuit breakers, bulkheads, graceful degradation, retry/backoff patterns).
Experience operating across hybrid infrastructure: on-premises data centers, AWS, Azure, containerized workloads (Kubernetes), and SaaS platforms.
Demonstrated ability to drive cultural and organizational transformation across large, complex enterprises with multiple business units and hundreds of engineering squads.
Experience managing $5M+ observability platform budgets and optimizing total cost of ownership while expanding coverage and capability.
Executive communication skills - ability to present reliability strategy, risk posture, and investment cases to C-suite and board-level audiences.
Visionary thinker who can articulate a compelling future state and build the roadmap to get there - then execute relentlessly.

Marsh is committed to hybrid work, which includes the flexibility of working remotely and the collaboration, connections and professional development benefits of working together in the office. All Marsh colleagues are expected to be in their local office or working onsite with clients at least three days per week. Office-based teams will identify at least one "anchor day" per week on which their full team will be together in person.

This is a new position.

About Marsh McLennan

Marsh McLennan Careers

Join the exceptional team at Marsh McLennan, a global leader in professional services, offering unparalleled job opportunities in insurance, risk management, and consultancy. As the company propels forward, it invites dedicated professionals to contribute to a culture of innovation, leadership, and growth.

Work You’ll Do

At Marsh McLennan, you will engage with complex challenges that push the boundaries of your skills and knowledge. Our team thrives on diversity and the shared goal of delivering impactful solutions to our clients worldwide. By joining us, you will be part of a culture that values diversity training and leadership development, ensuring every team member is equipped for success.

Explore Professional Growth

Marsh McLennan is committed to the professional growth of its employees. Whether you are seeking a position that offers a path to leadership or looking for robust internship programs to kickstart your career, Marsh McLennan provides the resources and global platform to propel your ambitions into achievements. Our benefits package is designed to support the well-being and continuous professional development of all staff, from entry-level to senior leadership roles.

Innovative Work Environment

Our company is at the forefront of industry innovation. The collaboration between experienced professionals and fresh talent generates dynamic solutions that keep Marsh McLennan at the cutting edge of the industry. Our team is encouraged to lead with creativity and embrace new ideas, driving the company’s legacy of pioneering industry-first solutions.

Join Our Team

Marsh McLennan is hiring! Explore the multitude of job opportunities on our careers page, from strategic advisory roles to operational excellence positions. We look for passionate, curious, and solution-driven team players. Enhance your career with Marsh McLennan, where your skills will be honed through challenging projects and high-impact strategies.

Networking and Career Advancement

Networking at Marsh McLennan opens doors to enriching connections and countless opportunities within the industry. Our professionals benefit from an environment that fosters networking through events, professional groups, and collaborative projects. With Marsh McLennan, career advancement is not just a possibility—it is an expectation.

Prepare for Your Interview

Ready to apply? Make sure your resume highlights your most relevant experiences and skills tailored to the position you are applying for. Our interview process is designed to understand your capabilities and fit with our team’s goals and values. Prepare to discuss how your background, experiences, and professional aspirations align with the opportunities at Marsh McLennan.

Stay Connected

Keep up to date with the latest from Marsh McLennan:

Career Insights: Gain insider perspectives and industry-leading insights through our careers blog.
Job Alert Emails: Personalize your subscription to receive job alerts and the latest news tailored to your preferences.

Discover the rewarding career opportunities awaiting at Marsh McLennan, where your professional journey is just the beginning. Join us in shaping a future defined by insight, integrity, and innovation.

Learn more about Marsh McLennan

Size: 83,000 employees
Market Cap: $82.6 billion
Industry: Finance & Insurance
Net Income: $2 billion
Founded: 1914
5 Year Trend: +8.5%
Revenue: $17.2 billion
NASDAQ: MMC

* Ladders Estimates
