Senior Observability Engineer

FanDuel

• $149K — $186K *

Atlanta, GA 30349

In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of hands-on experience in observability engineering, SRE, or platform engineering.
Extensive knowledge of monitoring and observability tools, especially Datadog.
Experience with cross-team observability initiatives and reliability practices.
Proficient in cloud infrastructure management, particularly AWS and Kubernetes.
Strong coding skills in at least one modern programming language like Python or Go.
Well-versed in defining SLOs, SLIs, and alerting strategies based on user impact.
Good understanding of distributed systems principles and their practical applications.

Responsibilities

Design and implement scalable observability solutions that yield actionable insights.
Develop and promote best practices for alerting, incident management, and monitoring.
Enhance incident response processes and improve on-call practices.
Collaborate with teams to identify risks and enhance overall system reliability.
Automate workflows to improve root cause analysis and reduce operational toil.
Work closely with product and engineering teams to present observability insights for decision-making.
Mentor team members to raise standards of observability and reliability.

Benefits

Broad selection of health plans, including mental health support and fitness programs.
Generous paid time off (PTO) and sick leave.
401k with up to 5% match and commuter benefits.
Annual bonuses and long-term incentives based on performance.
Pet insurance and other comprehensive benefits.

Full Job Description

THE POSITION
Our roster has an opening with your name on it

FanDuel is looking for a Senior Observability Engineer to design, build, and mature the observability ecosystem that underpins our platform and services. You will deliver deep visibility into system behavior by combining system telemetry with user signals to provide a holistic view of performance, reliability, and user experience. You'll also explore how AI and machine learning can enhance observability, from intelligent alerting and anomaly detection to accelerating root cause analysis.

This is a hands-on role. You'll partner closely with engineering and product teams to deliver scalable observability capabilities, serve as a subject matter expert in monitoring, alerting, and incident management, and equip teams with self-service insights and tooling. By connecting system behavior to real user impact and leveraging AI-assisted workflows to surface issues faster, you'll drive improvements in reliability, performance, and data-informed decision-making across the organization.

THE GAME PLAN
Everyone on our team has a part to play

Contribute to the observability strategy and roadmap, partnering with multiple teams to align with business priorities and engineering goals.
Design and enhance scalable observability solutions that provide actionable insights into system health, performance, and user experience.
Help establish and promote best practices for monitoring, alerting, incident management, and postmortems across teams.
Support operational excellence by improving incident response processes, on-call practices, and post-incident reviews, focusing on continuous improvement.
Collaborate on cross-team initiatives to improve system reliability, identifying risks and contributing to their resolution.
Apply automation and AI-assisted workflows to improve root cause analysis and reduce operational toil.
Work with engineering and product stakeholders to surface observability insights that inform technical decisions and prioritization.
Analyze system and user signals to help detect, prevent, and mitigate reliability issues.
Contribute to optimizing observability platforms for performance, scalability, and cost-efficiency.
Mentor peers and contribute to raising observability and reliability standards within the team.
In addition to the responsibilities outlined above, employees may be required to perform other duties as assigned by the Company to ensure operational flexibility and meet evolving business needs.

A Sneak Peek Into Our Tech Stack

AWS, Kubernetes, Terraform, Helm, Ansible, Vault, Datadog and PagerDuty

THE STATS
What we're looking for in our next teammate

Solid hands-on experience in observability engineering, SRE, platform engineering, or related roles, with impact across team-level systems.
Strong expertise in monitoring and observability practices, with hands-on experience using tools such as Datadog.
Experience contributing to observability or reliability initiatives across teams or services.
Proficiency with Kubernetes, cloud infrastructure (e.g. AWS), and infrastructure-as-code tools such as Terraform.
Ability to influence technical decisions within and across teams, collaborating effectively with a range of stakeholders.
Good understanding of distributed systems principles (e.g. consistency, availability, partition tolerance) and practical trade-offs.
Experience defining and implementing SLOs, SLIs, and alerting strategies, including an understanding of user-impacting metrics.
Strong software engineering fundamentals, with proficiency in at least one modern programming language (e.g. Go, Java, Python, or TypeScript), and experience building tooling, automation, and scalable systems.
Experience improving systems through automation, helping reduce operational toil and recurring issues.
Strong analytical and problem-solving skills, with the ability to interpret technical signals and relate them to system performance and reliability.
Good communication and collaboration skills, with the ability to work effectively with both technical and non-technical stakeholders.
A sense of ownership and accountability, with a focus on delivering reliable, scalable solutions and continuous improvement.

Don't check all the boxes? That's okay! We encourage you to still apply if you feel like you possess an adjacent skill set and are interested in learning more about this position.

PLAYER BENEFITS
We treat our team right

We offer amazing benefits above and beyond the basics. We have an array of health plans to choose from (some as low as $0 per paycheck) that include programs for fertility and family planning, mental health support, and fitness benefits. We offer generous paid time off (PTO & sick leave), annual bonus and long-term incentive opportunities (based on performance), 401k with up to a 5% match, commuter benefits, pet insurance, and more - check out all our benefits here: FanDuel Total Rewards. *Benefits differ across location, role, and level.

The applicable salary range for this position is $149,000 - $186,000 USD, which is dependent on a variety of factors including relevant experience, location, business needs and market demand. This role may offer the following benefits: medical, vision, and dental insurance; life insurance; disability insurance; a 401(k) matching program; among other employee benefits. This role may also be eligible for short-term or long-term incentive compensation, including, but not limited to, cash bonuses and stock program participation. This role includes paid personal time off and 14 paid company holidays. FanDuel offers paid sick time in accordance with all applicable state and federal laws.

#LI-Hybrid

* Ladders Estimates
