Senior Site Reliability Engineer, Data & Analytics

Blizzard Entertainment, Inc.

$101K - $186K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in site reliability engineering or related fields
  • Hands-on experience with data, analytics, or ML systems
  • Proficient in AI-powered tooling development
  • Strong understanding of DevOps principles and practices
  • Ability to clearly present and communicate complex ideas to a diverse audience
  • Demonstrated deep collaboration with partner teams

Responsibilities

  • Improve the reliability of data, analytics, and ML systems
  • Automate and optimize monitoring and service-level objectives
  • Enhance operational processes with AI-driven tooling
  • Collaborate with various engineering teams to improve reliability
  • Maintain and manage cloud infrastructure using Terraform and Kubernetes
  • Participate in on-call rotations for incident resolution
  • Conduct blameless postmortems to foster a culture of transparency

Benefits

  • Comprehensive medical, dental, and vision coverage
  • 401(k) plan with company matching
  • Generous paid time off including holidays and parental leave
  • Mental health and wellness programs
  • Tuition reimbursement and charitable donation matching
  • Relocation assistance available for geographic moves

Full Job Description

Team Name:
Battle.net & Online Products

Job Title:
Senior Site Reliability Engineer, Data & Analytics

Requisition ID:
R027436

Job Description:

This Senior Site Reliability Engineer role is on our Data & Analytics team, partnering with data, analytics, ML, and platform engineering to improve the reliability, scalability, and performance of large-scale data platforms, analytics pipelines, ML training pipelines, and inference services.

In addition to core SRE responsibilities, this role will build operational and automation tooling that reduces toil, speeds up issue resolution, and improves engineering velocity. This includes contributing to internal platform services such as shared tooling, data integrations, and access-control patterns used across Blizzard.

The ideal candidate is a production-minded SRE or platform engineer who is comfortable operating critical systems, writing software, and building tools that improve engineering efficiency without compromising reliability.

This role is open to candidates based in Irvine, CA or Albany, NY (hybrid or on-site), as well as fully remote candidates.

Responsibilities

  • Participate in an on-call rotation and drive incidents to resolution
  • Lead blameless postmortems and identify systemic reliability improvements
  • Partner with data, ML, and platform teams to improve batch, streaming, training, and inference workloads
  • Support ML training pipelines and inference services, including GPU workloads
  • Help define how data and ML services run on Kubernetes
  • Design and build automation and operational tooling (e.g., workflows, diagnostic tooling, runbooks) to reduce on-call burden
  • Build and evolve centralized platform services, including shared tooling, data integrations, and access controls
  • Diagnose and resolve reliability, performance, and cost issues across distributed systems
  • Champion automation, documentation, and practices that reduce toil
  • Maintain infrastructure using Terraform and infrastructure-as-code principles
  • Improve CI/CD and GitOps workflows (Jenkins, GitHub Actions, ArgoCD)
  • Operate and improve containerized services on Kubernetes
  • Define and measure reliability using SLIs, SLOs, and error budgets
  • Run load tests, capacity modeling, and production validation
  • Build internal tools and paved paths that help teams operate safely and efficiently

Minimum Requirements

  • Experience operating reliable, distributed systems in SRE, platform, or similar roles
  • Experience with data, analytics, ML, or large-scale distributed workloads
  • Strong knowledge of Linux, containers, Kubernetes, and cloud infrastructure
  • Experience building automation or internal tools (Python, Go, shell, etc.)
  • Experience with infrastructure-as-code (e.g., Terraform)
  • Experience with CI/CD or GitOps systems (e.g., Jenkins, GitHub Actions, ArgoCD)
  • Familiarity with observability (metrics, logs, traces, alerting, incident response)
  • Solid understanding of SRE concepts (SLIs, SLOs, error budgets, postmortems)
  • Experience using modern development and automation practices to improve reliability and efficiency
  • Experience building internal tooling, automation, or developer productivity systems
  • Strong communication skills with technical and cross-functional partners

Bonus Points

  • Experience with data and ML systems (training pipelines, model serving, GPU workloads)
  • Experience with distributed systems and messaging (Kafka, Pub/Sub)
  • Experience working in Kubernetes-based environments
  • Familiarity with observability tools (Prometheus, Grafana)
  • Experience operating systems in cloud environments (GCP, AWS)

Rewards

We provide a suite of benefits that promote physical, emotional and financial well-being for 'Every World' - we've got our employees covered! Subject to eligibility requirements, the Company offers comprehensive benefits including:
  • Medical, dental, vision, health savings account or health reimbursement account, healthcare spending accounts, dependent care spending accounts, life and AD&D insurance, disability insurance;
  • 401(k) with Company match, tuition reimbursement, charitable donation matching;
  • Paid holidays and vacation, paid sick time, floating holidays, compassion and bereavement leaves, parental leave;
  • Mental health & wellbeing programs, fitness programs, free and discounted games, and a variety of other voluntary benefit programs like supplemental life & disability, legal service, ID protection, rental insurance, and others;
  • If the Company requires that you move geographic locations for the job, then you may also be eligible for relocation assistance.


Eligibility to participate in these benefits may vary for part-time and temporary full-time employees and interns with the Company. You can learn more by visiting https://www.benefitsforeveryworld.com/.

In the U.S., the standard base pay range for this role is $101,000.00 - $186,754.00 Annual. These values reflect the expected base pay range of new hires across all U.S. locations. Ultimately, your specific range and offer will be based on several factors, including relevant experience, performance, and work location. Your Talent Professional can share this role's range details for your local geography during the hiring process. In addition to a competitive base pay, employees in this role may be eligible for incentive compensation. Incentive compensation is not guaranteed. While we strive to provide competitive offers to successful candidates, new hire compensation is negotiable.
