2K Games

Senior Site Reliability Engineer

2K Games
$120K - $150K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in SRE, Platform Engineering, or equivalent infrastructure roles at production scale.
  • Deep experience with Kubernetes in cloud environments (EKS or GKE preferred), including networking and multi-cluster patterns.
  • Strong proficiency with Terraform or Pulumi and hands-on experience with GitOps tooling.
  • Familiarity with both modern and legacy tech stacks including AWS, GCP, and on-premises solutions.
  • Experience with observability tools like Datadog, Prometheus + Grafana, and operationalizing SLI/SLO/error budgets.

Responsibilities

  • Design, build, and maintain scalable multi-cloud and hybrid infrastructure using Infrastructure as Code tools.
  • Own Kubernetes platforms end-to-end, managing lifecycle and networking configurations.
  • Implement progressive delivery patterns for game service deployments to enhance reliability.
  • Drive the full observability stack and define alerts that prioritize critical incidents.
  • Lead chaos engineering exercises to identify and mitigate potential failures before they impact players.
  • Enhance CI/CD pipelines' security and efficiency through robust automation strategies.

Benefits

  • Work within a collaborative team focused on building resilient systems for millions of players.
  • Post-mortems focus on systems improvements and encourage a culture of learning.
  • Opportunities for leadership development through promoting SRE practices across various teams.
Full Job Description
The Team

The 2K SRE team owns the infrastructure behind every player connection: all 2K game services, account platforms, CI/CD pipelines, and developer tooling, spanning AWS, GCP, and on-premises data centers across multiple global regions. Global launch windows and live-service events push systems to their limits, and this team is expected to hold the line.

Post-mortems here focus on systems, not people. Automation is the default answer to repetitive work. The infrastructure keeps millions of players connected, and the team takes that seriously!
The Role

The Senior SRE at 2K is a hands-on technical leader, shaping production infrastructure across multiple clouds and regions while partnering with network engineers, systems architects, and game studio developers. This is an ownership role: driving technical direction, influencing reliability from architecture review through production operation, and closing the gap between what engineering ships and what players experience.
What You'll Do

Platform & Infrastructure
  • Design, build, and operate scalable multi-cloud and hybrid infrastructure using Terraform, Pulumi, and GitOps workflows (ArgoCD, Flux).
  • Own Kubernetes platforms (EKS, GKE) end-to-end: cluster lifecycle, multi-tenancy, networking (Istio, Cilium), and autoscaling.
  • Push progressive delivery patterns (blue/green, canary) across game service deployments.

Observability & Reliability
  • Build and run the full observability stack: Prometheus + Grafana + Datadog.
  • Define SLI/SLO/error budget policies and build alerting that cuts through the noise.
  • Lead chaos engineering exercises to surface failure modes before players encounter them.
  • Drive incident response and post-mortems with a focus on systemic fixes and real follow-through.

Automation, Security & Developer Experience
  • Eliminate toil through self-service provisioning, automated remediation, and intelligent scaling.
  • Harden CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD).
  • Embed security at the platform layer through secrets management (PasswordState, 1Password, and AWS Secrets Manager) and policy-as-code (OPA/Gatekeeper).

Leadership
  • Promote SRE practices across 2K studios through reliability reviews, runbooks, and embedded collaboration.
  • Shape architectural decisions and author engineering RFCs that move the platform forward.
Required Qualifications
  • Experience: 5+ years in SRE, Platform Engineering, or equivalent infrastructure work at production scale.
  • Kubernetes: Deep experience in cloud environments (EKS or GKE preferred), including networking, storage, and multi-cluster patterns.
  • Infrastructure as Code (IaC): Strong proficiency with Terraform and/or Pulumi; hands-on with Helm, Terragrunt, and GitOps tooling (ArgoCD or GitHub Actions).
  • Environments: Experience with modern and legacy tech, including AWS, GCP, VMware, and bare-metal servers.
  • Configuration Management: Server configuration using Ansible, Puppet, and AWS Systems Manager.
  • Observability: Experience with Datadog, Prometheus + Grafana, and OpenTelemetry; fluency in operationalizing SLI/SLO/error budgets inside engineering teams.
  • Software Engineering: Production-quality code in Go, Python, or TypeScript for tools, automation, and internal libraries.
  • Systems & Networking: Solid understanding of Linux internals, TCP/IP networking, DNS, and TLS, deep enough to debug at the system level.
  • Incident Management: Incident response and post-mortem leadership with a track record of systemic follow-through.
Preferred Qualifications
  • Live-service game or large-scale consumer internet experience dealing with millions of concurrent users.
  • Deep knowledge of service mesh (Istio, Cilium) and advanced Kubernetes networking.
  • Experience with FinOps and managing resources efficiently at cloud scale.
  • Experience with AI and agentic development.
  • Cloud certifications (AWS Solutions Architect, GCP Professional Cloud Architect, CKA/CKS, or equivalent).
  • Experience mentoring SREs or leading reliability working groups.

#LI-Hybrid

About 2K Games

2K Games is an American video game publisher based in Novato, California. The company was founded in January 2005 by Take-Two Interactive, and is best known for publishing the NBA 2K series of basketball games. Other popular franchises published by 2K Games include the Borderlands series, the BioShock series, and the Mafia series. The company has also published several licensed games based on popular movies and TV shows, such as The Walking Dead and WWE.
Size
500 employees