Pizza Hut

Site Reliability Engineer III

Pizza Hut: $117K to $146K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in computer science, engineering, or related field, or equivalent work experience.
  • 2+ years of experience in Site Reliability Engineering (SRE) with a focus on observability and automation.
  • Familiarity with SRE core principles (e.g. SLO, SLA, SLI, Error Budget).
  • Experience creating monitors and dashboards for observability capabilities.
  • Understanding of incident management practices including RCAs and conducting postmortems.
  • Proficiency in SQL and a general understanding of programming languages like JavaScript, Python, Go, or TypeScript.

Responsibilities

  • Create and automate internal business processes to improve team productivity.
  • Monitor the Taco Bell Digital ecosystem for performance and accuracy of data.
  • Communicate effectively with technical and non-technical stakeholders on system health updates.
  • Perform validation tests on mobile and web applications, providing feedback on improvements.
  • Develop expertise in serverless technologies while learning about modern SRE practices.
  • Participate in on-call rotation and work with Agile methods.
  • Collaborate with teams on business issues impacting revenue and brand reputation.

Benefits

  • Hybrid work schedule with a year-round half-day on Fridays.
  • Onsite childcare and dining options, including a Taco Bell in the building.
  • Access to fitness classes, personal training, and an onsite gym.
  • Up to 4 weeks of vacation plus holidays and volunteer time off.
  • Tuition reimbursement and generous parental leave benefits.
Full Job Description

About The Role

At Taco Bell, we're Cultural Rebels. Want to join in on the passion-fueled fun? Learn more about the career below.

We're looking for someone who can own a problem from start to finish - someone who is comfortable digging into the code, identifying the issue, and proposing a fix. This person will become a subject matter expert within the Taco Bell Digital space.

Fundamentally, we're looking for someone who lives and breathes observability and automation, and who is passionate about continuously raising the bar.

We want someone who listens well, communicates clearly and effectively, drives alignment, and helps engineering teams improve their processes.

As a fierce advocate for the customer, you will troubleshoot issues and incidents, build out new dashboards and alerts, track down the root cause of problems, and ultimately fix them for good.

The Day-to-Day:
  • Create, update, or automate internal business processes or tools to reduce toil and improve team productivity.
  • Understand and monitor the Taco Bell Digital ecosystem for performance, availability, and accuracy of transactional data.
  • Communicate and collaborate with both technical and non-technical stakeholders on issues, upcoming changes, and updates to system health.
  • Perform final validation tests on various mobile and web-based applications, reporting results and offering feedback on areas for improvement.
  • Build expertise in serverless infrastructure and initiatives while also learning modern SRE practices and terms, such as SLIs, SLOs, observability, toil, and incident response with blameless postmortems.
  • Work with and adopt Agile practices while participating in a 24/7 on-call rotation.
  • Collaborate within the team and with cross-functional partners on high-impact business issues that affect revenue and brand reputation.


Is this you?

Requirements:
  • Bachelor's degree in computer science, engineering, OR a related field, OR equivalent work experience.
  • 2+ years of experience in the SRE space, with a focus on observability and automation
  • Familiarity with SRE core principles (e.g. SLO, SLA, SLI, Error Budget, etc.)
  • Hands-on experience creating monitors, dashboards, SLOs, and other observability capabilities
  • Experience with logging solutions or platforms such as Datadog, CloudWatch Logs Insights, etc.
  • Understanding of incident management practices, including leading bridge calls, conducting RCAs, and facilitating postmortems
  • Familiarity with modern observability practices and tools, such as distributed tracing, APM, OpenTelemetry, and RUM
  • Excellent communication and collaboration skills, with the ability to work effectively in a fast-paced environment as a member of a team
  • A thorough understanding of observability principles (not just monitoring), plus experience using tools like Datadog, Lumigo, CloudWatch, or similar
  • General understanding of Agile methods such as Kanban and Scrum
  • Advanced troubleshooting skills
  • A curious mindset and the desire to always keep learning
  • Proactive self-starter capable of operating autonomously
  • Ability to participate in an on-call rotation
  • Proficiency in SQL
  • Fundamental understanding of JavaScript, Python, Go, or TypeScript
  • Working knowledge of AWS services commonly used in serverless and cloud-native environments, including Lambda, API Gateway, Fargate, S3, DynamoDB, and EventBridge

Preferred:
  • Software development skills (Git, CI/CD, reading and writing code), preferably in JavaScript/TypeScript and/or Python, with tools like VS Code, GitLab CI/CD, or GitHub Actions
  • Experience with building automations for internal business processes, including AI-embedded workflows
  • Comfort and familiarity with common Unix-like shells (bash, zsh, etc.)
  • Experience with Retool
  • Experience with data and observability platforms: FullStory, Amplitude, Embrace
  • Familiarity with agentic AI tools like Claude Code, Codex, Cursor, etc.
  • Prior use of issue tracking systems such as Jira
  • High proficiency with AWS serverless services, including Lambda, API Gateway, Fargate, S3, DynamoDB, and EventBridge
  • Experience with Infrastructure as Code (IaC) tools such as Terraform, Pulumi, or CloudFormation
  • Knowledge of Akamai tools, processes, and SOCC engagement
  • Proficiency with deploying AI via Amazon Bedrock


Work-Hard, Play-Hard:
  • Hybrid work schedule and year-round flex day Friday (half day)
  • Onsite childcare through Bright Horizons
  • Onsite dining center and game room (yes, there is a Taco Bell inside the building)
  • Onsite dry cleaning, laundry services, and carwash
  • Onsite gym with fitness classes and personal trainer sessions
  • Up to 4 weeks of vacation per year plus holidays and time off for volunteering
  • Tuition reimbursement and education benefits
  • Generous parental leave for all new parents and adoption assistance program
  • 401(k) with a 6% matching contribution from Yum! Brands with immediate vesting
  • Comprehensive medical & dental including prescription drug benefits and 100% preventive care
  • Discounts, free food, swag and... honestly, too many good benefits to name


Salary Range: $117,000 to $146,500 annually + bonus eligibility + equity (if applicable) + benefits

The above represents the expected salary range for this job requisition. Ultimately, in determining your pay, we'll consider your location, experience, and other job-related factors.

At Taco Bell, we Live Más and invite you to do the same. Take a seat at our table. Bring your voice. Bring you, just as you are, a Cultural Rebel. We want you to be your best self!
