Site Reliability Engineer

Tata Consultancy Services • $100K — $130K *

Deerfield, IL 60015 • In-Person

Information Technology

5 - 7 years of experience

Reposted 3 days ago

Job Overview by Ladders

Qualifications

7+ years in SRE, platform engineering, or cloud infrastructure engineering within large-scale enterprise environments (10,000+ employees).
Minimum 4 years of hands-on experience in Microsoft Azure as a primary cloud engineer.
Expert proficiency with AKS, specifically in cluster lifecycle management and security policies.
Strong skills in infrastructure-as-code tools, specifically Terraform; experience with Azure Landing Zones is preferred.
Proficient in at least one programming/scripting language: Python, Go, or PowerShell.
Experience with enterprise observability platforms, specifically Azure Monitor and Log Analytics.

Responsibilities

Define and manage enterprise-wide SLOs, SLIs, and Error Budgets for Azure-hosted services; communicate SLA compliance to executive stakeholders monthly.
Lead architectural reviews ensuring reliability measures are integrated from design to production.
Implement chaos engineering practices to proactively identify reliability risks.
Design and conduct quarterly Disaster Recovery drills across Azure.
Act as Incident Commander for major incidents, managing the entire incident lifecycle from detection to resolution.
Participate in structured on-call rotation, maintaining response SLAs and driving a blameless post-mortem culture.
Design and operate the enterprise observability stack ensuring full coverage of Metrics, Events, Logs, and Traces.

Benefits

Discretionary Annual Incentive.
Comprehensive medical coverage including health, dental, vision, and disability planning.
Pet insurance plans available.
Family support resources.

Full Job Description

Must Have Technical/Functional Skills

7+ years of experience in SRE, platform engineering, or cloud infrastructure engineering in large-scale enterprise environments (10,000+ employees or equivalent complexity).
Deep, hands-on expertise with Microsoft Azure, including a minimum of 4 years in a primary Azure cloud engineering role.
Expert-level proficiency with AKS: cluster lifecycle management, RBAC, network policies, pod security standards, cluster autoscaler, and Workload Identity.
Strong infrastructure-as-code skills: Terraform (required) and/or Bicep; experience managing Azure Landing Zones or Enterprise-Scale architecture.
Proficiency in at least one systems programming/scripting language: Python (preferred), Go, or PowerShell.
Experience designing and operating enterprise observability platforms using Azure Monitor, Log Analytics, and Application Insights at scale.
Demonstrable track record of owning SLOs/SLIs and delivering measurable reliability improvements in production.
Strong knowledge of enterprise networking in Azure: Hub-and-Spoke/Virtual WAN, ExpressRoute, Azure Firewall, NSGs, Private Endpoints, and DNS Private Zones.

Required/Preferred Certifications:

AZ-104 | AZ-305 (Preferred) | AZ-400 (Preferred) | CKA | ITIL v4 Foundation

Roles & Responsibilities

Reliability & Availability Engineering

Define, own, and enforce enterprise-wide SLOs, SLIs, and Error Budgets across all Tier-0 and Tier-1 Azure-hosted services; report SLA compliance to executive stakeholders monthly.
Lead architectural reviews for new services and ensure reliability non-functionals (availability targets, RTO/RPO) are embedded from design through to production.
Champion and implement chaos engineering practices using Azure Chaos Studio and custom fault injection frameworks to proactively surface reliability risks.
Drive Disaster Recovery (DR) design and conduct quarterly DR drills across Azure paired regions.

Incident Management & On-Call

Serve as Incident Commander for P1/P2 major incidents; own the end-to-end incident lifecycle from detection through resolution and Post-Incident Review (PIR).
Participate in a structured On-Call rotation with follow-the-sun global coverage; maintain response SLAs of
Drive a blameless post-mortem culture and ensure all action items from PIRs are tracked and delivered within the agreed SLA.

Observability & Platform Engineering

Design and operate the enterprise observability stack: Azure Monitor, Log Analytics Workspaces, Application Insights, and Azure Managed Grafana; ensure full MELT (Metrics, Events, Logs, Traces) coverage.
Build and maintain alerting frameworks using Azure Monitor Alert Rules and Azure Action Groups integrated with PagerDuty and ServiceNow.
Develop and operate platform automation, runbooks, and self-healing capabilities using Azure Automation, Logic Apps, and Python/PowerShell scripting.

CI/CD & Infrastructure Reliability

Collaborate with DevOps and development teams to embed reliability gates into Azure DevOps pipelines: automated performance testing, synthetic monitoring, and progressive deployment (canary/blue-green) strategies.
Manage the reliability of AKS clusters across multiple Azure regions; own node pool scaling, upgrade strategy, and cluster hardening in alignment with CIS Benchmarks.
Contribute to infrastructure-as-code reliability reviews using Terraform/Bicep to enforce standards across Azure Landing Zones.

Generic Managerial Skills

Produce monthly reliability dashboards and executive-level reporting aligned to enterprise OKRs and IT Risk frameworks.
Collaborate with the Enterprise Architect and Cloud Governance teams to maintain Azure Policy assignments and ensure operational compliance with ISO 27001, SOC 2, and internal control frameworks.
Mentor junior SREs and engineers across the organization; lead SRE community of practice sessions.

About Tata Consultancy Services

Tata Consultancy Services (TCS) is an Indian multinational information technology (IT) services and consulting company, headquartered in Mumbai, Maharashtra, India. It is a subsidiary of Tata Group and operates in 149 locations across 46 countries. TCS is the largest Indian company by market capitalization and is ranked 11th on the Forbes Global 2000 list of the world's biggest public companies. TCS is also the second-largest IT services company in the world by revenue and the largest employer of women in India. The company provides services in areas including IT, consulting, and business solutions.

Size

469,261 employees

Industry

Information Technology

* Ladders Estimates
