Job Title: Lead Site Reliability Engineer (SRE) - Cloud Infrastructure & DevOps
Overview / Summary
Join a fast-growing engineering organization building highly available, enterprise-scale cloud platforms.
We are seeking a Lead Site Reliability Engineer to drive infrastructure automation, deployment reliability, and operational excellence across a complex technology environment. This role is ideal for someone who enjoys working with modern DevOps practices and large-scale systems, troubleshooting critical production issues, and improving engineering efficiency through automation.
The ideal candidate combines strong infrastructure expertise with hands-on experience in CI/CD, Infrastructure-as-Code, containerization, and cloud operations.
Key Responsibilities
- Design and maintain scalable infrastructure environments
- Build and enhance CI/CD pipelines
- Automate infrastructure provisioning using Terraform
- Support Kubernetes and containerized workloads
- Implement blue/green and canary deployment strategies
- Improve monitoring, logging, and observability practices
- Lead troubleshooting efforts for critical production incidents
- Support capacity planning, patching, and operational readiness
- Mentor engineers and establish operational best practices
Required Qualifications
- 7+ years of SRE, DevOps, Infrastructure Engineering, or Platform Operations experience
- Strong CI/CD experience
- Terraform expertise
- Kubernetes experience
- Linux administration experience
- Cloud experience (GCP, AWS, or Azure)
- Python, Bash, or PowerShell scripting experience
- Experience with monitoring and observability tools
- Strong troubleshooting and incident management skills
#LI-Onsite #LI-DT1 #Hiring