Job Title: Lead Site Reliability Engineer (GCP & Kubernetes)
Overview / Summary
We are seeking a Lead Site Reliability Engineer to drive reliability, scalability, and operational excellence across a rapidly growing technology ecosystem. This is a technical leadership role focused on cloud architecture, Kubernetes platforms, infrastructure automation, and highly available distributed systems. The position is central to defining infrastructure strategy, improving platform resiliency, and mentoring engineering teams.
Key Responsibilities
• Design and support highly available cloud infrastructure on GCP
• Architect and manage Kubernetes environments at scale
• Build and maintain Infrastructure-as-Code using Terraform
• Develop and manage Helm charts and Kubernetes deployments
• Design failover, disaster recovery, and multi-region strategies
• Improve platform scalability, reliability, and performance
• Implement monitoring, alerting, and observability best practices
• Partner with engineering teams on platform architecture and cloud adoption
• Mentor engineers and provide technical leadership
Required Qualifications
• 7+ years of experience in Site Reliability Engineering, Platform Engineering, Cloud Engineering, or DevOps
• Expert-level Kubernetes experience
• Strong Google Cloud Platform (GCP) experience
• Expertise with Terraform
• Experience with Helm
• Multi-cloud exposure, including AWS and Azure
• Experience with distributed systems
• Python or Bash scripting experience
• Experience with monitoring and observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry