Full Job Description
We are looking for a Sr. Site Reliability Engineer to help design, build, and operate the platforms that power AI Co-Workers. This is a hands-on role for an engineer who enjoys owning reliability end-to-end and working closely with product, AI, and engineering teams.
The role
• Design, build, and operate reliable production infrastructure supporting AI Co-Workers
• Own Kubernetes-based platforms used to deploy and run AI workloads
• Build and maintain infrastructure as code using Terraform
• Implement and maintain Helm-based deployment workflows
• Define, measure, and improve system reliability using SLIs, SLOs, and SLAs
• Participate in the on-call rotation, incident response, root cause analysis, and post-mortems
• Reduce operational toil through automation and engineering improvements
• Build and improve observability across monitoring, logging, and alerting
• Partner closely with engineers to ensure systems are resilient, scalable, and secure
• Work across the build, deploy, and operate phases of the software lifecycle
Must have criteria
• Hands-on Kubernetes experience designing, building, or operating workloads on EKS, AKS, GKE, or self-managed Kubernetes
• Hands-on Terraform experience for infrastructure provisioning and automation
• Hands-on Helm experience for Kubernetes application deployment
• Professional experience using at least two programming or scripting languages such as Python, Go, Java, Bash, PowerShell, or Ruby
• Direct Site Reliability Engineering experience or equivalent, including reliability engineering, on-call, incident response, post-mortems, and toil reduction
Should have criteria
• Experience working within a defined SDLC, including CI/CD, release processes, and end-to-end delivery from design to operations
• Hands-on experience with at least one major cloud provider such as AWS, Azure, or Google Cloud
• Experience with ArgoCD or GitOps-style deployment approaches
• Five or more years of relevant professional experience
• DevOps or DevSecOps experience, including CI/CD ownership, infrastructure automation, and security considerations
Preferable criteria
• Relevant certifications such as CKA, CKAD, or cloud provider certifications, or DevOps, DevSecOps, or programming credentials
Why Join Us?
• A high-performance culture
• State-of-the-art technology
• World-class leadership
• Scale of impact and purpose
• A competitive salary and a huge growth trajectory
• The chance to work with the best in the industry
• Flexible work environment
• Diversity and creativity
Disclaimer: We do not wish to be contacted by recruitment agencies. Our hiring process is managed in-house, and the best way for candidates to express interest is to apply with a resume through our company website.