Site Reliability Engineer

Future Secure AI

$120K - $150K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of relevant professional experience in Site Reliability Engineering or similar roles
  • Hands-on experience with Kubernetes, preferably on EKS, AKS, or GKE
  • Proficiency with Terraform for infrastructure automation
  • Familiarity with Helm for Kubernetes application deployment
  • Experience in scripting or programming with languages like Python, Go, or Java

Responsibilities

  • Design and implement reliable production infrastructure for AI applications
  • Manage and optimize Kubernetes platforms for AI workloads
  • Automate infrastructure provisioning through code using Terraform
  • Ensure effective deployment workflows with Helm
  • Monitor and enhance system reliability by defining SLIs and SLOs
  • Oversee incident responses and facilitate post-mortem analyses
  • Drive automation to minimize operational workload

Benefits

  • A high-performance culture that promotes excellence
  • Access to cutting-edge technology and tools
  • Opportunity to learn from exceptional leadership
  • Potential for significant impact on projects and initiatives
  • Flexible work arrangements to support work-life balance
  • Encouragement of diversity and creativity in the workplace

Full Job Description

We are looking for a Sr. Site Reliability Engineer to help design, build, and operate the platforms that power AI Co-Workers. This is a hands-on role for an engineer who enjoys owning reliability end-to-end and working closely with product, AI, and engineering teams.

The role

  • Design, build, and operate reliable production infrastructure supporting AI Co-Workers
  • Own Kubernetes-based platforms used to deploy and run AI workloads
  • Build and maintain infrastructure as code using Terraform
  • Implement and maintain Helm-based deployment workflows
  • Define, measure, and improve system reliability using SLIs, SLOs, and SLAs
  • Participate in on-call rotation, incident response, root cause analysis, and post-mortems
  • Reduce operational toil through automation and engineering improvements
  • Build and improve observability across monitoring, logging, and alerting
  • Partner closely with engineers to ensure systems are resilient, scalable, and secure
  • Operate across build, deploy, and operate phases of the software lifecycle

Must have criteria

  • Hands-on Kubernetes experience designing, building, or operating workloads on EKS, AKS, GKE, or self-managed Kubernetes
  • Hands-on Terraform experience for infrastructure provisioning and automation
  • Hands-on Helm experience for Kubernetes application deployment
  • Professional experience using at least two programming or scripting languages such as Python, Go, Java, Bash, PowerShell, or Ruby
  • Direct Site Reliability Engineer experience or equivalent, including reliability engineering, on-call, incident response, post-mortems, and toil reduction

Should have criteria

  • Experience working within a defined SDLC, including CI/CD, release processes, and end-to-end delivery from design to operations
  • Hands-on experience with at least one major cloud provider such as AWS, Azure, or Google Cloud
  • Experience with ArgoCD or GitOps-style deployment approaches
  • Five or more years of relevant professional experience
  • DevOps or DevSecOps experience, including CI/CD ownership, infrastructure automation, and security considerations

Preferable criteria

  • Relevant certifications such as CKA, CKAD, cloud certifications, DevOps, DevSecOps, or programming credentials

Why Join Us?

  • A high-performance culture
  • State-of-the-art technology
  • Experience world-class leadership
  • Scale of impact and purpose
  • A competitive salary and a huge growth trajectory
  • Work with the best in the industry
  • Flexible work environment
  • Diversity and creativity

Disclaimer: We do not wish to be contacted by recruitment agencies. Our hiring process is managed in-house and the best way for candidates to express interest is by applying with your resume through our company website.
