Cloud/DevOps Engineer - III

Compunnel

$120K - $150K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, Information Systems, or related field
  • 7+ years in software development with a focus on reliability
  • 5+ years of advanced Python development experience
  • 3+ years of AWS development with a deep understanding of key services
  • 3+ years applying SRE principles and practices
  • Expert-level proficiency in Infrastructure as Code (IaC) using Terraform
  • Strong experience with CI/CD pipelines and DevOps practices

Responsibilities

  • Design and maintain reliability solutions and SRE utilities
  • Build and optimize Infrastructure as Code (IaC) using Terraform for AWS resources
  • Develop CI/CD pipelines and automated testing for code quality
  • Define SRE standards and best practices across teams
  • Establish SRE metrics such as SLIs and SLOs
  • Apply software engineering best practices, including version control and code reviews
  • Participate in incident management and provide technical support

Benefits

  • Not specified

Full Job Description
JOB SUMMARY
As a Senior Cloud Engineer in the Cloud SRE team, you will be responsible for designing and developing cloud solutions and engineering reliability tools for the Cloud Foundation Services (CFS) platform. You will apply software engineering practices to build scalable, reusable solutions and utilities that enhance platform reliability.

Key Responsibilities
- Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices.
- Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources, incorporating cost-efficient design principles.
- Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of solutions.
- Define SRE standards, best practices, and guidelines for adoption across teams.
- Establish SRE metrics such as SLIs and SLOs.
- Apply software engineering best practices including version control, code reviews, test-driven development, and documentation.
- Participate in incident management and on-call rotation, providing technical support, troubleshooting production issues, and collaborating with teams to reduce incident recurrence.
- Stay current with emerging AWS services, SRE methodologies, and cloud-native development technologies, and drive adoption of innovative solutions.
- Collaborate within Agile and Scaled Agile frameworks with cross-functional teams to deliver integrated cloud automation solutions.
- Produce clear, blameless postmortems with actionable items and documented failure scenarios.

Required Qualifications
- Bachelor's degree in Computer Science or Information Systems, or equivalent background or experience.
- 7+ years of extensive experience in software development with a focus on reliability and platform engineering.
- 5+ years of advanced Python development experience, with a proven record of building enterprise-grade, highly available tools, APIs, and utilities.
- 3+ years of hands-on experience developing solutions in AWS environments, with a deep understanding of core services (EC2, VPC, S3, Lambda, IAM, CloudFormation, EventBridge, Step Functions, etc.) and resource cost optimization.
- 3+ years of experience applying SRE principles, including observability, toil automation, SLIs/SLOs, and reliability engineering.
- Expert-level proficiency with Infrastructure as Code (IaC) using Terraform, including module development and state management.
- Strong experience with CI/CD pipelines, automated testing frameworks, and DevOps practices.
- Experience with observability tools and practices, including Grafana, AWS CloudWatch, and AWS Canary.
- Experience defining, implementing, and managing SLOs/SLIs and error budgets; familiarity with conducting RCAs and producing postmortem documentation.
- Working experience in Agile and Scaled Agile environments, and familiarity with ITSM processes (incident, change, and problem management), resilience testing, and chaos engineering practices.

Preferred Qualifications
- Experience with Go (Golang) or additional programming languages.

Certifications
- None specified.
