Job Summary:
We are seeking an experienced Azure Site Reliability Engineer (SRE) to design, develop, and support scalable cloud-based solutions. The ideal candidate will work extensively with public cloud infrastructure, primarily Azure, and contribute to automation frameworks, CI/CD pipelines, and observability solutions. This role requires strong expertise in Python development, Infrastructure as Code, and cloud-native practices, along with the ability to independently own tasks and deliver high-quality, reliable systems in a fast-paced environment.
Key Responsibilities:
• Design and develop Python-based applications and automation solutions
• Enhance and integrate cloud service provider (CSP) automation frameworks with in-house tools (Azure focus)
• Build and integrate automation workflows into CI/CD pipelines
• Participate in all phases of the software development life cycle (SDLC): analysis, design, development, testing, and deployment
• Develop proofs of concept for new technologies and solutions
• Troubleshoot and provide production support for cloud and on-premises applications
• Evaluate and implement DevOps and cloud-native tools
• Build observability into cloud platforms using monitoring and logging tools
• Identify and reduce operational toil through automation and process improvements
• Collaborate with global teams to gather requirements and deliver solutions
Required Skills:
• Strong experience in Python development
• Experience with Infrastructure as Code tools (Terraform, Ansible)
• Hands-on experience with CI/CD tools (GitHub Actions, Jenkins)
• Solid understanding of object-oriented design and development principles
• Proficiency in Linux/Unix environments
• Experience with NoSQL databases (data modeling, tuning, testing)
• Ability to write clean, reusable, and optimized code following best practices
• Experience implementing observability tools (Prometheus, Grafana, OpenTelemetry)
• Strong problem-solving skills and ability to work independently
Preferred Skills:
• Experience working with Azure cloud services
• Exposure to multi-cloud or other CSP environments
• Knowledge of DevOps practices and cloud architecture
• Experience in performance optimization and system reliability engineering
Certifications:
• Azure or other cloud certifications (preferred but not mandatory)
Education:
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)