Job Title:
DevOps / Site Reliability Engineer (SRE) - API Platform
Overview
We are seeking an experienced DevOps / Site Reliability Engineer (SRE) to help architect, automate, and scale an enterprise API Gateway platform. This role focuses on infrastructure automation, platform reliability, CI/CD enablement, observability, and operational excellence. The ideal candidate will have experience building resilient cloud-based platforms, implementing automation, and supporting large-scale API environments.
Key Responsibilities
• Design and build automated CI/CD pipelines and reusable infrastructure modules to enable secure, scalable, and self-service API deployments.
• Develop and maintain infrastructure automation using Infrastructure as Code principles.
• Design and implement highly reliable, self-healing platform architectures.
• Establish and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and observability practices.
• Monitor platform health and proactively identify and mitigate operational issues.
• Develop platform tooling and automation solutions to streamline operational processes.
• Create and maintain code-based solutions for infrastructure and platform management.
• Participate in a structured on-call rotation to support platform operations and reliability.
• Collaborate with Security, Product, and Architecture teams to support platform initiatives and technical strategy.
• Contribute to continuous improvement efforts related to scalability, reliability, and operational efficiency.
Required Qualifications
• Bachelor's Degree
• 5+ years of experience in a relevant field
• Experience with Cloud Infrastructure
• Experience with GitHub
• Experience with Google Cloud Platform (GCP)
• Experience with IT Operations
Preferred Qualifications
• Master's Degree
• Experience with Dynatrace
• Experience with Go
• Experience with Tekton
#LI-Hybrid #LI-CP1