What You'll Own
- End-to-end ownership of highly available, scalable AWS infrastructure
- Design, operation, and continuous improvement of Kubernetes (EKS) platforms
- Reliability of production systems through strong observability, automation, and SLOs
- CI/CD systems that enable safe, fast, and repeatable deployments
- Infrastructure defined and enforced through Terraform and GitOps
- Incident response, root cause analysis, and long-term remediation
- Raising operational standards through automation, documentation, and best practices
Technical Requirements
We're looking for engineers who have actually built, run, and scaled real production systems in the following areas:
Cloud & Infrastructure
- Deep AWS expertise: networking, compute, IAM, scaling, and security
- Strong experience managing infrastructure using Terraform at scale
Kubernetes & Platform Engineering
- Very strong Kubernetes fundamentals (internals, scheduling, networking, storage)
- Hands-on experience operating Amazon EKS in production environments
- Experience troubleshooting complex, multi-layer Kubernetes issues
Coding & Automation
- Ability to write clean, maintainable, production-quality code in Go/Python
- Strong automation mindset - eliminating toil through code
CI/CD & GitOps
- Proven experience building and operating CI/CD pipelines
- Hands-on experience with:
  - GitHub (Actions or integrations)
  - ArgoCD and GitOps-based deployment workflows
Observability & Reliability
- Strong understanding of observability principles: metrics, logs, traces, and alerting
- Hands-on experience with Datadog or a similar tool for:
  - Infrastructure and Kubernetes monitoring
  - Application performance monitoring (APM)
  - Alerting, dashboards, and incident detection
- Experience defining and using SLIs/SLOs to drive reliability decisions
- Ability to turn observability data into actionable operational improvements