Experience writing async web apps using FastAPI in Python
Skilled in building APIs, cloud infrastructure, and CI/CD pipelines
Proficient with Infrastructure as Code (IaC), AWS, and large-scale database management
Strong understanding of architecture and security best practices
Excellent technical and communication skills
Extensive experience with AWS and Kubernetes
Responsibilities
Design, build, and operate reliable cloud infrastructure for real-time voice AI
Own Kubernetes clusters end-to-end, including debugging production incidents
Build and maintain Infrastructure as Code using tools like Terraform or Pulumi
Create and operate CI/CD pipelines for multiple microservices
Design observability systems to detect failures early
Collaborate with backend engineers to ensure scalable infrastructure
Harden systems with strong security practices and optimize cloud performance
Benefits
Opportunities for professional development and continuous learning
Collaborative work environment that encourages innovation
Flexible work schedule and remote work options
Access to the latest tools and technologies
Participation in a culture of proactive incident management and systemic improvements
Full Job Description
Your day to day:
Design, build, and operate highly reliable cloud infrastructure that powers real-time voice AI systems with extremely low latency and high availability.
Own Kubernetes clusters end-to-end: provisioning, scaling, upgrades, networking, and debugging production incidents under real customer load.
Build, maintain, and evolve infrastructure as code using tools like Terraform, Pulumi, or CloudFormation to ensure repeatable, auditable, and secure environments across staging and production.
Create and operate CI/CD pipelines that enable fast, safe iteration across multiple microservices and teams.
Design and maintain observability systems (metrics, logs, traces, alerting) to detect failures early and rapidly diagnose production issues.
Partner with backend engineers to translate application requirements into scalable, secure infrastructure and clean deployment workflows.
Harden systems through strong security practices including IAM, secrets management, network isolation, and least-privilege access controls.
Optimize cloud performance and costs while maintaining reliability, developer velocity, and customer experience.
Implement and operate GitOps-driven deployment workflows, using Git as the source of truth for infrastructure and application state, enabling safe, auditable, and automated rollouts.
Lead incident response: investigate outages, coordinate fixes, write postmortems, and drive systemic reliability improvements.
Continuously improve resilience through load testing, chaos testing, capacity planning, and proactive infrastructure upgrades.
Qualifications:
5+ years as a DevOps engineer
Experience writing async web apps using FastAPI in Python
Experience building APIs, cloud infrastructure, and CI/CD pipelines
Experience with IaC, AWS, and database management at scale
Understanding of sound architecture and security practices