About the Role
The Platform & Infrastructure team is seeking a Cloud Infrastructure Engineer for our Cellular Infrastructure team. This team owns the full lifecycle of Abnormal's cell-based deployment architecture: bootstrapping new cells, deploying our entire application and infrastructure stack onto them, and keeping every cell healthy, isolated, cost-efficient, and compliant. Engineers on this team wear multiple hats: infra engineering, application-layer debugging, and close collaboration with product and application teams to minimize overhead so those teams can stay focused on building.
What You Will Do
- Bootstrap new cells end-to-end: full infrastructure setup (compute, networking, IAM, etc.) and complete application stack deployment.
- Maintain and evolve cell lifecycle tooling to make provisioning repeatable, auditable, and operator-friendly, reducing manual steps and time-to-production.
- Partner with application and product teams to design and implement scalable, cell-native architecture approaches.
- Design, build, test, scale, monitor, and maintain secure, cost-efficient infrastructure in a multi-cloud environment (AWS and Azure).
- Triage and resolve complex cross-layer issues quickly, then drive root cause fixes that prevent recurrence.
- Drive down technical debt and toil through automation and systemic improvements to the cell deployment lifecycle.
- Participate in on-call rotation with a learning-oriented mindset, identifying systemic gaps and driving long-term reliability improvements.
- Keep cross-team communication low-friction and high-signal: proactive and well-contextualized.
- Contribute as a core member of an agile team through sprint planning, standups, and execution with a strong sense of ownership and teamwork.
Must Haves
- Bachelor's degree in Computer Science or a related technical field.
- 4+ years of experience engineering cloud infrastructure for production microservice systems, with attention to performance, reliability, security, and cost.
- 2+ years of Python experience, including application-layer code (not just scripts).
- 1+ year of experience with Kubernetes and Helm.
- 1+ year of AWS experience (VPC, IAM, S3, Route 53, CloudFront, EKS, ECS, CloudWatch).
- 1+ year of Terraform and HCL experience.
- Comfort operating across infra and application engineering without hard boundaries.
- Experience with on-call rotations, incident response, and operating production-grade systems.
- Practical experience using Generative AI tools in day-to-day engineering workflows.
- Strong communication skills and the ability to thrive in a fast-paced, remote-first environment: balancing autonomy with collaboration, demonstrating a bias toward action, and maintaining a positive, constructive mindset.
Nice to Haves
- Experience with Bash, Golang, Terragrunt, and data infrastructure (Spark, Databricks).
- Hands-on experience with cell-based, multi-tenant, or multi-region infrastructure architectures.
- Familiarity with Generative AI developer tools such as Claude Code, and experience driving AI-first engineering workflows.
- Prior experience building large-scale IaC abstractions or internal developer platforms.
- AWS certifications.
Actual compensation will be determined based on several non-discriminatory factors including skills, experience, qualifications, and geographic location.
In addition to base salary, this role may be eligible for bonus or incentive compensation, equity, and a comprehensive benefits package.
Base salary range: $149,200-$214,500 USD