The role
As a Staff Engineer on the Platform team, you'll own and lead platform engineering across our cloud, data, and AI/ML systems. Multi-cloud is central to what we're building, not a future initiative!
You'll work closely with your teammates and engineering leadership to set direction and build the foundation that lets every team at Compa move fast and strategically. This is a high-ownership individual contributor role where your decisions shape how the enterprise builds and ships.
What you'll do
- Design, build, and maintain core infrastructure across cloud, data, and AI/ML systems
- Own and evolve Compa's Kubernetes-based platforms across AWS, GCP, and Azure
- Drive multi-cloud initiatives including cloud abstraction patterns, cross-cloud identity, networking, data movement, traffic routing, and customer-managed encryption keys
- Define platform standards for Kubernetes cluster architecture, workload isolation, tenancy, networking, security, scaling, upgrades, and operational ownership
- Scale and automate infrastructure services and internal tooling
- Raise the bar on reliability and observability through SLIs/SLOs, monitoring, and incident response
- Design and improve CI/CD pipelines, deployment workflows, infrastructure automation and paved paths for engineering teams
- Lead platform efforts that reduce developer toil and improve velocity
- Partner with leadership on technical direction and roadmap
- Act as a multiplier, setting standards that help the engineers around you level up
What we're looking for
- 8+ years of software engineering experience with a strong emphasis on infrastructure and platform development
- Deep, hands-on experience with managed Kubernetes (EKS, GKE, or AKS) - cluster architecture, networking, scaling, upgrades and production operations
- Strong Python skills applied to infrastructure tooling and backend systems
- Hands-on experience designing and operating production systems across at least two of AWS, GCP, and Azure; this is a must-have
- Experience managing infrastructure across cloud boundaries: identity, networking, data, traffic routing, and failover
- Experience with observability tooling such as Prometheus, Grafana, OpenTelemetry, Loki or ELK, Jaeger, or Tempo
- Comfortable with high ownership and ambiguity - you've built things without a playbook before
Nice to have
- DevOps and SRE practices: infrastructure as code, CI/CD, incident response
- Security fundamentals: IAM, secrets management, encryption, least privilege
- Exposure to MLOps
- Early-stage startup experience