CloudZero

Senior/Staff CloudOps Engineer

CloudZero | $120K to $160K *
Information Technology
Less than 5 years of experience

Job Overview by Ladders

Qualifications

  • 3 to 5+ years of experience with distributed systems in AWS
  • Strong skills in Python and Infrastructure as Code (Pulumi or Terraform)
  • Experience with frontier AI models (e.g., Claude, Codex, Gemini)
  • Hands-on experience with monitoring tools (e.g., Prometheus, Datadog)
  • Proven ability to debug production issues under pressure
  • Strong documentation habits for team clarity and system stability
  • Ability to clearly explain technical issues to non-technical stakeholders

Responsibilities

  • Design and maintain Pulumi modules for reliable, cost-efficient cloud resources
  • Own infrastructure end-to-end without console clicking
  • Instrument systems to surface failures quickly for effective debugging
  • Build observability into systems to proactively identify issues
  • Automate deployments, scaling, backups, and limit changes where automation solves a real problem
  • Help teams design resilient services and review operational architectures
  • Optimize cloud usage for cost and performance

Benefits

  • Empowerment to solve operational challenges at scale
  • Opportunity to work with cutting-edge technologies
  • Engagement in impactful infrastructure projects
  • Focus on reliability and performance in a high-scale environment
  • Collaborative partnership with product engineering teams

Full Job Description

About the Role

As a CloudOps Engineer, you'll be a force multiplier for our engineering organization, owning the performance, reliability, and observability of CloudZero's infrastructure and empowering teams to ship features that help customers understand and optimize their cloud spend.

This is real infrastructure work at real scale, not a ticket-closing role or a console-clicking job. CloudZero processes billions of events daily across AWS, Azure, and GCP. Our customers rely on real-time, accurate cost data to make business-critical decisions, and any instability in our system impacts their planning. Built entirely on a unique serverless architecture with no EC2 instances or containers, our platform demands infrastructure that scales gracefully, fails predictably, and recovers automatically.

If you thrive on hard operational problems, care deeply about reliability and performance, and want to see your work matter to customers in direct and measurable ways, this role was built for you.

What You'll Do

Infrastructure as Code
  • Design and maintain Pulumi modules that provision reliable, cost-efficient cloud resources
  • Own infrastructure end to end with no clicking through consoles

Observability
  • Instrument systems so that failures surface quickly and debugging happens with data, not guesswork
  • Build observability into everything so you know about problems before customers do

Automation
  • Automate deployments, scaling, backups, and limit changes; if humans are doing it repeatedly, build a system to do it instead
  • Balance automation intelligently, building solutions to real problems rather than automating for its own sake

Partner with Product Engineering
  • Help teams design resilient services, review architectures for operational complexity, and build deployment pipelines that enable safe and fast shipping
  • Optimize for cost and performance; CloudZero's business is helping others optimize cloud costs, and we should be exemplars of efficient cloud usage ourselves

What You Bring

  • 3 to 5+ years of experience building and operating distributed systems in AWS
  • Strong skills in Python and Infrastructure as Code using Pulumi or Terraform
  • Experience with frontier AI models such as Claude, Codex, or Gemini
  • Hands-on experience with monitoring tools such as Prometheus or Datadog
  • Proven ability to debug production issues under pressure
  • A preference for thoughtful, reliable system design over reactive hero efforts
  • Strong documentation habits to support long-term team clarity and system stability
  • Ability to clearly explain complex technical issues to non-technical stakeholders
  • Eagerness to take ownership of infrastructure and solve operational challenges at scale

About CloudZero

CloudZero is a cloud cost intelligence platform that helps companies optimize their cloud spending. The company's platform provides real-time visibility into cloud costs and usage, allowing companies to identify areas where they can reduce costs and improve efficiency. CloudZero's software integrates with a variety of cloud providers, including Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The company was founded in 2016 and is headquartered in Cambridge, Massachusetts.
Size: 50 employees
Industry:
Net Income: -$3 million
Founded: 2016
5 Year Trend: +80%
Revenue: $2 million
