CloudZero

Senior/Staff CloudOps Engineer

CloudZero: $130K to $180K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 3 to 5+ years of experience in distributed systems, particularly in AWS
  • Proficient in Python and Infrastructure as Code (Pulumi or Terraform)
  • Nice to have: experience with advanced AI models such as Claude, Codex, or Gemini
  • Hands-on with monitoring tools like Prometheus or Datadog
  • Proven troubleshooting skills under pressure
  • Strong documentation practices for system clarity
  • Ability to communicate technical concepts to non-technical audiences
  • Eagerness to own infrastructure operations at scale

Responsibilities

  • Design and maintain Pulumi modules for cloud resource provisioning
  • Own infrastructure management end-to-end without manual console interactions
  • Implement systems for quick failure detection and data-driven debugging
  • Integrate observability into operations to preemptively identify issues
  • Automate deployment, scaling, and backups to reduce human error
  • Balance automation with practical needs, automating to solve real problems rather than for its own sake
  • Collaborate with Product Engineering on resilient service design and efficient deployment pipelines

Benefits

  • Opportunity to handle real infrastructure challenges at scale
  • Work with a serverless architecture without traditional EC2s or containers
  • Impact customer decision-making through reliable cloud cost data
  • Engage in meaningful operational problems that affect customer experience
  • Be part of a team that exemplifies efficient cloud resource usage

Full Job Description

About the Role

As a CloudOps Engineer, you'll be a force multiplier for our engineering organization, owning the performance, reliability, and observability of CloudZero's infrastructure and empowering teams to ship features that help customers understand and optimize their cloud spend.

This is real infrastructure work at real scale, not a ticket-closing role or a console-clicking job. CloudZero processes billions of events daily across AWS, Azure, and GCP. Our customers rely on real-time, accurate cost data to make business-critical decisions, and any instability in our system impacts their planning. Built entirely on a unique serverless architecture with no EC2s or containers, our platform demands infrastructure that scales gracefully, fails predictably, and recovers automatically.

If you thrive on hard operational problems, care deeply about reliability and performance, and want to see your work matter to customers in direct and measurable ways, this role was built for you.

What You'll Do

Infrastructure as Code
  • Design and maintain Pulumi modules that provision reliable, cost-efficient cloud resources
  • Own infrastructure end to end with no clicking through consoles

Observability
  • Instrument systems so that failures surface quickly and debugging happens with data, not guesswork
  • Build observability into everything so you know about problems before customers do

Automation
  • Automate deployments, scaling, backups, and limit changes; if humans are doing it repeatedly, build a system to do it instead
  • Balance automation intelligently, building solutions to real problems rather than automating for its own sake

Partner with Product Engineering
  • Help teams design resilient services, review architectures for operational complexity, and build deployment pipelines that enable safe and fast shipping
  • Optimize for cost and performance; CloudZero's business is helping others optimize cloud costs, and we should be exemplars of efficient cloud usage ourselves

What You Bring

  • 3 to 5+ years of experience building and operating distributed systems in AWS
  • Strong skills in Python and Infrastructure as Code using Pulumi or Terraform
  • Experience with frontier AI models such as Claude, Codex, or Gemini
  • Hands-on experience with monitoring tools such as Prometheus or Datadog
  • Proven ability to debug production issues under pressure
  • A preference for thoughtful, reliable system design over reactive hero efforts
  • Strong documentation habits to support long-term team clarity and system stability
  • Ability to clearly explain complex technical issues to non-technical stakeholders
  • Excitement about taking ownership of infrastructure and solving operational challenges at scale

About CloudZero

CloudZero is a cloud cost intelligence platform that helps companies optimize their cloud spending. The company's platform provides real-time visibility into cloud costs and usage, allowing companies to identify areas where they can reduce costs and improve efficiency. CloudZero's software integrates with a variety of cloud providers, including Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The company was founded in 2016 and is headquartered in Cambridge, Massachusetts.

Size: 50 employees
Net Income: -$3 million
Founded: 2016
5 Year Trend: +80%
Revenue: $2 million
