CloudZero

Lead ML Engineer

CloudZero
$130K - $180K *
Enterprise Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 6+ years in ML engineering and/or data science with production experience at scale.
  • Expertise in time-series analysis and building forecasting/anomaly detection systems.
  • Strong understanding of classical ML methods including graphs and probabilistic modeling.
  • Full-cycle production ML engineering experience, from feature engineering to monitoring.
  • Proficient in Python and experienced with data warehouses like Snowflake or BigQuery.
  • Academic background in Computer Science, Statistics, Mathematics, or a related field.
  • Familiarity with GenAI/LLM and Cloud ML infrastructure, preferably AWS.

Responsibilities

  • Spend 60-70% of your time on hands-on building and problem-solving.
  • Prototype and translate ML/AI research ideas into scalable production systems.
  • Develop AI features for cost optimization and predictive analytics in collaboration with product teams.
  • Establish technical standards and development processes for AI/ML systems.
  • Build and mentor a small team of AI/ML specialists.
  • Cultivate a culture of innovation and customer focus within the team.
  • Lead technical decision-making and problem-solving effectively.

Benefits

  • Flexible working environment with a focus on innovation.
  • Opportunity to work on cutting-edge ML problems in a scale-up setting.
  • Ability to shape the technical direction of the company as a founding engineer.
  • Collaborative team culture that encourages learning and personal development.
  • Access to industry events to showcase and discuss AI capabilities.

Full Job Description
About the Role:

The ML problems that define the future of cloud cost-per-anything

CloudZero is the cost-per-anything model for cloud and AI - for humans and the agents they deploy. We're inverting cost intelligence: from billing-first to telemetry-first. Every engineering decision is a buying decision - and we're building the platform that proves it in real time.

Telemetry-First · Cost-to-Produce · AI Inference Spend · Agentic Governance · ML-Powered

CloudZero is inverting the traditional cost intelligence model. Instead of starting from the monthly bill, we're building toward a telemetry-first platform - lightweight collection agents inside customer environments, capturing every AI inference event, cloud resource usage, and product telemetry signal in real time. That data is reconciled against billing to produce total cost-to-produce intelligence. Not just COGS. The full picture.

AI is making every company look like a multi-tenant SaaS. Every enterprise now has per-model, per-token, per-customer AI inference complexity - and no one has a real-time answer for how to measure, govern, and optimize it. CloudZero is building that answer: a multi-tier architecture spanning real-time streaming (Kafka, Flink/KStreams), batch billing reconciliation, and an intelligent governance layer for both human engineers and the autonomous agents they deploy. Most of what makes this role extraordinary is what we're building next.

This is a founding technical engineer role. You won't be managing a team on day one - you'll be anchoring one. You'll set the technical patterns, solve the hardest data science problems in the product, and help build the team around you.

The vision: CloudZero becomes the cost-per-anything model for cloud and AI - for humans and the agents they deploy.

6 hard ML problems

They sit at the intersection of financial telemetry, cloud infrastructure, AI inference, and massive scale. Some are live in product today; several are what we're building next.

  • Real-time Unit Economics: Calculate per-unit costs across millions of transactions with dynamic efficiency management
  • Predictive Cost Intelligence: Predict and prevent cost efficiency breaches before they impact the business
  • Multi-Cloud Attribution: Accurately attribute cloud spend across complex systems using probabilistic modeling
  • Autonomous Optimization: Build AI agents that make safe infrastructure changes within business constraints
Responsibilities:
  • Lead by example: spend 60-70% of your time building, architecting, and solving technical problems
  • Prototype novel ML/AI research ideas and help translate them into production-ready systems that handle enterprise scale
  • Build AI-powered features (in partnership with product/engineering teams) for cost optimization, anomaly detection, and predictive analytics
  • Establish technical standards and development processes for AI/ML systems
  • Build and develop a small team of AI/ML specialists
  • Provide hands-on coaching and technical guidance to team members
  • Foster a culture of innovation, continuous learning, and customer focus
  • Lead by example in technical decision-making and in your approach to problem-solving
  • Partner closely with engineering teams to embed AI throughout the platform
  • Translate complex AI concepts into business value for executives and customers
  • Drive AI strategy alignment with company vision and product roadmap
  • Represent CloudZero's AI capabilities in customer conversations and industry events
Qualifications:
  • 6+ years in ML engineering and/or data science disciplines, with meaningful time in production systems at scale
  • Deep time-series fluency - you've built forecasting and anomaly detection systems that made it to production and earned customer trust
  • Classical ML foundations - graphs, clustering, probabilistic modeling, data structures. You reach for the right tool, not the trendiest one
  • Production ML engineering - you've owned the full stack: feature engineering, model serving, monitoring, retraining pipelines, feedback loops
  • Python fluency and data warehouse experience (Snowflake, BigQuery, or equivalent)
  • Formal background in Computer Science, Statistics, Mathematics, or a related quantitative field
  • GenAI/LLM experience - you've integrated LLMs, seen their failure modes, and know when to use them vs. traditional ML
  • Cloud ML infrastructure - AWS SageMaker, Bedrock, or equivalent, with experience building systems at enterprise scale in AWS/GCP
  • FinOps or cost intelligence domain knowledge (nice to have) - understanding of cloud billing, infrastructure cost models, or related financial data
  • Founding IC experience - you've been the first or second data scientist and know what it takes to build from scratch
  • Graph modeling and semantic layers - knowledge graphs, entity resolution, or semantic modeling in production contexts
  • Bias toward correctness - you care whether models are actually right, not just accurate on a validation set


About CloudZero

CloudZero is a cloud cost intelligence platform that helps companies optimize their cloud spending. The company's platform provides real-time visibility into cloud costs and usage, allowing companies to identify areas where they can reduce costs and improve efficiency. CloudZero's software integrates with a variety of cloud providers, including Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The company was founded in 2016 and is headquartered in Cambridge, Massachusetts.
Size: 50 employees
Net Income: -$3 million
Founded: 2016
5 Year Trend: +80%
Revenue: $2 million
