Production Engineer, IaaS

Fluidstack

$175K — $300K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in production engineering or a related field
  • Strong understanding of API design and versioning
  • Familiarity with AI tooling, including LLM APIs
  • Proven experience shipping production services at scale
  • Fluency in programming languages, especially Go and Python
  • Experience with distributed systems and data pipelines is a plus
  • Ability to learn quickly in unfamiliar domains

Responsibilities

  • Own and operate the observability platform for real-time fleet insights
  • Define and build the API surface for interaction with production infrastructure
  • Build a unified production control plane for machine management
  • Maintain fleet state as the source of truth across systems
  • Ensure new hardware integrates cleanly into production environments

Benefits

  • Competitive total compensation package inclusive of salary and equity
  • Retirement or pension plan based on local standards
  • Health, dental, and vision insurance coverage
  • Generous paid time off policy adhering to local norms

Full Job Description
The Production Engineering Team

Examples of key problems the team is working on
  • We're building the observability platform that makes a fleet of tens of thousands of XPUs legible in real time - from site-level health down to individual device and link, with a data decoration and correlation engine that turns raw telemetry into signal.
  • We're designing the API surface that every team at Fluidstack uses to interact with production infrastructure - replacing one-off tooling with a stable, versioned control plane that includes unified machine management, actual state inspection, and distributed command execution.
  • We're integrating fleet state as a machine-readable source of truth across provisioning, operations, and customer-facing platforms - so that what the system says about itself always matches reality, and every new site and XPU generation lands cleanly from day zero.
Role Scope
  • Own the observability platform. Build and operate the data pipelines, decoration and correlation engine, and healthcheck framework that make the fleet legible - from site down to device and link. No other team should need to scrape production directly to answer a question.
  • Define and build the API surface for infrastructure. Design the contracts between production infrastructure and every tool that touches it. All other teams at Fluidstack use your tooling to manage and operate our hyperscale fleet.
  • Build the production control plane. Unified machine management, actual state inspection, distributed command execution - and the Kubernetes-based infrastructure that underpins it all.
  • Own fleet state as source of truth. SLOs, site lifecycle state, and integration with internal infrastructure management and customer-facing operations platforms. What the system says about itself should match reality, and you're accountable when it doesn't.
  • Land new hardware into the platform cleanly. ZTP, DHCP, DNS, artifacts - every new XPU generation and site integration goes through IaaS before production.
What We're Looking For

The list below is a starting point. We always make space for exceptional people, so if you don't fit this role exactly, tell us where you would.
  • You treat toil as a bug. If something requires a human to do it twice, you build the thing that makes it not require a human.
  • You design APIs that age well. You've felt the pain of a leaky abstraction at scale and you don't repeat it.
  • You move toward ambiguity, not away from it. You walk into the fog, build the map, and explain it to everyone else.
  • You learn at a steep slope. You reach real competence in an unfamiliar domain fast. We value this over existing expertise.
  • You carry a pager without flinching. You run the incident, write the postmortem, fix the systemic cause, and move on.
  • You're fluent with AI tooling: LLM APIs, MCP servers, and agentic frameworks. You drive Claude Code, Cursor, or similar every day.
  • You've shipped production services that other teams depend on at scale, and you're comfortable in any language using AI coding tools.
  • Bonus: Distributed systems and data pipeline engineering. Time-series observability stacks (Prometheus, Thanos, VictoriaMetrics). API design and versioning at scale. Workflow and orchestration engines (Temporal, Cadence). BMC/Redfish or hardware telemetry. Go, Python, and Postgres.


Salary & Benefits
  • Competitive total compensation package (salary + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.

The base salary range for this position is $175,000 - $300,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.
