Senior / Staff Site Reliability Engineer, Compute

Fluidstack

• $150K — $200K *

New York, NY 10025

In-Person

Information Technology

5 - 7 years of experience

More than 3 months ago

Job Overview by Ladders

Qualifications

5+ years in compute-heavy Site Reliability Engineering (SRE), kernel or virtualization engineering.
Mastery of Linux internals, including the scheduler, memory management, and drivers.
Production experience with KVM, Xen, QEMU, VMware, or similar hypervisors.
Proficient in C, Go, or Rust, with strong Infrastructure as Code (IaC) and CI/CD skills.
Familiarity with SmartNICs/DPUs and kernel-bypass networking.
Demonstrated ability to scale high-throughput compute or HPC platforms.

Responsibilities

Supercharge virtualization by tuning hypervisors, kernel subsystems, and NUMA layouts.
Deploy and optimize new CPU/GPU/DPU nodes and validate SmartNIC off-loads.
Automate observability with performance telemetry and incident response bots.
Lead root-cause analyses of crashes and performance regressions, turning the findings into configuration improvements.
Collaborate closely with silicon and Linux teams to debug drivers and improve I/O paths.
Continuously improve system performance through chaos engineering and by ensuring SLIs/SLOs are actionable.

Benefits

Competitive total compensation package including cash and equity.
Retirement or pension plan aligned with local standards.
Comprehensive health, dental, and vision insurance.
Generous PTO policy in accordance with local norms.

Full Job Description

About Fluidstack

Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

About the Role

Our Senior / Staff Site Reliability Engineers (Compute) are the backbone of Fluidstack's platform. You'll combine deep systems expertise with software engineering to keep our bare-metal and virtualised compute fleet fast, reliable and cost-efficient at petabyte scale.

Focus

Supercharge virtualisation. Tune hypervisors (KVM/QEMU), kernel subsystems and NUMA layouts to shave microseconds off tail latency for AI & HPC jobs.
Deploy & optimise at scale. Roll out new CPU/GPU/DPU nodes, validate SmartNIC and BlueField off-loads and harden workload isolation.
Automate observability. Build kernel-to-orchestrator telemetry, incident-response bots and performance dashboards.
Root-cause the gnarly stuff. Lead crash-dump (kexec/kdump) analyses and performance-regression investigations; turn insights into upstream patches and config templates.
Drive kernel & hardware collaboration. Pair with silicon and Linux teams to debug drivers, accelerate I/O paths and integrate emerging compute hardware (TPUs, DPUs).
Continuously improve. Inject chaos, run game-days and codify post-mortem learnings into SLIs/SLOs that matter to customers.

About you

5+ yrs in compute-heavy SRE, kernel or virtualisation engineering.
Mastery of Linux internals (scheduler, memory, drivers) and system-level debugging.
Production experience with KVM, Xen, QEMU, VMware or similar hypervisors.
Fluency in C, Go or Rust; solid Infra-as-Code & CI/CD chops.
Familiarity with SmartNICs/DPUs and kernel-bypass networking.
Proven track record scaling high-throughput compute or HPC platforms.

Benefits

Competitive total compensation package (cash + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.

* Ladders Estimates

Similar Jobs

Performance & Commerce Media Manager
$142K — $195K *
Warren, NJ 07059 (Somerset County)
Reposted Today
Director of Strategic Communications and Outreach
$120K — $225K *
Yale University
Remote
Today
Pediatric Nutrition National Account Manager - West Region
$129K — $258K *
Abbott
Remote
Reposted Today
Institutional Sales Managing Director - East Coast & Canada
$180K — $250K *
Patria Investments
New York, NY 10025 (New York County)
Reposted Today
Founding Software Engineer
$120K — $160K *
Leap Health, Inc
New York, NY 10025 (New York County)
Today
Founding Software Engineer
$120K — $150K *
Leap Health, Inc
Remote
Today

More Jobs at Fluidstack

Senior Structural Engineer, Modular Systems & Skids
$200K — $250K *
San Francisco, CA 94112 (San Francisco County)
Yesterday
Manufacturing & Automotive
In-Person
Senior Structural Engineer, Modular Systems & Skids
$200K — $250K *
New York, NY 10025 (New York County)
Yesterday
Real Estate & Construction
In-Person
Site Manager, Data Center Operations
$150K — $250K *
Buffalo, NY 14221 (Erie County)
Reposted Yesterday
Information Technology
In-Person
Director of Security
$300K — $400K *
Austin, TX 78745 (Travis County)
2 days ago
Technical Services
In-Person
Director of Security
$300K — $400K *
San Francisco, CA 94112 (San Francisco County)
2 days ago
Technical Services
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
5 days ago
Sr. Software Engineer - Integration
$100K — $130K *
Blue Cross Blue Shield Of Tennessee
Remote
Today
IT Internal Audit Manager
$101K *
Ensemble Health Partners
Remote
Today
ServiceNow Developer
$80K — $100K *
Bird Construction
Mississauga, ON L4T 0A1
Today
Senior Cloud DevOps Engineer
$132K — $176K *
American Automobile Association
Costa Mesa, CA 92627 (Orange County)
Today

Find similar Senior / Staff Site Reliability Engineer, Compute jobs:

Nationwide New York, NY
