Software Engineer, Site Reliability

Hebbia

$160K - $300K *

New York, NY 10025 (In-Person)

Information Technology

5 - 7 years of experience

Job Overview by Ladders

Qualifications

5+ years of software development experience with a focus on production services
Proficiency in at least one systems or backend language (Go, Python, C++, Rust)
Experience in Production Engineering, SRE, or infrastructure-focused software engineering
Strong understanding of distributed systems and cloud platforms (AWS preferred)
Expertise in container orchestration and debugging complex distributed failures
Experience building and maintaining observability stacks
Background in a Production Engineering or software-oriented SRE culture is a plus

Responsibilities

Own critical production services from design through incident response
Profile, benchmark, and rewrite code to eliminate bottlenecks
Lead incident response and drive a post-mortem culture that turns findings into code improvements
Design and build observability frameworks and custom instrumentation
Define and enforce SLOs across platform services
Manage capacity planning and cost efficiency through automation
Build robust internal deployment platforms and CI/CD systems

Benefits

Unlimited PTO
Medical, Dental, and Vision insurance
401K plan
Catered daily lunches and DoorDash dinner credits for late hours
Parental leave: 3 months for non-birthing parents, 4 months for birthing parents
$15k lifetime benefit for fertility support
Competitive new hire equity grant with high upside potential

Full Job Description

The Role

We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them rather than simply operating them. You will write production-quality code that keeps the platform reliable at scale, embed with product engineering teams to influence architecture from the start, and build the internal tooling that every engineer at Hebbia depends on. This is not a ticket-driven ops role. You will spend most of your time writing code: instrumenting services, eliminating performance bottlenecks, building deployment platforms, and translating incident post-mortems into lasting architectural improvements.

Responsibilities

Own critical production services end-to-end, from design and code review through deployment, operation, and incident response
Profile, benchmark, and rewrite hot paths to eliminate bottlenecks as Hebbia scales
Lead incident response and drive post-mortem culture, translating findings into code changes and architectural improvements rather than runbooks
Design and build observability frameworks from scratch, writing custom instrumentation, alerting logic, and debugging tooling that surfaces production issues before customers feel them
Define and enforce SLOs across platform services and build the feedback loops that keep engineering teams accountable to them
Own capacity planning and cost efficiency: model growth, right-size infrastructure, and write automation that prevents over-provisioning and resource exhaustion
Build robust, well-tested internal platforms and deployment tooling held to the same engineering standards as customer-facing code
Own and continuously improve CI/CD systems so engineering teams can ship safely and quickly
Embed with product engineering teams as a peer software engineer, contributing directly to production codebases and co-designing systems for reliability from the start
Partner on infrastructure security through threat modeling, hardening, and automated compliance tooling

Who You Are

5+ years of software development experience, with a track record of writing, shipping, and maintaining production services rather than just operating infrastructure
Production-grade proficiency in at least one systems or backend language: Go, Python, C++, or Rust
Proven experience as a Production Engineer, SRE, or software engineer with a deep infrastructure focus, comfortable owning services end-to-end across the full stack
Deep understanding of distributed systems
Container orchestration expertise and hands-on experience debugging complex distributed failures in production
Working knowledge of OS-level concepts
Cloud platform fluency (AWS preferred)
Experience in building and maintaining observability stacks
Strong CI/CD pipeline expertise and a track record of improving developer velocity without sacrificing safety
Background at a company with a Production Engineering or software-focused SRE culture is a strong plus
Experience building platforms for AI/ML workloads or high-throughput document processing pipelines is a plus

Compensation

The salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate's experience and qualifications. Adjustments outside of this range may be considered for candidates whose qualifications significantly differ from those outlined in the job description.

Life @ Hebbia

PTO: Unlimited

Insurance: Medical + Dental + Vision + 401K

Eats: Catered lunch daily + DoorDash dinner credit if you ever need to stay late

Parental leave policy: 3 months for non-birthing parents, 4 months for birthing parents

Fertility benefits: $15k lifetime benefit

New hire equity grant: competitive equity package with unmatched upside potential

* Ladders Estimates

Similar Jobs

Software Engineer II - App Core (Remote Eligible)
$125K — $175K *
Smartsheet
Remote
Today
Splunk Software Engineer
$94K — $198K *
CACI International
Fort George G Meade, MD 20755 (Anne Arundel County)
Today
Senior Software Engineer Infra - Compute Platform
$191K *
Coinbase Careers Page
Remote
Today
DevOps Engineer Sr
$120K — $209K *
Lockheed Martin
Herndon, VA 20171 (Fairfax County)
Today
Software Engineer (AI Infrastructure)
$115K — $160K *
Visionist, Inc.
Laurel, MD 20707 (Prince Georges County)
Reposted Yesterday
Cloud Engineer
Redbeard Solutions
Hampton, VA 23666 (Hampton City County)
2 days ago

More Jobs at Hebbia

Backend Engineer, Growth and Data
$160K — $300K *
New York, NY 10025 (New York County)
Today
Enterprise Technology
In-Person
Backend Engineer, Growth and Data
$160K — $300K *
Nye, MT 59061 (Stillwater County)
Today
Enterprise Technology
In-Person
Backend Engineer, Growth and Data
$160K — $300K *
San Francisco, CA 94112 (San Francisco County)
Today
Enterprise Technology
In-Person
Platform Engineer, Document Intelligence
$160K — $300K *
New York, NY 10025 (New York County)
Today
Information Technology
In-Person
AI Strategist, Principal
$225K — $300K *
New York, NY 10025 (New York County)
Today
Finance & Insurance
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
Today
Client Partner - Banking / Financial Services / Capital Markets
$325K — $350K + $100K bonus *
Large IT Services Firm (client of TechLink Systems)
New York, NY 10001 (New York County)
1 week ago
Java Developer
$80K — $110K *
Intercontinental Exchange Holdings, Inc.
Jacksonville, FL 32210 (Duval County)
Today
Engineer, Systems Engineering
$90K — $130K *
Intercontinental Exchange Holdings, Inc.
Atlanta, GA 30349 (Fulton County)
Today
Lead Developer
$90K — $130K *
Intercontinental Exchange Holdings, Inc.
Atlanta, GA 30349 (Fulton County)
Today