Senior Site Reliability Engineer

Blitzy

$160K - $180K *

Cambridge, MA 02139 (In-Person)

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
Strong proficiency in at least one major cloud platform (AWS preferred), with Kubernetes and container orchestration experience at scale.
Hands-on experience with infrastructure-as-code tools like Terraform or Pulumi.
Proven track record in designing and maintaining high-availability, distributed systems.
Deep expertise in observability tools and incident management practices.
Strong scripting skills in Python, Go, Bash, or similar languages.
Excellent communication skills for collaboration across engineering teams.

Responsibilities

Design, build, and operate scalable infrastructure across cloud environments (AWS, GCP, or Azure).
Define and enforce SLOs, SLAs, and error budgets, and lead blameless postmortems.
Build and maintain CI/CD pipelines, automation, and deployment infrastructure.
Own observability by maintaining logging, metrics, tracing, and alerting stacks.
Collaborate with software engineering to embed reliability practices into development.
Drive capacity planning, performance benchmarking, and cost optimization.
Champion security best practices within infrastructure and deployment layers.

Benefits

Opportunity to work in a fast-paced, high-impact environment.
Direct influence over architectural decisions on a new platform.
Collaboration with world-class engineers and a dynamic team culture.
Potential for professional growth as a founding member of the Pune SRE team.

Full Job Description

Location: Cambridge, MA (In-Office)

Compensation: $160,000 - $180,000 + equity eligibility based on performance

The Role

As a Senior Site Reliability Engineer at Blitzy's Cambridge headquarters, you will be the backbone of our platform's reliability, scalability, and operational excellence. You'll work at the intersection of software engineering and infrastructure, ensuring our AI-powered development platform remains highly available and performant as we scale rapidly. This is a high-impact, hands-on role for an engineer who thrives in a fast-moving environment and takes deep ownership of the systems they build.

What Success Looks Like

In 30 days: You have a deep understanding of Blitzy's infrastructure architecture, have identified key reliability risks, and are actively contributing to on-call rotations.
In 90 days: You have shipped meaningful improvements to observability, incident response workflows, and deployment pipelines that measurably reduce MTTR and increase system uptime.
In 6 months: You have driven at least one major reliability initiative from inception to production, established SLO/SLA frameworks for critical services, and are a trusted technical voice shaping our infrastructure roadmap.

Areas of Ownership

Design, build, and operate scalable, fault-tolerant infrastructure across cloud environments (AWS, GCP, or Azure).
Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems and drive systemic improvements.
Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.
Own observability: design and maintain logging, metrics, tracing, and alerting stacks (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
Partner closely with software engineering teams to embed reliability practices into the development lifecycle.
Drive capacity planning, performance benchmarking, and cost optimization across our infrastructure.
Champion security best practices within the infrastructure and deployment layers.

Required Experience

5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
Strong proficiency in at least one major cloud platform (AWS preferred); experience with Kubernetes and container orchestration at scale.
Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or equivalent).
Proven track record designing and maintaining high-availability, distributed systems.
Deep expertise in observability tooling, incident management, and on-call practices.
Strong scripting and automation skills (Python, Go, Bash, or similar).
Excellent communication skills with the ability to collaborate across engineering teams and present technical findings to leadership.

What Makes You Stand Out

Experience supporting AI/ML workloads or GPU-accelerated infrastructure.
Prior experience in a high-growth startup environment where you wore multiple hats.
Familiarity with eBPF, service mesh technologies (Istio, Linkerd), or advanced networking.
Contributions to open-source SRE/DevOps tooling or communities.
Experience building global, multi-region infrastructure with strict latency and availability requirements.

What Makes This Role Different

You won't be maintaining legacy systems or fighting fires in a sprawling monolith. At Blitzy, you're building reliability into a greenfield AI platform that is redefining how the world creates software. You'll have direct influence over architectural decisions, work side-by-side with world-class engineers, and see the tangible impact of your work as we scale to serve Fortune 500 customers. As a founding member of the Pune SRE team, you'll help shape the culture and technical standards of a team that will grow with the company.

* Ladders Estimates
