Senior Site Reliability Engineer

David AI

$130K to $180K *

San Francisco, CA 94112

In-Person

Information Technology

5 - 7 years of experience

6 days ago


Job Overview by Ladders

Qualifications

5+ years in Site Reliability, Infrastructure, or Platform Engineering for large-scale SaaS or cloud systems.
Hands-on experience with security best practices in production systems and cloud infrastructure.
Strong background in building reliable and scalable systems.
Experience with AWS, Terraform, containers and orchestration (e.g., Kubernetes), and cloud networking basics.
Proficient in observability tooling (e.g., Prometheus, Grafana, Datadog).
Effective collaborator in fast-paced, cross-functional teams.
Bachelor's degree in Computer Science or related field, or equivalent practical experience.

Responsibilities

Own the observability stack, managing monitoring, alerting, logging, and tracing.
Partner with product and platform engineering teams to build resilient systems from inception.
Design and implement secure, scalable cloud infrastructure on AWS using Terraform.
Lead enhancements in CI/CD processes and incident response practices to boost efficiency.
Define and evolve SRE practices, influencing reliability culture and standards across the organization.

Benefits

Unlimited PTO.
Comprehensive health, dental, and vision coverage with 100% coverage for most plans.
FSA & HSA access.
401k access.
Meals twice daily through DoorDash and office snacks.
Unlimited company-sponsored Barry's classes.

Full Job Description

About this role

As a Senior Site Reliability Engineer at David AI, you will shape and build the foundation for reliability, observability, and scalability across David AI's infrastructure. Working closely with our engineering and product teams, you'll help ensure our systems are resilient, efficient, and designed to scale as the company grows.

In this role, you will

Own David AI's observability stack, including monitoring, alerting, logging, and tracing, to provide engineers with clear visibility into system health, reliability, and performance.
Partner closely with product and platform engineering teams to design systems that are scalable, resilient, and reliable from day one, not as an afterthought.
Design and implement secure, scalable cloud infrastructure across AWS using Terraform and modern DevOps tooling to support rapid product and research iteration.
Lead improvements across deployment pipelines, CI/CD systems, and incident response processes to reduce downtime, improve operational efficiency, and strengthen engineering velocity.
Define and evolve the foundation of SRE practices at David AI, influencing reliability culture, tooling standards, operational excellence, and best practices across the engineering organization.

Your background looks like

5+ years of experience in Site Reliability, Infrastructure, or Platform Engineering supporting large-scale SaaS or cloud systems.
Hands-on experience applying security best practices in production systems and cloud infrastructure.
Strong experience building and running reliable, highly available, and scalable systems.
Hands-on experience with AWS, Terraform, containers and orchestration (e.g., Kubernetes), and cloud networking basics.
Experience implementing and maintaining observability tooling across monitoring, logging, alerting, and tracing (e.g., Prometheus, Grafana, Datadog, or similar).
Comfortable working in fast-paced teams and collaborating closely with product, ML, and engineering teams.
Bachelor's degree in Computer Science or related field, or equivalent practical experience.

Bonus points if you have

Past experience in an early-stage startup environment, especially defining SRE culture and tooling from scratch.
Familiarity with incident management automation or self-healing infrastructure patterns.

Some technologies we work with

Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Temporal, WebRTC, FFmpeg.

Benefits

Unlimited PTO.
Top-notch health, dental, and vision coverage with 100% coverage for most plans.
FSA & HSA access.
401k access.
Meals 2x daily through DoorDash + snacks and beverages available at the office.
Unlimited company-sponsored Barry's classes.

* Ladders Estimates




