Senior Staff Site Reliability Engineer

HiveWatch

• $183K — $235K *

El Segundo, CA 90245

In-Person

Information Technology

5 - 7 years of experience

1 month ago

Job Overview by Ladders

Qualifications

7+ years of software engineering experience in production environments
5+ years of SRE, DevOps, or production operations experience
Expertise in cloud platforms, preferably AWS, and containerized applications
Experience with Infrastructure as Code tools like Terraform
Proficiency in at least one language from the tech stack such as Java, Kotlin, or Python
Hands-on experience with relational databases and SQL optimization
Experience with monitoring and observability tools like Prometheus or Grafana

Responsibilities

Own the reliability of mission-critical systems including monitoring and capacity planning
Debug and resolve complex issues across infrastructure and application code
Participate in a 24/7 on-call rotation for critical system coverage
Conduct root cause analysis and implement preventive measures
Build automation and tooling to enhance system reliability
Maintain CI/CD pipelines and optimize database performance
Foster engineering excellence through technical leadership and mentorship

Benefits

Comprehensive health coverage: medical, dental, vision, and life insurance
Competitive compensation packages
401(k) with a 4% company match
Flexible paid time off for work-life balance
Cutting-edge work in an emerging field with growth potential
Family-friendly culture that values balance and belonging
Additional benefits such as ClassPass credits and discounts on pet insurance

Full Job Description

POSITION OVERVIEW:

HiveWatch is seeking a Senior Staff Site Reliability Engineer to join our Platform Team, where you'll architect and maintain mission-critical edge infrastructure that connects our SaaS platform to customer systems. You'll ensure exceptional performance, reliability, and observability across our distributed environment while providing technical leadership to our growing engineering team. This role reports directly to our VP of Engineering.

WHAT YOU'LL DO:

Own the reliability of mission-critical systems including production monitoring, alerting, and capacity planning
Debug and resolve complex production issues across the full stack, from infrastructure to application code
Participate in a regular on-call rotation to provide 24/7 coverage for critical systems
Perform root cause analysis requiring deep code-level investigation and implement preventive measures
Build automation and tooling to reduce operational toil and improve system reliability
Maintain CI/CD pipelines and observability infrastructure, and optimize database performance
Increase the resiliency, scalability, and maintainability of production environments
Maintain on-call procedures and disaster recovery processes
Provide technical leadership and mentorship to foster engineering excellence and reliability culture

OUR TECH STACK:

Languages: Kotlin, Rust, TypeScript, and Python
Deployments: GitHub Actions, Terraform, Terragrunt, and Helm
Infrastructure: AWS (Kinesis, Serverless, RDS, EKS), Kubernetes, Docker, Postgres, IoT Edge, Red Hat Enterprise Linux, Rocky Linux

MINIMUM QUALIFICATIONS:

7+ years of software engineering experience with strong coding skills in production environments
5+ years of SRE, DevOps, or production operations experience
Expertise with cloud platforms (AWS preferred) and containerized applications (Docker, Kubernetes)
Experience with Infrastructure as Code (Terraform, CloudFormation, or similar)
Proficiency in at least one object-oriented programming language in our tech stack (Java, Kotlin, Python)
Hands-on experience with relational databases and SQL performance optimization
Experience with monitoring and observability tools (Prometheus, Grafana, DataDog, or equivalent)
Strong debugging skills across distributed systems and microservices architectures
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

PREFERRED QUALIFICATIONS:

Expertise in AWS architecture and services
Experience in physical security, IoT, or edge computing environments
Expertise with advanced AWS services (Kinesis, Lambda, EKS, RDS)
Experience with Terraform and Terragrunt specifically
Background in high-availability, multi-tenant SaaS environments
Experience establishing SRE practices and culture from the ground up
Track record of leading incident response and post-mortem processes
Experience mentoring and developing junior engineers
Knowledge of security best practices and compliance requirements
Experience with edge computing and distributed system architectures
Previous experience in a startup or high-growth environment (50-200 employees)
Experience with our tech stack: Kotlin, Rust, TypeScript, Python

ADDITIONAL INFO:

Salary range for this position: $183,000 to $235,000 per year
Eligible to participate in the HiveWatch Equity Incentive Plan

*Final offer will be at the company's sole discretion and determined by multiple factors, including years and depth of relevant experience and expertise, location, and other business considerations.

Benefits & Culture:

At HiveWatch, we're passionate about taking care of our people, and it shows in the benefits we offer. Our team enjoys:

Comprehensive health coverage: medical, dental, vision, and life insurance
Cutting-edge work in an emerging field with huge growth potential
Competitive compensation packages designed to reward top talent
A modern, newly renovated HQ right on Main Street in El Segundo, CA
401(k) with a 4% company match to help you invest in your future
Flexible paid time off so you can recharge when you need it
Additional benefits include ClassPass credits and a discount on pet insurance
A family-friendly, compassionate culture that values balance and belonging

We encourage you to challenge the status quo, share your perspective, and leave fear at the (access-controlled) door.

* Ladders Estimates
