Senior Staff Site Reliability Engineer

HiveWatch

$183K — $235K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years of software engineering experience in production environments
  • 5+ years of SRE, DevOps, or production operations experience
  • Expertise in cloud platforms, preferably AWS, and containerized applications
  • Experience with Infrastructure as Code tools like Terraform
  • Proficiency in at least one language from the tech stack such as Java, Kotlin, or Python
  • Hands-on experience with relational databases and SQL optimization
  • Experience with monitoring and observability tools like Prometheus or Grafana

Responsibilities

  • Own the reliability of mission-critical systems including monitoring and capacity planning
  • Debug and resolve complex issues across infrastructure and application code
  • Participate in a 24/7 on-call rotation for critical system coverage
  • Conduct root cause analysis and implement preventive measures
  • Build automation and tooling to enhance system reliability
  • Maintain CI/CD pipelines and optimize database performance
  • Foster engineering excellence through technical leadership and mentorship

Benefits

  • Comprehensive health coverage: medical, dental, vision, and life insurance
  • Competitive compensation packages
  • 401(k) with a 4% company match
  • Flexible paid time off for work-life balance
  • Cutting-edge work in an emerging field with growth potential
  • Family-friendly culture that values balance and belonging
  • Additional benefits such as ClassPass credits and discounts on pet insurance
Full Job Description
POSITION OVERVIEW:

HiveWatch is seeking a Senior Staff Site Reliability Engineer to join our Platform Team, where you'll architect and maintain mission-critical edge infrastructure that connects our SaaS platform to customer systems. You'll ensure exceptional performance, reliability, and observability across our distributed environment while providing technical leadership to our growing engineering team. This role reports directly to our VP of Engineering.

WHAT YOU'LL DO:
  • Own the reliability of mission-critical systems including production monitoring, alerting, and capacity planning
  • Debug and resolve complex production issues across the full stack, from infrastructure to application code
  • Participate in a regular on-call rotation to provide 24/7 coverage for critical systems
  • Perform root cause analysis requiring deep code-level investigation and implement preventive measures
  • Build automation and tooling to reduce operational toil and improve system reliability
  • Maintain CI/CD pipelines, observability infrastructure, and database performance optimization
  • Increase the resiliency, scalability, and maintainability of production environments
  • Maintain on-call procedures and disaster recovery processes
  • Provide technical leadership and mentorship to foster engineering excellence and reliability culture

OUR TECH STACK:
  • Languages: Kotlin, Rust, TypeScript, and Python
  • Deployments: GitHub Actions, Terraform, Terragrunt, and Helm
  • Infrastructure: AWS (Kinesis, Serverless, RDS, EKS), Kubernetes, Docker, Postgres, IoT Edge, Red Hat Enterprise Linux, Rocky Linux

MINIMUM QUALIFICATIONS:
  • 7+ years of software engineering experience with strong coding skills in production environments
  • 5+ years of SRE, DevOps, or production operations experience
  • Expertise with cloud platforms (AWS preferred) and containerized applications (Docker, Kubernetes)
  • Experience with Infrastructure as Code (Terraform, CloudFormation, or similar)
  • Proficiency in at least one object-oriented programming language in our tech stack (Java, Kotlin, Python)
  • Hands-on experience with relational databases and SQL performance optimization
  • Experience with monitoring and observability tools (Prometheus, Grafana, DataDog, or equivalent)
  • Strong debugging skills across distributed systems and microservices architectures
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

PREFERRED QUALIFICATIONS:
  • Expertise in AWS architecture and services
  • Experience in physical security, IoT, or edge computing environments
  • Expertise with advanced AWS services (Kinesis, Lambda, EKS, RDS)
  • Experience with Terraform and Terragrunt specifically
  • Background in high-availability, multi-tenant SaaS environments
  • Experience establishing SRE practices and culture from the ground up
  • Track record of leading incident response and post-mortem processes
  • Experience mentoring and developing junior engineers
  • Knowledge of security best practices and compliance requirements
  • Experience with edge computing and distributed system architectures
  • Previous experience in a startup or high-growth environment (50-200 employees)
  • Experience with our tech stack: Kotlin, Rust, TypeScript, Python

ADDITIONAL INFO:
  • Salary range for this position: $183,000 to $235,000 per year
  • Eligible to participate in the HiveWatch Equity Incentive Plan

*Final offer will be at the company's sole discretion and determined by multiple factors, including years and depth of relevant experience and expertise, location, and other business considerations.

Benefits & Culture:

At HiveWatch, we're passionate about taking care of our people, and it shows in the benefits we offer. Our team enjoys:
  • Comprehensive health coverage: medical, dental, vision, and life insurance
  • Cutting-edge work in an emerging field with huge growth potential
  • Competitive compensation packages designed to reward top talent
  • A modern, newly renovated HQ right on Main Street in El Segundo, CA
  • 401(k) with a 4% company match to help you invest in your future
  • Flexible paid time off so you can recharge when you need it
  • Additional benefits, including ClassPass credits and a discount on pet insurance
  • A family-friendly, compassionate culture that values balance and belonging

We encourage you to challenge the status quo, share your perspective, and leave fear at the (access-controlled) door.
