Site Reliability Engineer

Specter

$120K - $160K *

San Francisco, CA 94112 (In-Person)

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Strong Linux systems administration experience, especially in production environments.
Experience with edge hardware and cloud infrastructure.
Solid understanding of networking fundamentals including DNS, firewalls, and VPNs.
Proficiency in scripting or programming languages like Python, Go, or Bash for operational tasks.
Familiarity with containerization technologies such as Docker and Kubernetes.
Experience with embedded systems, including reading firmware logs and understanding hardware-software interactions.
Deeper experience with AWS infrastructure and observability tools is a significant advantage.
Knowledge of Rust or C to assist in firmware-related tasks.

Responsibilities

Debug and triage issues across a diverse fleet of Linux-based sensor nodes.
SSH into field hardware for diagnosis and recovery, often with limited remote access and incomplete information.
Own complete site bring-ups to ensure systems are operational after outages.
Build and maintain fleet management systems including OTA update pipelines and device health tracking.
Identify and eliminate recurring issues by building necessary tooling and processes.
Automate repetitive tasks through scripting to improve efficiency.
Design and implement observability tools for logging, metrics, and alerting across edge devices and cloud infrastructure.

Benefits

Ownership of operational health of a cutting-edge sensor platform.
High-impact role at the intersection of operations and engineering.
Opportunities for collaboration with cross-functional teams during development.
Access to the latest technologies in edge computing and cloud infrastructure.

Full Job Description

The Role

We're hiring a Site Reliability Engineer to own the operational health of our connected sensor platform - spanning a live fleet of edge hardware deployed at customer sites and the cloud infrastructure behind it.

This is a high-ownership role at the intersection of ops and platform engineering. You'll drive reliability across our sensor fleet - triaging issues in the field, building the systems that prevent them from recurring, and owning the observability that keeps us ahead of problems as we scale.

You set your own priorities across the three areas below.

Responsibilities:
Reactive - Triage & Recovery

Debug and triage issues across a live fleet of diverse Linux-based sensor nodes and edge appliances deployed at customer sites.
SSH into field hardware to diagnose, patch, and recover systems - often with limited remote access and incomplete information.
Own site bring-ups end to end; be the person who gets things back online.

Systems Builder - Close the Loop

Build and maintain fleet management systems: OTA update pipelines, device health tracking, remote diagnostics, and lifecycle tooling.
Identify repeat fires and eliminate them - build tooling, pre-deployment checks, and root cause processes that prevent recurrence.
Automate toil relentlessly: if you're doing something twice, you should be scripting it.
Collaborate with embedded systems and platform teams to define reliability and deployment requirements.

Observability Owner - Fleet Visibility

Design and implement observability (logging, metrics, alerting) across edge devices and cloud infrastructure (AWS).
Surface and close telemetry gaps; build fleet-wide visibility that enables data-driven reliability decisions.
Develop runbooks and incident response procedures, and participate in on-call rotations.

Qualifications:

Strong Linux systems administration - comfortable working over SSH in production, not just dev environments.
Experience with edge or on-prem hardware alongside cloud infrastructure.
Solid networking fundamentals: DNS, firewalls, VPNs, subnets, secure remote access.
Scripting or programming in Python, Go, or Bash for operational tooling.
Familiarity with containerization (Docker; Kubernetes is a plus).
Embedded systems experience - reading firmware logs, understanding hardware-software boundaries, and reasoning about what's happening below the OS is a meaningful edge in this role.
Deeper cloud experience (AWS infrastructure, IAM, networking, observability tooling) is a strong plus for owning the cloud side of the fleet.
Rust or C experience - we have firmware in both; being able to read and reason about low-level code accelerates triage significantly.

* Ladders Estimates
