Staff Site Reliability Engineer - Observability GCP

Okta • $194K — $267K *

San Francisco, CA 94112 • In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience with GCP, particularly in managing observability
Expertise in developing actionable dashboards using Splunk or Grafana
At least 3 years in an SRE, DevOps, or Systems Engineering role focusing on high-availability systems
Strong coding skills in Go and Python for automation and tool development
In-depth knowledge of Linux internals and container orchestration with Kubernetes/GKE
Analytical mindset for debugging complex system performance issues

Responsibilities

Design and build scalable observability infrastructure using Terraform
Optimize collection and processing of observability data for reliability and low latency
Participate in on-call rotations and lead post-incident reviews
Automate deployment and scaling of observability agents to reduce manual toil

Benefits

Health, dental, and vision insurance
401(k) retirement plan
Flexible spending account
Paid time off and parental leave
Immersive in-person onboarding experience to connect with the team and mission

Full Job Description

We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud to own and expand our Observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.

Key Responsibilities

Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.

Required Skills & Experience (The Essentials)

GKE: 5+ years of experience scaling and managing observability on Google Cloud Platform.
Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
SRE Mindset: 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
Programming Proficiency: Strong coding skills in Python and Go for building internal tools and automating workflows.
Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Bonus Skills (The "Nice-to-Haves")

Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
Grafana Loki: Experience migrating from Splunk to Grafana Loki.
Other Cloud Platforms: Experience managing native observability tools within AWS.

Additional requirements:

This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.

P24517_3387022

Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: https://rewards.okta.com/us.

The annual base salary range for this position for candidates located in the San Francisco Bay Area is:

$194,000-$267,000 USD

The Okta Experience

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

About Okta

Okta is a leading provider of identity and access management solutions for enterprises. The company's cloud-based platform enables organizations to securely connect people and technology, providing secure access to applications and data from any device, anywhere, at any time. Okta's solutions are used by thousands of organizations worldwide, including many Fortune 500 companies. The company was founded in 2009 and is headquartered in San Francisco, California. Okta is committed to providing innovative solutions that help organizations stay secure and productive in today's digital world.

Learn more about Okta

Size: 5,342 employees
Market Cap: $10.5 billion
Industry: Enterprise Technology
Net Income: -$266.3 million
Founded: 2009
5 Year Trend: +51.9%
Revenue: $835.4 million
NASDAQ: OKTA

* Ladders Estimates
