Senior Site Reliability Engineer

Okta • $140K — $180K *

San Francisco, CA 94112 • In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience in Site Reliability Engineering or related fields.
Strong operational knowledge of AWS or GCP production environments.
Deep expertise with Kubernetes, Terraform, and Helm in live systems.
Proficiency in programming with Go and/or Python.
Experience with large-scale distributed databases and observability platforms.

Responsibilities

Design and operate cloud infrastructure and highly available services.
Lead incident response and drive post-incident improvements.
Collaborate with engineering teams to enhance system reliability and performance.
Develop automation solutions using Go, Python, and Terraform.
Improve deployment workflows through CI/CD and GitOps practices.

Benefits

Comprehensive well-being support programs.
Opportunities for social impact involvement.
Programs fostering talent development and community connection.

Full Job Description

The Engineering Opportunity

We are looking for an experienced Senior Site Reliability Engineer to join Okta's Emerging Products Group (EPG). Our mission is to build highly reliable, scalable, and secure cloud services that our customers can trust. We embrace an automation-first mindset and continuously invest in platform engineering, observability, and operational excellence to enable our engineering teams to move quickly and safely.

This role suits an engineer who enjoys solving complex technical challenges at scale, building automation, and improving the reliability of production systems. You will be a key contributor within the EPG SRE organization, partnering closely with software engineers, architects, and product teams to design, build, and operate world-class cloud services.

The ideal candidate embodies the philosophy "if you have to do it more than once, automate it" and is passionate about continuous improvement, operational excellence, and software engineering.

What You'll Be Doing

Reliability & Operations

Design, build, and operate large-scale cloud infrastructure and production services.
Participate in an on-call rotation supporting highly available customer-facing systems.
Lead incident response efforts and drive post-incident reviews focused on systemic improvements.
Define, measure, and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
Partner with engineering teams to improve service availability, scalability, performance, and resilience.
Continuously improve observability through metrics, logging, tracing, dashboards, and alerting.

Engineering & Automation

Develop software, automation, and infrastructure using Go, Python, Terraform, and related technologies.
Eliminate operational toil through automation, tooling, and platform engineering.
Improve deployment safety and operational workflows through CI/CD and GitOps practices.
Collaborate on modernizing existing workloads and aligning them with evolving platform capabilities.
Build self-service platforms, operational guardrails, and automation that improve developer velocity while maintaining reliability and security.

Technical Leadership

Contribute to and drive reliability initiatives within the product group.
Guide engineers in adopting operational best practices and reliability engineering principles.
Mentor engineers through technical collaboration, design reviews, incident analysis, and knowledge sharing.
Support architecture and operational decisions through data-driven recommendations and engineering expertise.
Execute projects from conception through production rollout and long-term operational ownership.

Innovation

Explore and apply AI-assisted engineering techniques to improve operational efficiency, incident response, troubleshooting, and automation.
Identify opportunities to leverage emerging technologies to reduce toil and improve engineering productivity.

Our Tech Stack

Infrastructure/Orchestration: Kubernetes (EKS/GKE), Terraform, Helm, Git, ArgoCD, GitOps
Programming: Golang, Python
Observability: Datadog, Splunk
Data Stores: PostgreSQL, Redis, OpenSearch

What We Are Looking For

Technical Excellence

Strong experience operating large-scale production services in AWS and/or GCP.
Deep expertise with Kubernetes in production environments.
Experience troubleshooting Kubernetes networking, storage, scheduling, scaling, and workload lifecycle issues.
Extensive experience with Infrastructure as Code technologies such as Terraform and Helm.
Strong software engineering skills in Golang and/or Python.
Experience building automation and internal engineering platforms.
Experience operating and troubleshooting distributed data platforms such as PostgreSQL, Redis, OpenSearch, MySQL, Cassandra, or similar technologies.
Strong understanding of cloud networking fundamentals including DNS, load balancing, ingress, TLS, service networking, and traffic management.
Experience with observability platforms, monitoring strategies, and production telemetry.
Experience with or strong interest in AI-assisted engineering and operational automation.

Operational Excellence

Strong expertise operating customer-facing production systems.
Experience leading incident response and driving operational improvements.
Deep understanding of reliability engineering concepts including SLIs, SLOs, error budgets, and capacity planning.
Strong understanding of CI/CD pipelines, deployment strategies, and automation-first operational practices.
Proven ability to balance reliability, scalability, security, and engineering velocity.

Security & Compliance

Understanding of cloud security fundamentals, IAM, secrets management, and secure infrastructure design.
Experience implementing operational controls and best practices in regulated or security-sensitive environments is a plus.

Leadership

Demonstrated experience contributing to complex engineering initiatives.
Strong collaboration and communication skills.
Experience working effectively within globally distributed engineering organizations spanning multiple time zones and cultures.
Experience mentoring engineers and elevating technical capabilities within an organization.
Ability to collaborate on technical direction through expertise, partnership, and execution.

Preferred Qualifications

Experience operating SaaS platforms serving large-scale customer workloads.
Experience working within Kubernetes-based microservices environments.
Experience supporting globally distributed production environments.
Experience with GitOps and ArgoCD.
Experience implementing AI-assisted operational tooling or automation workflows.

The Okta Experience

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

About Okta

Okta is a leading provider of identity and access management solutions for enterprises. The company's cloud-based platform enables organizations to securely connect people and technology, providing secure access to applications and data from any device, anywhere, at any time. Okta's solutions are used by thousands of organizations worldwide, including many Fortune 500 companies. The company was founded in 2009 and is headquartered in San Francisco, California. Okta is committed to providing innovative solutions that help organizations stay secure and productive in today's digital world.

Size: 5,342 employees
Market Cap: $10.5 billion
Industry: Enterprise Technology
Net Income: -$266.3 million
Founded: 2009
5 Year Trend: +51.9%
Revenue: $835.4 million
NASDAQ: OKTA

* Ladders Estimates

Similar Jobs

Senior Principal Systems Engineer
$160K — $200K *
SAIC
Santa Maria, CA 93458 (Santa Barbara County)
Today
Software Engineer, Systems
$150K — $200K *
Meta
Menlo Park, CA 94025 (San Mateo County)
Today
Senior Platform Engineer
$148K — $201K *
Defense Unicorns
Remote
Today
Sr. Technologies Engineer
$111K — $167K *
Ensemble Health Partners
Remote
Reposted Today
Systems Administrator/Engineer
$90K — $189K *
CACI International
Remote
Today
Systems Engineer Senior
$89K — $157K *
Lockheed Martin
Sunnyvale, CA 94087 (Santa Clara County)
Reposted Today

More Jobs at Okta

Senior Site Reliability Engineer
$140K — $180K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person
Demo Engineer
$160K — $220K *
Washington, DC 20011 (District Of Columbia County)
Today
Information Technology
In-Person
Demo Engineer
$179K — $246K *
San Francisco, CA 94112 (San Francisco County)
Reposted Today
Enterprise Technology
In-Person
Staff Software Engineer, AI-Core (Federal)
$194K — $267K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person
Customer First GEO Content Specialist
$73K — $100K *
Toronto, ON M3C 0E3
Today
Information Technology
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
2 weeks ago
Senior ServiceNow Engineer
$110K — $140K *
Comcast
Reston, VA 20191 (Fairfax County)
Today
Senior Data Engineer
$135K — $205K *
Ardent Eagle Solutions
Arlington, VA 22204 (Arlington County)
Today
Senior Manager - Salesforce
$136K — $230K *
MiniMed
Johns Creek, GA 30022 (Fulton County)
Today
Senior Manager, Cybersecurity
$147K — $170K *
Leprino Foods
Denver, CO 80219 (Denver County)
Today
