Sr. Staff Software Engineer (Reliability)

Zscaler • $176K — $220K *

San Jose, CA 95123 • In-Person

Information Technology

8 - 10 years of experience

4 days ago

Job Overview by Ladders

Qualifications

BS or MS in Computer Science or related technical field with 10+ years in hyperscale systems.
Mastery of backend languages (Go, Java, Python) and high standards for code quality.
Experience with complex distributed systems, focusing on concurrency and failure handling.
Expertise in building REST APIs with strong idempotency and rollout guarantees.
Experience in hybrid infrastructure (AWS/GCP, GKE) and CI/CD safety.

Responsibilities

Drive migration from legacy scripts to a Temporal-based orchestration platform.
Identify and solve systemic inefficiencies to enhance operational autonomy.
Build LLM and ML systems for intelligent triage and automated runbooks.
Develop framework-type services for automation-ready products.
Implement comprehensive metrics and logs for explainable fleet actions.

Benefits

Various health plans
Time off plans for vacation and sick time
Parental leave options
Retirement options
Education reimbursement
In-office perks, and more!

Full Job Description

Role

We are looking for a Sr. Staff Software Engineer to join our Service Platform Automation team. This role offers flexibility to work a hybrid schedule (three days a week onsite) in San Jose, CA, reporting to the VP of Engineering. In this high-ownership position, you will build and operate the orchestration and reliability automation that manages ZIA's fleet lifecycle at massive scale. You will initially focus on leading the architectural transformation of legacy scripts into a safe, deterministic, Temporal-based orchestration platform to achieve "one-touch" provisioning. As you scale the platform, you will expand the team's mission into AI SRE practices, applying software engineering to identify and solve systemic inefficiencies and build self-healing capabilities across our global fleet.

What you'll do (Role Expectations)

Drive the migration from legacy scripts to a Temporal-based platform, engineering replay-safe workflows with built-in retries, idempotency, and safe rollback designs for one-touch fleet operations
Identify and solve systemic inefficiencies across our global fleet, engineering technical solutions needed to make our operations more autonomous
Build systems that leverage LLMs and ML for intelligent triage, global signal correlation, and automated runbooks to eliminate manual toil
Develop framework-type services for feature teams, ensuring all new products are delivered "automation-ready" with reliability hooks built directly into the code
Ensure every fleet-wide action is fully explainable, replayable, and auditable by implementing comprehensive metrics, traces, and event logging

Who You Are (Success Profile)

You thrive in ambiguity. You're comfortable building the path as you walk it. You thrive in a dynamic environment, seeing ambiguity not as a hindrance, but as the raw material to build something meaningful.
You act like an owner. Your passion for the mission fuels your bias for action. You operate with integrity because you genuinely care about the outcome. True ownership involves leveraging dynamic range: the ability to navigate seamlessly between high-level strategy and hands-on execution.
You are a problem-solver. You love running towards challenges because you are laser-focused on finding the solution, knowing that solving the hard problems delivers the biggest impact.
You are a high-trust collaborator. You are ambitious for the team, not just yourself. You embrace our challenge culture by giving and receiving ongoing feedback, knowing that candor delivered with clarity and respect is the truest form of teamwork and the fastest way to earn trust.
You are a learner. You have a true growth mindset and are obsessed with your own development, actively seeking feedback to become a better partner and a stronger teammate. You love what you do and you do it with purpose.

What We're Looking for (Minimum Qualifications)

BS or MS in Computer Science or a related technical field with 10+ years of experience in hyperscale systems, with a deep understanding of the unique failure modes and technical hurdles that only emerge at massive scale
Mastery of backend systems languages (Go, Java, Python, or others) with a proven ability to set the bar for code quality, maintainability, and distributed system correctness
Experience designing and operating complex distributed systems, with a focus on solving systemic challenges in concurrency, failure handling, and performance optimization
Expertise in building automation using REST APIs and Swagger with strong guarantees for idempotency, verification, and safe rollout patterns
Expertise in engineering and operating hybrid infrastructure across cloud platforms (AWS/GCP, GKE) and on-premises environments, ensuring consistent container orchestration and CI/CD safety

What Will Make You Stand Out (Preferred Qualifications)

Experience building or operating AI-enabled developer/ops tooling with measurable improvements in triage speed and operational efficiency
Experience in testing orchestration systems, including determinism verification, fault injection, and chaos engineering
Proficiency in PostgreSQL, including SQL development and schema management, to power high-scale, stateful management-plane services

Zscaler's salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, and relevant education or training.

The base salary range listed for this full-time position excludes commission, bonus, and equity (if applicable), as well as benefits.

Base Pay Range

$176,000-$220,000 USD

Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:

Various health plans
Time off plans for vacation and sick time
Parental leave options
Retirement options
Education reimbursement
In-office perks, and more!

About Zscaler

Zscaler is a cloud-based information security company that provides Internet security, web security, firewalls, sandboxing, SSL inspection, antivirus, vulnerability management, and granular control of user activity in cloud computing, mobile, and Internet of Things environments. The company is headquartered in San Jose, California, and has offices in Australia, India, Japan, Singapore, the United Kingdom, and the United States.

Size: 3,153 employees
Market Cap: $15.5 billion
Industry: Information Technology
Net Income: -$191.4 million
Founded: 2008
5 Year Trend: +54.1%
Revenue: $536 million
Exchange: NASDAQ

* Ladders Estimates
