Staff Site Reliability Engineer

Google • $207K — $301K *

San Jose, CA 95123 • In-Person

Information Technology

8 - 10 years of experience

4 days ago

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
8 years of experience in building and developing infrastructure or distributed systems.
5 years of troubleshooting and debugging experience.
5 years of experience architecting production-quality Machine Learning (ML) systems.
5 years of programming experience in C, Go, or Python.

Responsibilities

Lead initiatives to reduce support costs through intelligent alerting and system design improvements.
Advance the SRE team from on-call incident responders to proactive system partners.
Establish trust and influence with key stakeholders for effective system scaling.
Identify and resolve pain points for the team, partners, and customers with balanced solutions.
Collaborate with critical customers to enhance the reliability of their user experiences.

Benefits

Opportunity to work on large-scale, fault-tolerant systems with Google Cloud.
Mentorship and support to encourage learning and professional development.
Collaborative and inclusive team environment that fosters diverse perspectives.
Focus on automating work to optimize existing systems.

Full Job Description

Minimum qualifications:

Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
8 years of experience building and developing infrastructure or distributed systems.
5 years of experience in troubleshooting and debugging.
5 years of experience building and architecting production quality Machine Learning (ML) systems.
5 years of experience programming in C, Go, or Python.

Preferred qualifications:

Master's degree in Computer Science, or a related technical field.
Experience in Site Reliability Engineering.
Experience in troubleshooting and supporting applications such as web services, data storage, databases, data pipelines, and commerce engines, with Linux/Unix or other operating systems.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

In this role, you will drive the supportability and reliability of Woodshed and Napa, two key data intelligence systems underlying Google's AI push.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $207,000 - $301,000 (USD) + 20% bonus target + equity + benefits

Responsibilities

Lead the team in our top 2026 challenge: reducing the support cost of the products via correct provisioning, intelligent alerting, and system design and deployment improvements.
Grow the Site Reliability Engineering (SRE) team from trained on-callers and incident responders to system partners.
Build trust with and influence over key stakeholders to drive successful scaling of the supportability of complex systems.
Identify problems and pain points of the team, dev partner teams, and customers, and drive solutions balancing short-term and long-term needs.
Work with critical customers to give them the reliability they need for their key user journeys.

About Google

Google is a multinational technology company that specializes in Internet-related services and products. These include online advertising technologies, a search engine, cloud computing, software, and hardware. Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. The company has grown tremendously since then and has become one of the most valuable companies in the world. Google's mission is to organize the world's information and make it universally accessible and useful.

Size: 156,500 employees
Market Cap: $1,115.4 billion
Industry: Enterprise Technology
Net Income: $40.2 billion
Founded: 1998
5 Year Trend: +23.3%
Revenue: $182.5 billion
NASDAQ: GOOGL

* Ladders Estimates
