Tech Lead Site Reliability Engineer, Cloud Reliability Intelligence

Google • $207K — $301K *

Sunnyvale, CA 94087 • In-Person

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science or related field or equivalent experience.
8 years of experience with data structures and algorithms.
3 years of leading projects and troubleshooting distributed systems.
3 years of technical leadership overseeing projects.
Experience with full-stack architectures, linking backend data automation to frontend engineering.

Responsibilities

Own the technical roadmap and architecture for the Evergreen platform.
Design and scale high-performance backend pipelines and data-rich user interfaces.
Prototype and implement LLM-based features for incident data processing.
Collaborate with Product Management and Data Science to align policy measurement and enforcement.

Benefits

Supportive environment with mentorship opportunities.
Collaborative culture that encourages innovation and risk-taking.
Focus on intellectual curiosity and problem-solving.
Opportunities for self-directed work on meaningful projects.

Full Job Description

Minimum qualifications:

Bachelor's degree in Computer Science or a related technical field or equivalent practical experience.
8 years of experience with data structures and algorithms.
3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.
3 years of experience in a technical leadership role, overseeing projects.
Experience overseeing full-stack architectures, ensuring cohesion between backend data automation layers and frontend engineering.

Preferred qualifications:

Experience in applying LLMs or Generative AI to automate workflows.
Experience designing and scaling high-performance backend pipelines (Go, Java) and data-rich user interfaces (TypeScript, Angular).
Familiarity with large-scale reliability analysis or policy conformance frameworks.

About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem-solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

The Reliability Outcome Enablement team develops the products, core infrastructure, and datasets that drive and sustain Google Cloud Platform's (GCP's) reliability promises. We build the Evergreen intelligence platform, the core system that automates resilience across the GCP ecosystem. Every product team at Google (from BigQuery to Spanner) relies on our infrastructure and integrated data lake to keep their services bulletproof.

We are currently expanding our platform to integrate Generative AI and LLM-driven workflows, moving from reactive tracking to a predictive system that catches failures and automates risk mitigation.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $207,000 - $301,000 (USD), 20% bonus target, equity, benefits

Learn more about benefits at Google.

Responsibilities

Own the technical roadmap and long-term architecture for the Evergreen platform, including a unified data model for promise delivery across GCP.
Design and scale high-performance backend pipelines (Go, Java) and data-rich user interfaces (TypeScript, Angular) used by over 10,000 Google engineers.
Prototype and productionize LLM-based features to parse unstructured incident data, automatically file risk tickets, and suggest reliability fixes.
Partner closely with Product Management, Data Science, and leadership to align multiple organizations on a unified approach to policy measurement and enforcement.

About Google

Google is a multinational technology company that specializes in Internet-related services and products. These include online advertising technologies, a search engine, cloud computing, software, and hardware. Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. The company has grown tremendously since then and has become one of the most valuable companies in the world. Google's mission is to organize the world's information and make it universally accessible and useful.

Size: 156,500 employees
Market Cap: $1,115.4 billion
Industry: Enterprise Technology
Net Income: $40.2 billion
Founded: 1998
5 Year Trend: +23.3%
Revenue: $182.5 billion
NASDAQ: GOOGL

* Ladders Estimates
