Staff Site Reliability Engineer (SRE) | DevOps Engineer

Grail • $169K — $224K *

Menlo Park, CA 94025 (Hybrid)

Healthcare

8-10 years of experience

1 month ago

Full Job Description

GRAIL is seeking a Staff Site Reliability / DevOps Engineer to lead the reliability, scalability, and security of our cloud-native platform. This role operates at the intersection of infrastructure engineering, platform strategy, and organizational leadership, supporting systems that power large-scale data processing and cutting-edge cancer detection technologies.

You will define and drive infrastructure standards across teams, represent reliability and performance in architecture decisions, and build systems that scale well beyond your direct ownership. This is a highly technical, high-impact role combining hands-on engineering with cross-functional influence and mentorship.

Flexible: Menlo Park, CA (MPK) or Research Triangle Park, NC (RTP); 3 days in office
This is a hybrid role based in either Menlo Park, CA (moving to Sunnyvale, CA in Fall 2026) or Durham, NC. Our current flexible work arrangement policy requires that a minimum of 60%, or 24 hours, of your total work week be on-site. Your specific schedule, determined in collaboration with your manager, will align with team and business needs and could exceed the 60% requirement for the site.

Responsibilities

Design, build, and operate highly available, fault-tolerant cloud infrastructure across AWS, GCP, and/or Azure
Architect and maintain scalable CI/CD pipelines and deployment frameworks for enterprise-grade software delivery
Lead infrastructure-as-code adoption and maturity using tools such as Terraform, CloudFormation, and Ansible
Own Kubernetes reliability across multi-cluster environments, including upgrades, scaling, and workload lifecycle management
Establish and evolve observability platforms (metrics, logs, traces) and define SLO/SLI frameworks across teams
Lead incident response for critical outages, drive root cause analysis, and implement preventative improvements
Optimize infrastructure for cost, performance, and scalability, partnering closely with engineering and finance stakeholders
Define and enforce DevOps, reliability, and security best practices across the organization
Partner cross-functionally with engineering, data, QA, security, and IT teams to design resilient systems
Mentor engineers and contribute to technical leadership through design reviews, standards, and knowledge sharing

This list summarizes the role's primary responsibilities and is not exhaustive; responsibilities may change at the company's discretion.

What Success Looks Like in Your First Year

Conduct a comprehensive assessment of the current infrastructure, drive infrastructure-as-code adoption to 95%+ across critical systems, and establish clear health and reliability baselines for the Kubernetes platform
Standardize observability using modern tooling and implement an SLO/SLI framework adopted across multiple product teams, including defined SLAs for critical data systems
Strengthen security and compliance posture across cloud environments by implementing consistent baselines, launching a compliance-as-code framework, and reducing mean time to resolution (MTTR) for production incidents
Define, document, and drive adoption of engineering standards, best practices, and operational guidelines across platform and product teams
Develop and align stakeholders on a forward-looking platform reliability and infrastructure roadmap
Demonstrate measurable mentorship and technical leadership impact across the engineering organization
Evaluate and provide recommendations on emerging infrastructure needs, including support for AI/ML and advanced data workloads

Required Qualifications

BS in Computer Science, Engineering, or related field, or equivalent experience
8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering
Strong hands-on experience with at least one major cloud platform (AWS, GCP, or Azure)
Experience implementing infrastructure-as-code solutions (Terraform, CloudFormation, or similar)
Experience designing and operating CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins)
Hands-on experience with Kubernetes and containerized systems in production environments
Proficiency in scripting or programming for automation (e.g., Python, Go, Bash, or PowerShell)
Experience with observability and monitoring tools (e.g., Prometheus, Grafana, OpenTelemetry, Datadog)
Strong understanding of networking, security, and distributed systems fundamentals
Experience working in regulated environments and familiarity with frameworks such as ISO 27001, NIST, SOC 2, or HIPAA

Preferred Qualifications

10+ years of experience in SRE, DevOps, or infrastructure engineering
Experience operating multi-cluster Kubernetes environments (e.g., EKS, GKE) at scale
Familiarity with GitOps practices (e.g., ArgoCD, Flux)
Experience with data platforms and pipelines (e.g., Kafka, Airflow, Spark, Snowflake, BigQuery)
Experience implementing SLO/SLI frameworks and reliability practices across multiple teams
Strong background in cloud security, including IAM, zero-trust architecture, and secrets management
Experience with compliance-as-code and security tooling (e.g., OPA, Snyk, Checkov)
Exposure to AI/ML or large-scale data infrastructure workloads
Experience in healthcare, biotech, or other regulated industries
Relevant cloud or Kubernetes certifications (e.g., AWS DevOps, CKA/CKS, GCP DevOps)

Physical Demands and Working Environment

Standard office environment with hybrid flexibility
Participation in on-call rotation and after-hours support for critical systems may be required
Frequent collaboration with cross-functional and senior stakeholders
Fast-paced, dynamic environment with emphasis on reliability, scalability, and innovation

Adaptability and Growth Expectation

As the organization evolves, responsibilities may expand or shift to meet business needs. This may include:

Taking on additional technical or leadership responsibilities
Participating in cross-functional initiatives and strategic projects
Adapting to new technologies, tools, and methodologies
Supporting other teams during periods of high demand

The expected full-time annual base pay range for this position is $169K-$224K for Durham, NC. Actual base pay will take into account skills, experience, and location.

This role may be eligible for other forms of compensation, including an annual bonus and/or incentives, subject to the terms of the applicable plans and Company discretion. This range reflects a good-faith estimate of the range that the Company reasonably expects to pay for the position upon hire; the actual compensation offered may vary depending on factors such as the candidate's qualifications. Employees in this role are also eligible for GRAIL's comprehensive and competitive benefits package, offered in accordance with our applicable plans and policies. This package currently includes flexible time-off or vacation; a 401(k) retirement plan with employer match; medical, dental, and vision coverage; and carefully selected mindfulness programs.

About Grail

Grail is a healthcare company that develops and commercializes blood tests for early cancer detection. The company's tests use a combination of machine learning, genomics, and clinical data to detect cancer at an early stage, when it is most treatable. Grail was founded in 2016 and is headquartered in Redwood City, California.

Size

500 employees

Industry

Pharmaceuticals & Biotech

Founded

2016

* Ladders Estimates

Similar Jobs

Senior Staff Production Engineer
$140K — $200K *
Zscaler
San Jose, CA 95123 (Santa Clara County)
4 days ago
Sr Staff DevOps Engineer
$197K — $278K *
42dot, Inc
Sunnyvale, CA 94087 (Santa Clara County)
4 days ago
Staff Software Engineer, Backend (Continuous Integration)
$200K — $275K *
Affirm
Remote
6 days ago
Senior Staff DevOps Engineer - CI/CD & Release Engineering
$198K — $260K *
Sonatus
Sunnyvale, CA 94087 (Santa Clara County)
1 month ago

More Jobs at Grail

GRAIL Galleri Consultant 1 (Tampa Northwest) #4671
$94K — $125K *
Tampa, FL 33647 (Hillsborough County)
6 days ago
Healthcare
In-Person
Quality Engineer 2, Quality Operations - Clinical Laboratory #4790
$82K — $102K *
Durham, NC 27713 (Durham County)
2 weeks ago
Pharmaceuticals & Biotech
Hybrid
Senior Supplier Quality Engineer #4764, #4772
$94K — $118K *
Menlo Park, CA 94025 (San Mateo County)
3 weeks ago
Pharmaceuticals & Biotech
Hybrid
Senior Scientist, Biostatistics
$156K — $187K *
Menlo Park, CA 94025 (San Mateo County)
3 weeks ago
Pharmaceuticals & Biotech
Hybrid
Supply Chain Planning Manager #4797
$118K — $156K *
Durham, NC 27713 (Durham County)
3 weeks ago
Healthcare
Hybrid

More Healthcare Jobs

Chief Medical Officer Part Time
$210K ($210,000 annually; MPI offers free medical, dental, vision, PTO) *
Motion Picture Industry Pension & health Plans
Studio City, CA 91604 (Los Angeles County)
2 days ago
Clinical Specialist - Radiology
$125K + $15K bonus + equity *
Confidential Company
Atlanta, GA 30303 (Fulton County)
3 days ago
CEO Psych / Behavioral Health Hospital
$275K — $375K *
Confidential Company
Falls Church, VA 22044 (Fairfax County)
2 weeks ago
Physician
$150K — $200K *
Playground Pediatrics
Laurinburg, NC 28352 (Scotland County)
Today
Primary Care Physician
$174K — $374K *
CVS Health
Phoenix, AZ 85032 (Maricopa County)
Reposted Today
