Staff Data Engineer - (Durham, NC or Menlo Park, CA) - #4571
GRAIL's Research department is seeking a Staff Data Engineer to lead the design, development, and evolution of data systems that power GRAIL's product pipeline, from sample collection through processing, analysis, and regulatory submission. This role operates at the intersection of computational science, engineering, and clinical research, enabling high-impact decision-making across the organization.
The Staff Data Engineer is a technical leader who partners with scientists, statisticians, and engineering teams to shape system architecture and deliver robust, analysis-ready datasets. This individual operates with a high degree of autonomy, tackling complex and ambiguous challenges, and influencing cross-functional teams to align on data standards, best practices, and long-term solutions.
They will develop deep expertise in GRAIL's end-to-end data lifecycle, including EDC, LIMS, Bioinformatics Pipelines, and TidyData, an internally-developed system that aggregates and serves combined datasets. They will lead efforts to improve interoperability, scalability, and data quality across these systems.
The Staff Data Engineer will also collaborate with software engineers and scientists to develop dataset requirements, develop code and procedures to support dataset generation, perform QC, and troubleshoot issues that arise. As needed, the Staff Data Engineer will also contribute to new reporting, data visualization, and statistical analysis features.
Impact & Scope
- Own and drive large, complex data initiatives that impact multiple teams and stages of the product pipeline
- Define and evolve data architecture, standards, and best practices across systems
- Influence technical direction and strategy for data engineering within Research and partner organizations
- Act as a subject matter expert and technical leader, guiding others and elevating team capabilities
- Solve ambiguous, high-impact problems requiring deep technical judgment and cross-domain understanding
This is a hybrid role based in either Menlo Park, CA (moving to Sunnyvale, CA in Fall 2026) or Durham, NC. Our current flexible work arrangement policy requires that a minimum of 60%, or 24 hours, of your total work week be on-site. Your specific schedule, determined in collaboration with your manager, will align with team and business needs and could exceed the 60% requirement for the site.
Responsibilities
- Collaborate with data scientists, biostatisticians, and clinical teams to deliver data solutions and sample selections that support clinical trial and research analysis goals
- Translate complex scientific and analytical requirements into robust, reusable data solutions
- Contribute to data quality frameworks, including standards for validation, reconciliation, and observability across datasets
- Drive self-service data platform strategy, implementation, and tooling, and promote adoption through training and documentation
- Lead efforts to standardize and improve dataset generation, QC, and reporting workflows
- Evaluate and introduce new technologies, methodologies, and best practices to improve data management in a regulated biotechnology environment
- Mentor other engineers and contribute to technical leadership, standards, and best practices across the organization
- These responsibilities summarize the role's primary responsibilities and are not an exhaustive list. They may change at the company's discretion
Required Qualifications
- BS with 8+ years, MS with 5+ years, or PhD with 3+ years of experience in a computational or scientific field (life science, computer science, engineering, mathematics, statistics, bioinformatics, etc.)
- Advanced proficiency in Python or R, with strong software engineering fundamentals
- Demonstrated experience designing end-to-end data systems and architectures - from ingestion and transformation to orchestration and visualization
- Deep understanding of data modeling, pipelines, orchestration, and data quality practices
- Proven ability to lead complex, cross-functional projects with significant business or scientific impact
- Strong communication skills, with the ability to influence technical and non-technical stakeholders
- Experience operating with high autonomy in ambiguous problem spaces
Preferred Qualifications
- Experience with distributed systems or system-level programming (Go, Java, C++)
- Familiarity with bioinformatics, clinical data systems, or molecular biology concepts
- Experience with cloud platforms (AWS) and modern data infrastructure
- Experience driving technical strategy, standards, or platform adoption
- Intermediate experience with AI-assisted development workflows
- Strong SQL and data warehousing expertise
The expected full-time annual base pay scale for this position is $190k-$237k in Menlo Park, CA and $165k-$206k in Durham, NC. Actual base pay will consider skills, experience, and location.
This role may be eligible for other forms of compensation, including an annual bonus and/or incentives, subject to the terms of the applicable plans and Company discretion. This range reflects a good-faith estimate of the range that the Company reasonably expects to pay for the position upon hire; the actual compensation offered may vary depending on factors such as the candidate's qualifications. Employees in this role are also eligible for GRAIL's comprehensive and competitive benefits package, offered in accordance with our applicable plans and policies. This package currently includes flexible time-off or vacation; a 401(k) retirement plan with employer match; medical, dental, and vision coverage; and carefully selected mindfulness programs.