ABOUT OUR COMPANY
At Guardant Health, our mission is to conquer cancer with data; as such, software sits at the core of everything we do. While we’re best known for our molecular diagnostics, which unlock the genomic signatures of cancer, these tests are just the first step in the equation. Turning this genomic data into actionable insights shared with thousands of patients, physicians, and researchers in a scalable, reliable, and secure fashion is a software product challenge.
“We wanted flying cars and instead we got 140 characters” is a much-repeated complaint about Silicon Valley. But with all due respect to flying cars, we believe that our mission is even more critical. We’ve raised more than $500M from some of the leading venture firms in the world to work on this problem.
We are building a unique software stack to manage an ecosystem of microservices, RESTful APIs, and data integrations with internal and external systems to deliver useful and elegant user experiences in the extraordinarily complex oncology diagnostic and therapeutic landscape. We connect patients with clinical trials, help clinicians order our tests and receive our clinical reports, and deliver valuable genomic datasets to researchers to help uncover important insights into treatment paradigms and drug discovery. Our technology stack reflects our view of using the best tool for the job, employing Java, Python, and Ruby, along with Kubernetes, Docker, Mule, MySQL, MongoDB, high-performance computing (HPC) clusters, and a variety of AWS services to analyze and disseminate vast volumes of genomic data.
About the Role
The Data Engineering & Analytics team is seeking a senior data engineer with experience building world-class data processing and analytics platforms. The team is responsible for everything related to data across the company, with the mission of empowering every business function (R&D, commercial, clinical, etc.) to operate efficiently and make decisions based on curated, high-quality data.
This is a great opportunity to build a data engineering and analytics platform from scratch, contribute to tech stack decisions, and work with interesting human genomics data at large scale while using the latest tools and technologies.
As a Senior Data Engineer, you will primarily drive:
- Understanding business requirements to design and build the strategy for enterprise data storage, processing, and analytics systems
- Designing the appropriate data stores, and architecting and building data flows to integrate heterogeneous third-party data with in-house operational and analytical data sources
- Developing data models for faster query access based on business reporting and analytics requirements; standardizing data interfaces and schemas for internal and external integrations
- Owning and administering various data stores and databases across the company
- Taking scalability and data availability to the next level, given the massive amount of genomic data we generate and process
- Helping define best practices for data lineage, data modeling, and data pipelines across the board
You love building complex, scalable, and highly available data products and shaping how data brings life to services and decision-making processes.
You enjoy an agile, fast-paced, and highly technical environment. You are comfortable tackling a problem, driving a solution from inception to delivery, leading cross-functional collaborations, and communicating technical and non-technical information across multiple functions and levels.
You are dedicated to engineering excellence yet pragmatic enough to balance quality principles, regulatory compliance, and business needs.
In addition, you bring:
- 5+ years of experience designing, implementing, and operating distributed data pipelines and integration architectures on premises and in the cloud
- 5+ years of experience developing, optimizing, and administering relational databases (e.g., PostgreSQL, MySQL, Oracle) and NoSQL data stores (e.g., MongoDB, Cassandra)
- 3+ years of hands-on experience with technologies such as Redshift, Spark, Hive, Presto, Kafka, and Sqoop
- Experience building and operating real-time and batch data pipelines
- Experience implementing multiple data aggregation strategies (nightly, intraday, and across different day definitions spanning multiple time zones)
- Exceptional programming skills in Python or Java
- Experience moving data between relational databases and the cloud (S3, Redshift) is a plus
- Experience managing data in a regulated, HIPAA-compliant healthcare environment is a plus
- Experience with reporting and visualization tools is a big plus
- Bachelor’s degree in software engineering, CS, or EE is ideal