H1

Staff Data Engineer - Data Lake

$170K - $190K
Healthcare
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years of experience in data engineering or software engineering with a focus on distributed data platforms
  • Demonstrated technical leadership experience and interest in mentoring engineers
  • Strong proficiency in Python (PySpark), Java, Scala, or similar languages
  • Advanced SQL expertise, including performance tuning on large datasets
  • Experience with Apache Spark and cloud-native big data platforms, particularly AWS
  • Familiarity with orchestration tools like Argo or Airflow
  • Knowledge of distributed storage systems and file formats such as Parquet and Avro

Responsibilities

  • Architect and scale ETL/ELT pipelines across healthcare data
  • Lead evolution of Data Lake architecture focusing on reliability and cost optimization
  • Enhance data quality, validation, and standardization workflows
  • Design batch and near real-time data processing frameworks
  • Optimize performance of distributed compute and storage systems
  • Implement monitoring improvements and operational excellence across the platform
  • Mentor engineers and promote engineering best practices

Benefits

  • Full suite of health insurance options
  • Generous paid time off
  • Pre-planned company-wide wellness holidays
  • Retirement options available
  • Health and charitable donation stipends
  • Impactful Business Resource Groups
  • Flexible work hours and remote work opportunities
  • Engagement with leading biotech and life sciences companies

Full Job Description

Data Engineering is responsible for the development and delivery of our most important asset: our data. With thousands of data sources from around the world, the team ensures that data is accurate, normalized, and delivered at a velocity that keeps up with real-world changes. As we expand our markets and the scope of data we provide to our customers, our team must scale to meet that demand.

WHAT YOU'LL DO AT H1

As a Staff Data Engineer on the Data Lake team at H1, you will play a critical role in shaping the architecture, scalability, reliability, and long-term direction of our core data platform. This role is designed for a highly technical engineer who is excited to grow into an Engineering Manager track while remaining deeply hands-on technically.

The Data Lake is the foundation of H1's platform, responsible for the validation, accuracy, standardization, and quality of the data powering every downstream product and team across the organization. You will help lead the evolution of this platform while supporting and mentoring a growing team of engineers.

You will:
- Architect, build, and scale distributed ETL/ELT pipelines and large-scale ingestion frameworks across structured and unstructured healthcare datasets.
- Lead the evolution of H1's Data Lake architecture with a focus on scalability, observability, reliability, and cost optimization.
- Own and improve data quality, validation, normalization, and standardization workflows across thousands of global data sources.
- Design and optimize batch and near real-time data processing frameworks using cloud-native distributed systems.
- Optimize distributed compute and storage systems, including Spark workloads, query performance, partitioning strategies, and infrastructure efficiency.
- Drive improvements in monitoring, governance, operational excellence, and production reliability across the platform.
- Troubleshoot complex production data and infrastructure issues across distributed systems.
- Partner closely with Product, Infrastructure, Security, Compliance, and downstream engineering teams to support scalable and secure data delivery.
- Mentor engineers through technical leadership, architecture reviews, and engineering best practices.
- Help define technical roadmap priorities and contribute to long-term platform strategy and execution planning.
- Support production operations, incident response, and platform health as part of overall ownership of the Data Lake ecosystem.

ABOUT YOU

You are a highly technical data engineer who thrives in lean, high-ownership environments and enjoys solving complex distributed systems challenges. You are excited by the opportunity to influence technical direction, mentor engineers, and grow into broader engineering leadership responsibilities while remaining hands-on.

- You have deep experience designing and scaling distributed data platforms and large-scale pipelines in cloud-native environments.
- You excel at building reliable, observable, and maintainable data systems supporting critical business and analytics workloads.
- You have strong expertise in distributed processing, performance optimization, and modern data architecture patterns.
- You are comfortable leading technical initiatives and influencing architecture decisions across teams.
- You communicate effectively with both technical and non-technical stakeholders.
- You enjoy mentoring engineers and helping raise the engineering bar across teams.
- You are energized by ownership, autonomy, and solving ambiguous technical challenges.

REQUIREMENTS

- 8+ years of experience in data engineering, software engineering, or related fields with significant experience building and scaling distributed data platforms.
- Demonstrated technical leadership experience, with an interest in or experience with mentoring and leading engineers.
- Strong proficiency in Python (PySpark), Java, Scala, or similar programming languages.
- Advanced SQL expertise, including performance tuning and optimization across large datasets.
- Deep experience with Apache Spark and cloud-native big data platforms, preferably within AWS environments (EMR, Glue, S3, Athena, Redshift, or similar).
- Experience designing and scaling modern cloud-native data lake architectures and large-scale ingestion frameworks.
- Experience with orchestration and workflow management tools such as Argo, Airflow, or similar technologies.
- Strong understanding of distributed storage systems, partitioning strategies, and file formats such as Parquet, Avro, and ORC.
- Experience with Docker, Kubernetes, and modern containerization technologies.
- Experience implementing monitoring, observability, and data quality frameworks within production environments.
- Experience with large-scale data cleaning, parsing, normalization, and validation workflows preferred.
- Experience working with healthcare, life sciences, publication, or large-scale entity-resolution datasets preferred.
- Exposure to ML/AI-driven data enrichment, parsing, or validation workflows is a plus.
- Experience using AI-assisted coding tools (e.g., GitHub Copilot, Claude Code) to accelerate development while maintaining quality is encouraged.

COMPENSATION

This role pays $170,000 to $190,000 per year, based on experience, in addition to stock options.

Anticipated role close date: 8/1/2026

H1 OFFERS

- Full suite of health insurance options, in addition to generous paid time off

- Pre-planned company-wide wellness holidays

- Retirement options

- Health & charitable donation stipends

- Impactful Business Resource Groups

- Flexible work hours & the opportunity to work from anywhere

- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

About H1

H1 Inc. is an American healthcare data technology company headquartered in New York City that provides services globally. The company's database is used by healthcare and pharmaceutical companies and related organizations to identify healthcare professionals to partner with on research in order to accelerate development of drugs and other treatments. The company has over 400 employees worldwide and about 100 clients including pharmaceutical companies Novartis and AstraZeneca as of November 2021.