Staff Data Engineer - Data Lake

H1 • $170K — $190K *

New York, NY 10025

Hybrid

Healthcare

8 - 10 years of experience

Today


Job Overview by Ladders

Qualifications

8+ years of experience in data engineering or software engineering with a focus on distributed data platforms
Demonstrated technical leadership experience and interest in mentoring engineers
Strong proficiency in Python (PySpark), Java, Scala, or similar languages
Advanced SQL expertise with performance tuning skills in large datasets
Experience with Apache Spark and cloud-native big data platforms, particularly AWS
Familiarity with orchestration tools like Argo or Airflow
Knowledge of distributed storage systems and file formats such as Parquet and Avro

Responsibilities

Architect and scale ETL/ELT pipelines across healthcare data
Lead evolution of Data Lake architecture focusing on reliability and cost optimization
Enhance data quality, validation, and standardization workflows
Design batch and near real-time data processing frameworks
Optimize performance of distributed compute and storage systems
Implement monitoring improvements and operational excellence across the platform
Mentor engineers and promote engineering best practices

Benefits

Full suite of health insurance options
Generous paid time off
Pre-planned company-wide wellness holidays
Retirement options available
Health and charitable donation stipends
Impactful Business Resource Groups
Flexible work hours and remote work opportunities
Engagement with leading biotech and life sciences companies

Full Job Description

Data Engineering is responsible for the development and delivery of our most important asset: our data. With thousands of data sources from around the world, the team ensures that data is accurate, normalized, and delivered at a velocity that keeps up with real-world changes. As we expand our markets and the scope of data we provide to our customers, our team must scale to meet that demand.

WHAT YOU'LL DO AT H1

As a Staff Data Engineer on the Data Lake team at H1, you will play a critical role in shaping the architecture, scalability, reliability, and long-term direction of our core data platform. This role is designed for a highly technical engineer who is excited to grow into an Engineering Manager track while remaining deeply hands-on technically.

The Data Lake is the foundation of H1's platform, responsible for the validation, accuracy, standardization, and quality of the data powering every downstream product and team across the organization. You will help lead the evolution of this platform while supporting and mentoring a growing team of engineers.

You will:
- Architect, build, and scale distributed ETL/ELT pipelines and large-scale ingestion frameworks across structured and unstructured healthcare datasets.
- Lead the evolution of H1's Data Lake architecture with a focus on scalability, observability, reliability, and cost optimization.
- Own and improve data quality, validation, normalization, and standardization workflows across thousands of global data sources.
- Design and optimize batch and near real-time data processing frameworks using cloud-native distributed systems.
- Optimize distributed compute and storage systems, including Spark workloads, query performance, partitioning strategies, and infrastructure efficiency.
- Drive improvements in monitoring, governance, operational excellence, and production reliability across the platform.
- Troubleshoot complex production data and infrastructure issues across distributed systems.
- Partner closely with Product, Infrastructure, Security, Compliance, and downstream engineering teams to support scalable and secure data delivery.
- Mentor engineers through technical leadership, architecture reviews, and engineering best practices.
- Help define technical roadmap priorities and contribute to long-term platform strategy and execution planning.
- Support production operations, incident response, and platform health as part of overall ownership of the Data Lake ecosystem.

ABOUT YOU

You are a highly technical data engineer who thrives in lean, high-ownership environments and enjoys solving complex distributed systems challenges. You are excited by the opportunity to influence technical direction, mentor engineers, and grow into broader engineering leadership responsibilities while remaining hands-on.

- You have deep experience designing and scaling distributed data platforms and large-scale pipelines in cloud-native environments.
- You excel at building reliable, observable, and maintainable data systems supporting critical business and analytics workloads.
- You have strong expertise in distributed processing, performance optimization, and modern data architecture patterns.
- You are comfortable leading technical initiatives and influencing architecture decisions across teams.
- You communicate effectively with both technical and non-technical stakeholders.
- You enjoy mentoring engineers and helping raise the engineering bar across teams.
- You are energized by ownership, autonomy, and solving ambiguous technical challenges.

REQUIREMENTS

- 8+ years of experience in data engineering, software engineering, or related fields with significant experience building and scaling distributed data platforms.
- Demonstrated technical leadership experience with interest in or experience mentoring and leading engineers.
- Strong proficiency in Python (PySpark), Java, Scala, or similar programming languages.
- Advanced SQL expertise, including performance tuning and optimization across large datasets.
- Deep experience with Apache Spark and cloud-native big data platforms, preferably within AWS environments (EMR, Glue, S3, Athena, Redshift, or similar).
- Experience designing and scaling modern cloud-native data lake architectures and large-scale ingestion frameworks.
- Experience with orchestration and workflow management tools such as Argo, Airflow, or similar technologies.
- Strong understanding of distributed storage systems, partitioning strategies, and file formats such as Parquet, Avro, and ORC.
- Experience with Docker, Kubernetes, and modern containerization technologies.
- Experience implementing monitoring, observability, and data quality frameworks within production environments.
- Experience with large-scale data cleaning, parsing, normalization, and validation workflows preferred.
- Experience working with healthcare, life sciences, publication, or large-scale entity-resolution datasets preferred.
- Exposure to ML/AI-driven data enrichment, parsing, or validation workflows is a plus.

- Experience using AI-assisted coding tools (e.g., GitHub Copilot, Claude Code) to accelerate development while maintaining quality is encouraged.

COMPENSATION

This role pays $170,000 to $190,000 per year, based on experience, in addition to stock options.

Anticipated role close date: 8/1/2026

H1 OFFERS

- Full suite of health insurance options, in addition to generous paid time off

- Pre-planned company-wide wellness holidays

- Retirement options

- Health & charitable donation stipends

- Impactful Business Resource Groups

- Flexible work hours & the opportunity to work from anywhere

- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

About H1

H1 Inc. is an American healthcare data technology company headquartered in New York City that provides services globally. The company's database is used by healthcare and pharmaceutical companies and related organizations to identify healthcare professionals to partner with on research in order to accelerate development of drugs and other treatments. The company has over 400 employees worldwide and about 100 clients including pharmaceutical companies Novartis and AstraZeneca as of November 2021.


Industry

Information Technology

* Ladders Estimates




