Full Job Description
We’re seeking a Data Engineer with strong enterprise experience delivering data product solutions on the Databricks Lakehouse Platform. The ideal candidate brings hands-on expertise building scalable, governed, production-grade data pipelines and analytics products using Databricks, Apache Spark, SQL, and CI/CD practices. This role focuses on transforming raw data into trusted, reusable data products that support analytics, reporting, and downstream applications across the enterprise.
How you’ll make an impact:
• Apply hands-on expertise in Python, cloud technologies, Databricks, Apache Spark, SQL, and AI to build scalable and innovative data solutions
• Design, build, and maintain scalable, well-documented enterprise data products and ETL/ELT pipelines supporting batch and near-real-time processing
• Monitor, optimize, and tune data infrastructure and Spark/Databricks workloads to improve performance and cost efficiency
• Collaborate cross-functionally with Data Science, Analytics, DevOps, Product, and Platform Engineering teams to deliver new features and data products
• Troubleshoot and resolve data pipeline issues; ensure data quality through testing, monitoring, and CI/CD best practices
• Ensure compliance with data privacy standards, including the handling of Protected Health Information (PHI), and adhere to enterprise security and governance policies
• Contribute to architecture discussions and implement best practices such as layered data modeling and Lakehouse design patterns
• Perform other incidental duties as assigned
What you’ll need:
• Bachelor’s degree in a related field, plus experience in Data Engineering, DevOps, or Software Development
• Strong hands-on Databricks experience
• Advanced SQL skills for data transformation, analytics, and data modeling
What else we look for:
• Hands-on experience with cloud platforms (AWS preferred: EC2, S3, Lambda; Azure or GCP also acceptable)
• Experience with Databricks Workflows, Repos, Jobs, and cluster configuration
• Proven experience building and deploying production-grade data workloads
• Proficiency in Spark (PySpark or Scala) with strong understanding of performance tuning and internals
• Experience with Delta Lake (e.g., schema evolution, time travel, optimization)
• Understanding of Lakehouse architecture, medallion design, and enterprise data modeling standards
• Strong foundation in databases, ETL processes, and data structures
• Experience with data governance and security practices (RBAC, data masking, PII handling)
• Familiarity with data quality frameworks (e.g., Great Expectations, Deequ, or custom validation checks)
• Understanding of data contracts, SLAs, lineage, and data ownership models
• Experience with CI/CD pipelines (Azure DevOps, GitHub Actions, or GitLab CI) and version control using Git
• Experience delivering end-to-end data products (not just pipelines) for analytics, BI, and application use cases
• Experience working in regulated environments (e.g., healthcare, finance, or life sciences) preferred
• Eligibility to work in the U.S. or EU, with the ability to travel as required
• Ability to comply with company policies, including Environmental Health & Safety and applicable workplace protocols
To align performance with our overall business objectives, we offer competitive salaries, performance-based incentives, and a wide variety of benefits programs to address the diverse individual needs of our employees and their families.
For California (CA), the base pay range for this position is $87,000 to $123,000 (highly experienced).
The pay for the successful candidate will depend on various factors (e.g., qualifications, education, prior experience). Applications will be accepted while this position is posted on our Careers website.