Full Job Description
Your role and responsibilities:
As a Lead Data Engineer, you will take Xanadu's existing data pipelines and transform them into a cohesive data platform. Working alongside hardware researchers, data scientists, and finance analysts, you will define how data flows across the organization and build the infrastructure that enables data-driven decisions at every level. This is first and foremost a hands-on technical role: you will be Xanadu's first dedicated data engineer, with the opportunity to build a data team around you.
• Lead technical design and delivery of data pipelines spanning R&D, manufacturing, and business analytics
• Build and scale robust cloud data infrastructure (ingestion, transformation, serving)
• Define data models in collaboration with researchers and analysts so that scientific and business data is stored correctly and remains queryable
• Establish data governance foundations: lineage, cataloging, access control, and quality monitoring
• Drive best practices across code quality, testing, data reliability, and observability
• Balance new development, platform improvements, and technical debt reduction
• Leverage AI-assisted tooling to accelerate engineering productivity
• Mentor and provide technical guidance to engineers across the organization as the practice grows
Basic qualifications and experience:
• 7+ years in data engineering, with 2+ years in a lead or architect capacity
• Deep experience building and scaling data platforms in the cloud (e.g., Databricks, Snowflake)
• Production experience designing and operating data lake or lakehouse architectures (Delta Lake, Iceberg, or Hudi)
• Expertise in distributed systems and data integrity at scale
• Hands-on experience with modern data stack tooling (dbt, Fivetran, or similar) and orchestration (Airflow, Dagster, Prefect)
• Strong SQL and Python skills
• Knowledge of infrastructure-as-code and CI/CD for data pipelines
• Proven ability to drive technical standards and engineering improvements across teams
• Experience working with cross-functional teams, especially R&D or science teams producing unstructured or semi-structured data
Preferred qualifications and experience:
• Experience in a deep-tech, hardware, or semiconductor environment where data originates from physical measurement systems
• Familiarity with time-series or scientific data formats (HDF5, Parquet for measurement traces, etc.)
• Prior experience as the "first data engineer", building a practice from scratch rather than inheriting one
• Familiarity with LIMS (laboratory information management systems) or laboratory data workflows
This is a new position. Your base salary will be determined by your location, experience, and internal benchmarks. You will also be eligible for equity and benefits.