Data Engineer
Location: Remote (California)
We are seeking a Junior Data Engineer to support a confidential project within the Health sector. This role will focus on modernizing data pipelines, optimizing metric computation, and enabling scalable analytics through cloud-based solutions.
THE OPPORTUNITY FOR YOU
Airflow pipeline overhaul
- Build out of Athena Operator support for Airflow jobs
- Custom SQL hidden in QuickSight moved under source control
- Compute moved to AWS Athena where possible
- Correct operators used for Python jobs
- Rebuild of Airflow testing process to remove need for separate Airflow testing for query-only updates
Metric job optimization
- AI-driven scan of code to look for improvement opportunities
- Investigation of whether and when to use Spark
Automated metrics extraction
- AI-based sweep of code to build text-based metrics descriptors
- Generation of v1 metrics catalog pulling from metric descriptors
KEY SUCCESS FACTORS (Top 3 Must-Have Skills)
- 3+ years of experience as a Data Engineer supporting complex environments
- Hands-on experience with AWS services, specifically Athena and managed Airflow with Python, plus familiarity with GitHub and Apache Spark
- Strong programming skills in SQL, Python, and Pandas
NICE TO HAVES
- Experience with CI/CD tools such as Concourse CI or similar
- Familiarity with AWS Glue
- Experience with medical care delivery data