Role Overview
We are seeking a Data Engineer to support Machine Learning and AI initiatives. Working closely with the Solution Architect, Data Architect, DevOps, and Application Engineering teams, this role is responsible for ensuring that data within our cloud-based platform is high quality, well-governed, feature-ready, and production-grade to support model training, deployment, and ongoing operations.
The ideal candidate has 5+ years of cloud data engineering experience with strong proficiency in Snowflake, Python, and SQL, and solid familiarity with AWS-native data services.
Candidates are not expected to arrive with expertise across every area listed. We are looking for demonstrated strength in the core data engineering and Snowflake skills, combined with the initiative and aptitude to grow into the broader scope of the role.
Day-One Priorities & Scope
Immediate focus is Snowflake-based data engineering, pipeline development, and data quality. Feature engineering, model training support, and MLOps contributions are growth areas that will ramp over time as you become embedded with the team.
Key Responsibilities
Data Pipeline Engineering
- Design, build, and maintain scalable data pipelines supporting ML/AI workloads.
- Engineer pipeline patterns including full loads, incremental loads, change-based loads, and slowly changing dimensions.
- Ensure pipelines are reliable, performant, secure, and maintainable; monitor and troubleshoot pipelines within an AWS ecosystem.
Snowflake & Cloud Data Engineering
- Perform data transformations in Snowflake using SQL and native Snowflake features.
- Design and optimize schemas, tables, views, and materialized views for ML/AI consumption.
- Support AWS-native data lake patterns using S3, Glue, Athena, Apache Iceberg, and S3 Tables.
Feature Engineering & Data Preparation
- Perform data cleansing, normalization, and enrichment to support ML model development.
- Design and implement feature engineering pipelines including aggregation and transformation.
- Ensure consistency, reuse, and versioning of features across models and use cases.
- Support feature store patterns to enable feature discoverability and reuse.
- Collaborate with ML engineers and data scientists to operationalize features into training pipelines.
Model Training & MLOps Support
- Support model training workflows, including dataset preparation and scheduled refreshes.
- Ensure training datasets and features are reproducible, traceable, and auditable.
- Integrate data pipelines into CI/CD workflows; support version control, testing, and deployment of data assets.
- Monitor pipeline health, data freshness, and downstream impact on ML/AI systems.
Required Skills & Experience
5+ years of hands-on data engineering experience in a cloud environment.
Core Technologies
- Python - strong proficiency for data processing and pipeline development.
- SQL - advanced skills with hands-on Snowflake transformation experience.
- Snowflake - ELT pipeline design, schema optimization, performance tuning, cost management.
- PostgreSQL - experience with querying, data modeling, and analytics; familiarity with SQL Server to PostgreSQL migration is a plus.
- AWS - S3, Glue, Athena, Snowflake integration, and managed relational databases (e.g., Aurora, RDS).
- Apache Iceberg / S3 Tables - familiarity with open table format ecosystems.
- Streaming ingestion tools (e.g., Kinesis, Kafka, or equivalent).
- Workflow orchestration tools (e.g., Airflow, Step Functions, or equivalent).
Pipeline & Data Engineering
- Experience with full loads, incremental loads, append-only pipelines, change-based processing, and SCDs.
- Data validation, reconciliation, error handling, and restart/recovery patterns.
- Data modeling for analytics, ML/AI, and downstream application use cases.
- Ability to evaluate pipeline design trade-offs across performance, cost, reliability, and maintainability.
DevOps & Engineering Practices
- Structured SDLC experience with CI/CD pipelines for data and ML workflows.
- API-based and event-driven data integration patterns.
- Distributed data processing environments.
ML/AI Data Foundations
- Understanding of data requirements for ML/AI workloads.
- Experience preparing training datasets and features from enterprise data lakes.
- Familiarity with reproducibility, dataset versioning, and data lineage concepts.
- Familiarity with GenAI concepts relevant to data engineering, such as embedding pipelines, vector databases, retrieval-augmented generation (RAG) data flows, or prompt-driven data processing - including awareness of data security and privacy considerations when working with LLMs.
Education
Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related technical field. Equivalent professional experience will be considered.
Location: Remote
Status: Full-time position with full company benefits.