Job Description:
We are seeking an experienced Cloud Data Warehouse and AI/ML Engineer to design, build, and maintain large-scale data warehouses, ETL pipelines, and AI/ML models that drive data-driven decision-making. The successful candidate will have a strong background in data warehousing, ETL, and AI/ML, with experience in working with various tools.
Key Responsibilities:
- Design and implement ETL (extract, transform, load) pipelines to ingest, process, and store large data sets from various sources
- Build and maintain data warehouses, including data modeling, data governance, and data quality
- Ensure data quality, integrity, and security by implementing data validation, data cleansing, and data governance policies
- Optimize data systems for performance, scalability, and reliability
- Collaborate with customers to understand their technical requirements and provide guidance on best practices for using Amazon Redshift
- Work with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver data solutions
- Provide technical support for Amazon Redshift, including troubleshooting, performance optimization, and data modeling
- Integrate AI/ML models with ETL pipelines and data warehouses to enable predictive analytics and machine learning
- Develop and deploy AI/ML models using popular frameworks such as TensorFlow, PyTorch, or Scikit-learn
- Work with data scientists to develop and deploy data-driven products and features
- Identify and resolve data-related issues, including data pipeline failures, data quality issues, and performance bottlenecks
- Develop technical documentation and knowledge base articles to help customers and AWS engineers troubleshoot common issues
Required Skills and Education:
- Bachelor's or Master's degree in Computer Science or a related field, with at least 6 years of experience in Information Technology, or equivalent experience
- Proficiency in one or more programming languages (e.g., Python, Java, Scala)
- 6+ years of experience in data engineering, with a focus on designing and implementing large-scale data systems
- 5+ years of hands-on experience writing complex, highly optimized queries across large data sets using Amazon Redshift, Oracle, and SQL Server
- 5+ years of hands-on experience using AWS Glue and Python/Spark to build ETL pipelines in a production setting, including writing test cases
- Strong understanding of database design principles, data modeling, and data governance
- Proficiency in SQL, including query optimization, indexing, and performance tuning
- Experience with data warehousing concepts, including star and snowflake schemas
- Strong analytical and problem-solving skills, with the ability to break down complex problems into manageable components
- Experience with data storage solutions such as relational databases (Oracle, SQL Server), NoSQL databases, and cloud-based data warehouses (Redshift)
- Experience with data streaming and integration tools such as Apache Kafka and Fivetran
- Experience in building ETL pipelines using AWS Glue, Apache Airflow, and programming languages including Python and PySpark
- Understanding of data quality and governance principles and best practices
- Experience with AWS services and best practices
Soft Skills:
- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams
- Strong problem-solving and analytical skills, with the ability to break down complex problems into manageable components
- Ability to work in a fast-paced environment and adapt to changing priorities
- Strong attention to detail and ability to maintain high-quality documentation
- Commitment to continuous learning and professional development
Preferred Skills and Education:
- Experience with Dataiku
- Experience building reports using Power BI and Tableau
- Familiarity with version control systems such as Git
- Certification in data warehousing, ETL, or AI/ML (e.g., Certified Data Warehouse Architect, Certified ETL Developer)
- Experience with agile development methodologies such as Scrum or Kanban
This position will be posted for at least 5 calendar days. The posting will remain active until the position is filled, or a qualified pool of candidates is identified.