Develop technical architectures, designs, and processes to extract, cleanse, integrate, organize, and present data from a variety of sources and formats for analysis and use across a range of use cases.
Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and quality levels that exist within a given data source.
Work with source-system and business SMEs to develop an understanding of the data requirements and the options available within customer sources for meeting the data and business requirements.
Create logical extraction/ingestion templates and maps to demonstrate the logical flow and transformations required to move data from customer source systems into the target data lake, warehouse, and/or sandbox.
Perform hands-on data development to accomplish data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batched data ingestion techniques.
Provide elbow-to-elbow style mentoring of customer resources and other consultants.
Assist in creation of data requirements and data model design as necessary and appropriate.
Minimum of 5 years of experience working with the Apache Hadoop Ecosystem of tools and technologies to extract, integrate, cleanse and organize data, including experience with either the Hortonworks or Cloudera distributions.
Key Tools and Technologies
Informatica BDM (nice to have)
Informatica Blaze (nice to have)
Experience working with the following types of workloads and pipelines:
Enterprise-scale ETL and ELT batched workloads
Near real-time micro-batches
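To illustrate the micro-batch pattern named above, here is a minimal, framework-free sketch in Python; the batch size and simulated source are hypothetical, and in practice these pipelines would run on tools such as Spark Structured Streaming or NiFi rather than plain Python:

```python
from itertools import islice

def micro_batches(source, batch_size=100):
    """Group an unbounded record stream into small batches
    for near real-time downstream processing."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch  # each batch would be handed to the load/transform step

# Simulated stream of 250 records, consumed in micro-batches of 100
stream = range(250)
sizes = [len(b) for b in micro_batches(stream, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

The same grouping logic is what a micro-batch engine performs internally: small, frequent batches trade a little latency for the throughput and retry semantics of batch processing.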
Experience working with Data Governance frameworks
Some experience performing conceptual and logical data model design
Experience in the Financial Services, Retail, or Healthcare Payor or Provider industries is a plus.
Strong NoSQL, Spark SQL, and ANSI SQL query skills
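As a small illustration of the ANSI SQL skills listed above, the sketch below runs a standard join-free aggregation against an in-memory SQLite database; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'acme', 120.0), (2, 'acme', 80.0), (3, 'globex', 50.0);
""")

# ANSI SQL aggregation: total order amount per customer
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 200.0), ('globex', 50.0)]
```

The same GROUP BY/aggregate pattern carries over directly to Spark SQL and to SQL-on-Hadoop engines such as Hive.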
Strong verbal and written communication skills in English