Hadoop Data Engineer


New York, NY

Industry: Education


5 - 7 years

Posted 164 days ago

by Gopinath Rao

This job is no longer available.

Our client seeks a Data Engineer to maintain and enhance our Enterprise Data Warehouse, Data Lake and Analytics systems. This role is a hands-on engineering position responsible for the continued evolution of our data fabric and platforms for business intelligence.

Essential Functions

Create and manage data flow into multiple Enterprise Data Warehouse and Big Data systems.

Assist with architecture and design for Business Intelligence and Analytic workflows. 

Architect, design, develop, and implement Data Warehouse and Data Lake management solutions.

Handle testing, deployment and ongoing support of data warehouse systems and data pipelines.

Work with our reporting and analytic teams to build marts and tables to optimize reporting performance.

Develop and support data flow, ETL and machine learning processes.

Manage the development and support of Hadoop ecosystem workflows.

Architect, build and maintain data services and products across Big Data, NoSQL and RDBMS spaces.


Candidate should be fully proficient in implementing data products and services, including extensive project experience in designing and implementing enterprise data platforms, both on-premises and in the cloud, using best practices and focusing on accuracy, reliability, performance, scalability and security. Must troubleshoot and resolve complex problems.

Demonstrated expertise in using SQL in warehousing contexts and other complex environments.

Data modeling for data-intensive applications, from core star schema and dimensional modeling to contemporary NoSQL and denormalized data lake architectures.

Demonstrated experience in developing for cloud data lakes (e.g. Azure Data Lake, Google BigQuery, 1010data).

Development, monitoring and management of Hadoop workflows (Hortonworks and HDInsight are a big plus).

Experience in the development and management of data pipelines using tools such as SSIS and Informatica as well as new models such as NiFi, Airflow and Luigi.

Experience in populating and leveraging OLAP tools and SQL-on-Hadoop platforms (Presto, Hive, HAWQ, Spark SQL), including tools such as Druid.

Ability to develop code in Python (Java, Scala and R are all beneficial), including experience with NumPy, pandas and scikit-learn.

Ability to leverage and integrate with business intelligence tools such as Power BI, Tableau and MicroStrategy.

Integration of streaming architectures in Big Data implementations and experience with Kafka, RabbitMQ and/or related cloud offerings.

Machine learning experience in both Hadoop (Mahout, Spark ML) and other platforms (scikit-learn, CNTK, TensorFlow) as well as contemporary cloud offerings (Azure ML, Cortana Intelligence, Cognitive Services, Google Cloud ML Engine).

BA/BS, or equivalent experience, in Computer Science or related technical discipline

6+ years of experience doing data warehouse and Hadoop ecosystem development

2 or more years of experience in Hadoop engineering (e.g. Pig, Hive, Falcon, Sqoop, Kafka, Spark) 

2 or more years of real-world ETL and relational warehouse experience with SSIS (or similar)

$110K - $130K