Sr. Hadoop Developer
We are looking for a Sr. Hadoop Developer for our client in Raleigh, NC.
Job Title: Sr. Hadoop Developer
Job Location: Raleigh, NC
Job Type: Contract – 12 Months / Contract to Hire / Direct Hire
- Design, develop, and deliver solutions based on Big Data applications that fulfill the strategic vision for enterprise applications to successfully support the business.
Activities will include:
- Perform the full deployment lifecycle, from on-premises to the cloud, including installation, configuration, initial production deployment, recovery, security and data governance for Hadoop.
- Evaluate and provide technical solutions, acting in a lead role as required, to design, develop, and support business units wishing to implement an information technology solution.
- Refine raw data into actionable insight using visualization and statistics with innovative analytics applications and systems.
- Develop applications that can interact with the data in the most appropriate way, from batch to interactive SQL or low-latency access, using the latest tools (Hortonworks Data Platform (HDP) preferred).
Essential Functions:
- Leads implementation (installation and configuration) of HDP with a complete cluster deployment layout, including replication factors, setup of the NFS Gateway to access HDFS data, resource managers, node managers, and the various phases of MapReduce jobs. Experience with configuring workflows and deployment using tools such as Apache Oozie is necessary.
- Participates in design, development, validation, and maintenance of the Big Data platform and associated applications. Assists with architectural oversight of how the platform is built to ensure that it supports high-volume / high-velocity data streams and is scalable to meet growth expectations.
- Monitors workflows and job execution using the Ambari UI, Ganglia, or equivalent tools. Assists administrators in commissioning and decommissioning nodes, and in backing up and recovering Hadoop data using snapshots and high availability. A good understanding of rack awareness and topology is preferred.
- Develops, implements, and participates in designing column family schemas of Hive and HBase within HDFS. Experience in designing Hadoop flat and star models with MapReduce impact analysis is necessary.
- Develops the data layer for performance-critical reporting systems. Experience with real-time big data reporting systems is necessary.
- Recommends and assists with the development and design of HDFS/Hive data partitioning, vectorization, and bucketing with Hortonworks Big Insights query tools. Performs day-to-day operational tasks using Flume and Sqoop to move data to different RDBMSs. Expertise in Java scripts and UNIX shell scripts to support custom functions or steps is required.
- Develops guidelines and plans for performance tuning of a Hadoop/NoSQL environment, with underlying impact analysis of MapReduce jobs using CBO and analytical conversions. Implements a mixed batch / near-real-time architecture to analyze, index, and publish data for applications. Writes custom reducers that reduce the number of underlying MapReduce jobs generated from a Hive query. Helps with cluster efficiency, capacity planning, and sizing.
- Develops efficient Hive scripts with joins on datasets using a variety of techniques, including map-side and sort-merge joins with various analytical functions. Experience with advanced Hive features such as windowing, CBO, views, ORC files, and compression techniques is necessary. Develops jobs to capture CDC (Change Data Capture) from Hive-based internal, external, and managed systems.
- Partners with key internal teams, such as clinical operations and data management, to ensure that the Big Data solution is identifying all the data points in upstream systems and classifying them appropriately to support analytic objectives.
- Identifies and implements appropriate information delivery mechanisms that improve decision-making capability of our customers.
- Designs, develops, and troubleshoots transformations to ingest and manipulate data from various sources within the company and its extended environment, using native Hadoop tools or ETL tools such as Pentaho Data Integrator with Hadoop/Hive-based data transformations.
- Designs and sets up exception handling jobs, and writes Oracle scripts, functions, stored procedures, complex SQL queries, PL/SQL analytical functions, and hierarchical parent-child queries to support application systems.
- Experience providing solutions for portal and mash-up integration that seamlessly connect business analytics with other applications in a publisher/subscriber model is a plus.
- BA/BS in computer science or a similar discipline, plus 3+ years of deep development experience in technologies such as Hadoop (HDP preferred), Hortonworks Data Flow (HDF), and Oracle databases.
- A very strong SQL/data analysis or data mining background, experience with business intelligence and data warehousing, and a solid understanding of large-scale data management environments (relational and/or NoSQL), audit controls, and ETL frameworks are expected.
- Experience with Hortonworks Data Flow (NiFi project) greatly preferred.
- Prior experience in building scalable distributed data processing solutions with Hadoop using tools such as HBase (NoSQL), Hive, Flume, and Sqoop.
- Some proficiency with MapReduce / HDFS architecture and Linux or Unix OS system management, plus experience with at least one scripting language, is required.
- Hortonworks-certified developers strongly preferred, but Cloudera certification is acceptable.