The CitiData - Big Data & Analytics organization is actively recruiting a Data Science Engineer with a solid understanding of, and extensive hands-on experience in, engineering Big Data analytical products for large-scale Hadoop deployments. The candidate will be based in the Roosevelt Island office, must have experience with analytical toolsets that integrate with the Hadoop ecosystem, and will contribute to the architecture and engineering of the Big Data offering within Citi’s portfolio. The candidate must have experience using these analytical toolsets to take data products and applications from proof of concept to production delivery.
• Partner with Citi sectors to design analytical solutions to large-scale data problems using modern open-source techniques and languages.
• Provide jumpstart services for business programs on the Hadoop-based Enterprise Analytics Platform to accelerate their time to market while they invest in retooling their workforce.
• Collaborate with cross-functional engineering teams to identify vendor products and build a portfolio of cutting-edge analytical products and solutions.
• Publish and enforce Big Data analytics best practices, configuration recommendations, usage designs/patterns, and cookbooks for the developer community.
• 3+ years of experience with Big Data solutions and analytical techniques.
• 3+ years of experience in a technical consulting role.
• 5+ years of overall IT experience.
• 5+ years of experience as a Data Analyst or Data Scientist in the Big Data space.
• 1+ years of experience using GPUs for Deep Learning or general HPC processing.
• Knowledge of the Hadoop ecosystem with experience in Hive, MapReduce, Impala, etc.
• Knowledge of the Spark ecosystem: Spark engine, MLlib, GraphX, Spark SQL, etc.
• Experience with reporting and analytic tools such as Datameer, Arcadia Data, Tableau, R, Anaconda, Python, TensorFlow, Keras, MXNet, MicroStrategy, SAS, Alpine Data Labs, H2O, RStudio, Jupyter, etc.
• Experience creating machine learning, data mining, data visualization, data munging, and statistical models for Big Data problems.
• Experience with terabyte or larger size poly-structured data sets.
• Experience with statistical analysis software such as Weka, R, RapidMiner, MATLAB, SAS, and SPSS.
• Expertise, with a mix of depth and breadth, across analytical techniques such as time-series analysis, regression analysis, machine learning, deep learning, graph analytics, and NLP.
• Experience in web analytics or behavioral targeting.
• Experience with Hadoop (Cloudera), ETL (Talend or Ab Initio), MapReduce, Hive, Pig, HBase, Spark, etc.
• Experience with R and Python and at least one of Scala, Java, C++, or Clojure.
• Experience with full Hadoop SDLC deployments with associated administration and maintenance functions.
• Experience developing Hadoop integrations for data ingestion, data mapping and data processing.
• Strong interpersonal skills and excellent communication skills in written and spoken English.
• Able to work on client projects in cross-functional teams.
• A team player interested in sharing knowledge, cross-training other team members, and learning new technologies and products.
• Able to produce high-quality documents and to work in a structured environment, following procedures, processes, and policies.
• A self-starter who works with minimal supervision and can collaborate in a team of diverse skills and geographies.