Conducts data analytics, data engineering, data mining, exploratory analysis, predictive
analysis, and statistical analysis, and uses scientific techniques to correlate data and present it as
graphical, written, visual, and verbal narrative products that support more informed analytic decisions.
Proactively retrieves information from various sources, analyzes it to better understand the
data set, and builds AI tools that automate certain processes. Duties typically include
creating machine learning (ML)-based tools or processes, such as recommendation engines or
automated lead-scoring systems. Performs statistical analysis, applies data mining techniques,
and builds high-quality prediction systems. Should be skilled in data visualization and the use
of graphical applications, including Microsoft Power BI and Tableau; major data science
languages, such as R and Python; managing and merging disparate data sources, preferably
through R, Python, or SQL; statistical analysis; and data mining algorithms. Should have prior
experience with large-data Multi-INT analytics, ML, and automated predictive analytics.
Contractor shall:
• Create data packages in the form of databases, reports, and visualizations;
• Communicate ongoing data science activities, technical findings, and data products to both
technical and non-technical customers;
• Extract relevant features from large data stores of open-source information, PAI, and CAI that
may contain bad records, partial records, errors, or other forms of noise;
• Extract features from open-source information stored in a wide range of formats,
including JSON, XML, raw text logs, industry-specific encodings, and graph link data;
• Apply natural language processing, computer vision, signal processing, and speaker and speech
recognition algorithms to identify objects in text, image, video, and audio files;
• Apply descriptive and inferential statistics to describe data and make predictions from it,
including computing common summary statistics (e.g., mean, variance, and counts), performing
statistical tests to determine confidence in a hypothesis, and fitting distributions to datasets
and using those distributions to predict event likelihoods;
• Be able to execute data science methods using parallel computing frameworks (e.g.,
Deeplearning4j, Torch, TensorFlow, Caffe, Neon, the NVIDIA CUDA Deep Neural Network
library (cuDNN), and OpenCV) and distributed data processing frameworks (e.g., Hadoop,
including HDFS, HBase, Hive, Impala, Giraph, and Sqoop; and Spark, including MLlib,
GraphX, Spark SQL, and DataFrames);
• Be able to execute data science methods using common programming and scripting
languages, such as Python, Java, Scala, and R (for statistics).
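As a rough illustration of the statistics duty listed above (summary statistics, fitting a distribution, and predicting an event likelihood), the following Python sketch assumes that daily event counts follow a Poisson distribution. The sample data, the function names, and the Poisson assumption are all hypothetical and are not part of this description; they only show what one such task might look like.

```python
import math
from statistics import mean, pvariance


def summarize(counts):
    """Common summary statistics: count, mean, and population variance."""
    return {"count": len(counts), "mean": mean(counts), "variance": pvariance(counts)}


def fit_poisson(counts):
    """Maximum-likelihood estimate of a Poisson rate (assumed model) is the sample mean."""
    return mean(counts)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus P(X <= k - 1)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


if __name__ == "__main__":
    # Hypothetical daily event counts, for illustration only.
    daily_counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 3]
    stats = summarize(daily_counts)
    lam = fit_poisson(daily_counts)
    print(stats)
    print(f"estimated rate: {lam:.2f} events/day")
    print(f"P(at least 6 events in a day): {prob_at_least(6, lam):.4f}")
```

Comparing the variance with the mean is one informal check on the Poisson assumption, since a Poisson distribution has equal mean and variance; a real analysis would choose the distribution based on the data.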