Full Job Description
Major Duties: Develop software, typically in Python, to independently acquire data from disparate sources (databases, files, APIs, etc.) and combine them into appropriate training, validation and testing datasets
Analyze raw datasets using descriptive statistics, working directly with domain experts to understand the meaning of data fields
Build unit tests, data quality checks and data pipelines to ensure that algorithms use trusted data
Develop and maintain an understanding of many algorithms across supervised learning, unsupervised learning and time series analysis
Propose and develop machine learning ensemble methods that exhibit the best out-of-sample characteristics possible given the input dataset
Utilize expertise in machine learning algorithms to tune algorithms using available hyper-parameters and carefully select feature subsets
Discover biases or leakage in datasets and ensure that train/test splits reflect realistic expectations of real-world performance
Run large-scale (parallel and/or distributed) training and inference jobs on private or public cloud infrastructure
May present findings to internal and external customers using both data science language (F1 scores, regression error, statistical significance, etc.) and business-domain-specific language gained from experience analyzing the data in scope
Provide some guidance to other software development teams as Data Science Lab prototypes are engineered for full production environments
Work across multiple projects in a fluid environment where work is required across the full research lifecycle, from forming a hypothesis, through acquiring data and developing ETL-style software, to presenting findings
Plan and execute data science training sessions and hackathons
Work with external parties (vendors, universities, etc.) to incorporate new techniques and tools into the data science lab
Solves complex problems
Takes a new perspective on existing solutions
Exercises judgment based on the analysis of multiple sources of information
Impacts a range of customer, operational, project or service activities within own team and other related teams
Works within broad guidelines and policies
Knowledge: Python, common Python libraries (numpy, pandas, sklearn, etc.), Linux-based operating systems, and basic development tools (Python IDEs, source control, etc.) required
Advanced distributed machine learning frameworks (e.g. Keras, TensorFlow), Azure cloud infrastructure preferred
Requires in-depth conceptual and practical knowledge in own job discipline and basic knowledge of related job disciplines
Applies best practices and an understanding of how own area integrates with others
Explains difficult or sensitive information; works to build consensus
Experience: Computer Science degree (undergraduate or graduate level) and strong statistical background. Data Science-specific graduate work, finance sector experience or coursework preferred
Acts as a resource for colleagues with less experience
May lead small projects with manageable risks and resource requirements
Salary Range:
$114,500 - $194,700 USD
Salary range is a good faith estimate of base pay. Northern Trust provides a comprehensive benefits package including retirement benefits (401k and pension), health and welfare benefits (medical, dental, vision, spending accounts and disability), paid time off, parental and caregiver leave, life & accident insurance, and other voluntary and well-being benefits. Northern Trust also provides a discretionary bonus program that may include an equity component.