Supports and performs the development and programming of machine-learning-integrated software algorithms to structure, analyze, and leverage data in a production environment.
Core Responsibilities
- Leverages data pipeline designs and supports the development of data pipelines for model development. Proficient with software tools used to develop data pipelines in a distributed computing environment (PySpark, Glue ETL).
- Supports integration of model pipelines in a production environment. Develops understanding of SDLC for model production.
- Reviews pipeline designs and makes data model design changes as needed. Documents and reviews design changes with data science teams.
- Supports data discovery and automated ingestion for model development. Performs detailed analysis of raw data sources for data quality, applying business context and model development needs.
- Engages with internal stakeholders to understand and probe business processes in order to develop hypotheses. Brings structure to requests and translates requirements into an analytic approach. Participates in and influences ongoing business planning and departmental prioritization activities.
- Runs model monitoring scripts and follows the process for alerting management as needed. Addresses issues found in data pipelines from model monitoring alerts.
- Participates in special projects and performs other duties as assigned.
Qualifications
- Undergraduate degree or equivalent experience; a graduate degree is preferred.
- Minimum of 5 years of relevant work experience.
- At least 3 years of hands-on experience designing ETL pipelines using AWS services (e.g., Glue, SageMaker).
- Proficiency in programming languages, particularly Python (including PySpark, PySQL) and familiarity with machine learning libraries and frameworks.
- Strong understanding of cloud technologies, including AWS and Azure, and experience with NoSQL databases.
- Familiarity with Feature Store usage, LLMs, GenAI, RAG, Prompt Engineering, and Model Evaluation.
- Experience with API design and development is a plus.
- Solid understanding of software engineering principles, including design patterns, testing, security, and version control.
- Knowledge of Machine Learning Development Lifecycle (MDLC) best practices and protocols.
- Understanding of solution architecture for building end-to-end machine learning data pipelines.
Special Factors
Sponsorship
Vanguard is not offering visa sponsorship for this position.