The Manager Data Engineering – Omnichannel role is responsible for understanding Janssen's commercial business objectives and for developing and deploying Omnichannel data engineering solutions that support our large-scale, high-performance data science and machine learning initiatives.
The data engineering manager will collaborate with internal data engineers, data scientists, and our data science implementation partners to understand the business requirements that drive Omnichannel functional, data, and technical solutions. The data engineering manager will be involved in development and support across the full technology life cycle, with responsibility for designing, configuring, implementing, and supporting leading-edge data engineering pipelines that support decisions and drive organizational action for the Omnichannel brands.
The data engineering manager will also be responsible for understanding Janssen's existing data science and data engineering pipelines and for recommending ways to optimize the architecture and improve performance. The data engineering manager will support the data engineering platform (infrastructure, codebase, and data processing), work closely with internal and external partners, and manage the expectations of Omnichannel stakeholders (cross-functional partners, sales, marketing, and IT). She/he will also work closely with our customer master and sales data processing teams to meet the data needs of the machine learning and advanced analytics teams.
Major Duties & Responsibilities
Omnichannel data engineering & data science platform support
- Understand the Janssen commercial business unit’s Omnichannel data and functional requirements to enable and support data engineering pipelines.
- Act as a subject matter expert in data engineering technologies and bring best-in-class, innovative ideas to develop, test, and measure the performance and impact of initiatives.
- Understand existing data engineering pipelines and data science models related to Janssen’s Omnichannel brands, and recommend ways to optimize the architecture and improve performance.
- Collaborate with ML factory model support partners to leverage ML-DevOps capabilities to operationalize ML models.
- Develop processes and tools to monitor and analyze model performance and data accuracy.
Drive the design and build of new data pipelines & feature engineering layer
- Apply data modeling, data engineering and feature engineering principles to support data science requirements and supply raw, curated, and processed data for machine learning engineers and data scientists.
- Work in cross-functional agile teams to continuously experiment, iterate, and deliver business goals and objectives.
- Collaborate with other data engineers, ML experts, and stakeholders from multiple therapeutic areas to capture learnings and synergies as they arise.
- Lead the development and implementation of data engineering and feature engineering pipelines for predictive models and model tracking.
- Proactively identify new data sources that will enhance decision making & increase model accuracy.
Lead the design and implementation of a world-class data engineering platform
- Collaborate with data engineers and data scientists to build scalable data engineering and data science solutions leveraging the AWS platform (S3, EC2, EMR, Amazon Redshift), PySpark, Python, and Dataiku.
- Assist in developing architectural models for cloud-based data engineering solutions leveraging AWS technologies and PySpark to support large-scale, high-performance data science and machine learning platforms.
- Foster strong partnerships in the context of resources, timing, and overall franchise goals.
- Work with partners to document known solutions to the internal and external knowledge base.
Innovation and leadership
- Provide thought leadership by researching best practices, conducting experiments, and collaborating with industry leaders.
- Work in cross-functional agile teams to implement proofs of concept (POCs), iterate, and deliver business goals and objectives.
- Foster innovation by improving ML model performance, efficiency, and infrastructure through experimentation and testing.
Required Knowledge, Skills and Abilities:
- Bachelor’s degree in Computer Science, Computer Information Systems, Business Information Systems, or related discipline
- A minimum of 3-5 years’ experience in the healthcare/life sciences field is required.
- 5+ years’ experience as a solution architect, designing and implementing large-scale data engineering solutions in a fast-paced environment.
- 5+ years of experience building data engineering pipelines in the cloud to support ML projects using PySpark and/or Python
- 5+ years of experience in data modeling, data access, and data storage techniques in a cloud environment
- 5+ years of experience in Python, Spark, EMR, EC2, Redshift, or similar technologies
- 3+ years of experience with a collaborative data science platform such as Dataiku
- Knowledge of commercial and channel Life Sciences data sets (LAAD, Speaker Program, Channel – Email, Non-Promotional Personal, CRM, Social Media, Doximity)
- Knowledge of Commercial Life Sciences data sets (NPP, DDD, Xponent, Plantrak, SPP)
- Diversity of experience is preferred, with demonstrated learning agility to move across therapeutic areas such as Oncology, Immunology, Cardiovascular and Metabolism (CVM), Infectious Diseases (ID), and Neuroscience.
- Experience working in a DevOps model for data science and data engineering solutions
- Good understanding of machine learning algorithms in the sales and marketing space
- Strong business and data analysis skills with a focus on commercial pharmaceutical operations
- Familiarity with how data scientists work and how DS/ML solutions can and should scale in production.
- Good track record of translating business requirements into technical designs for new technology solutions.
- Ability to provide data pipeline implementation guidance based on best practices throughout the life cycle of the project.
- Demonstrated leadership capabilities through technology solution ownership and adoption.
- Understanding of the value of collaboration within teams, excellent communication skills, and the ability to build relationships with a diverse set of stakeholders.
- Demonstrated technical innovation and experimentation with emerging solutions in alignment with the project roadmap.
Preferred Knowledge, Skills and Abilities:
- Dataiku Certification
- Exposure to cloud technologies such as AWS Glue, containers, Lambda functions, and serverless architecture