CVS Health's Analytics & Behavior Change (A&BC) team is an organization working to solve some of the most challenging problems at the intersection of technology and healthcare. A&BC leverages advanced analytics, clinical informatics, and hypothesis-driven approaches to transform data into actionable, customer-centric insights that drive growth, improve health outcomes, and expand access to healthcare across all CVS Health businesses. Our teams build next-generation data and AI products that help power CVS Health to make healthier happen for 100+ million customers.
The A&BC organization is looking to grow its Clinical Data Science & AI team. Join us as we embark on an exciting journey to drive a transformational shift in how CVS Health leverages clinical data and analytics to become the leader in consumer healthcare in the U.S.
As a Data Scientist - Clinical AI, you are tasked with activating CVS Health's clinical data repository to improve outcomes across multiple lines of business and use cases. You will serve as a bridge between clinical data assets and the analysts, data scientists, and business partners who consume them, ensuring data is accessible, well-documented, fit for purpose, and aligned with clinical and regulatory standards.
Extract signal from unstructured clinical text. Apply NLP and language model techniques to clinical notes, CCD documents, and other free-text clinical data to generate structured, actionable features for downstream analytics and predictive models.
Build and fine-tune Small Language Models (SLMs). Design, train, and evaluate domain-specific SLMs tailored to clinical use cases, balancing performance, cost, latency, and compliance requirements.
Utilize LLMs where applicable. Leverage large language models where they add clear value (e.g., training data creation, entity extraction, zero-shot classification) while knowing when traditional ML, rules-based approaches, or simpler statistical methods are the right tool for the job.
Develop predictive analytics solutions. Build and validate predictive models using both classical ML (gradient boosting, logistic regression, survival analysis) and modern deep learning approaches to support clinical decision-making and population health initiatives.
Conduct rigorous Exploratory Data Analysis (EDA). Deeply explore clinical datasets, both structured and unstructured, to uncover patterns, assess data quality, identify feature candidates, and inform modeling strategy before jumping to solutions.
Communicate findings clearly. Present methodology, results, and recommendations to technical and non-technical stakeholders through well-crafted visualizations, notebooks, and presentations. Translate complex AI/ML concepts into language that clinical and business partners can act on.
Collaborate across teams. Work with machine learning engineers, data engineers, clinical informaticists, and business partners to ensure clinical data pipelines support AI/ML workflows and that model outputs are integrated into products and decision-making processes.
Stay current and stay curious. Continuously evaluate emerging techniques in NLP, foundation models, and clinical AI. Bring new ideas to the team, prototype rapidly, and advocate for approaches grounded in evidence rather than hype.
Uphold data governance standards. Ensure all work complies with HIPAA, data privacy regulations, and internal data stewardship policies, particularly when handling PHI and unstructured clinical text.
Hands-on experience with NLP: text preprocessing, tokenization, named entity recognition (NER), text classification, topic modeling, or similar techniques applied to real-world unstructured data.
Practical experience with LLMs and/or SLMs: prompt engineering, fine-tuning, RAG architectures, evaluation frameworks, or deploying language models in production or research settings.
Strong foundation in traditional machine learning: supervised and unsupervised methods, feature engineering, model selection, cross-validation, and performance evaluation.
Best coding practices: you use version control (Git/GitHub), commit your work regularly, write clean and reproducible code, and understand that well-organized repositories are as important as well-built models.
Deep EDA skills: ability to systematically explore datasets, identify data quality issues, surface insights, and make informed decisions about modeling approach before writing a single line of model code.
Proficiency in Python (pandas, scikit-learn, PyTorch or TensorFlow, Hugging Face Transformers) and SQL for working with large-scale healthcare datasets.
Experience with cloud-based data and ML platforms, preferably Google Cloud Platform (GCP): BigQuery, Vertex AI, or equivalent.
Judgment and common sense: you understand that not every problem needs an LLM, you meet your deadlines, you ask for help when you're stuck, and you don't over-engineer solutions.
A genuine curiosity and desire to learn: you read papers, you try new tools, you ask "why," and you're energized by problems you haven't solved before. You know when a rabbit hole is worth diving into and when to pull back, stay focused, and deliver.
Experience working with clinical text data: clinical notes, discharge summaries, pathology reports, or similar unstructured healthcare documents.
Knowledge of clinical coding systems and terminologies (ICD-10, SNOMED-CT, LOINC, RxNorm, CPT, NDC, UMLS) and their relevance to NLP pipelines.
Familiarity with clinical data standards (HL7, FHIR, CCD/C-CDA) and common data models (e.g., OMOP).
Experience building or contributing to clinical NLP pipelines: entity extraction, relation extraction, negation detection, or section segmentation from clinical narratives.
Familiarity with MLOps practices: model versioning, experiment tracking, CI/CD for ML, model monitoring.
Experience working directly with clinical stakeholders (physicians, nurses, clinical operations teams, etc.) and tailoring presentations, findings, and recommendations to the appropriate audience level, from executive summaries for leadership to detailed methodology reviews for technical notes.
Privacy, security, and compliance experience: HIPAA/HITRUST, de-identification/tokenization, PHI handling.
Bachelor's degree in health informatics, biostatistics, computer science, data science, mathematics, biomedical informatics, or a related field, or an equivalent combination of formal education and experience.
Master's degree or higher in Health Informatics, Biomedical Informatics, Clinical Informatics, Public Health, Epidemiology, Data Science, or a related field is a plus, but not a substitute for demonstrated ability to ship real-world solutions.
Clinical background (RN, PharmD, MD, or similar) with transition into data science or AI is a genuine differentiator for this role.
Anticipated Weekly Hours
40
Time Type
Full time
Pay Range
The typical pay range for this role is:
$79,310.00 - $158,620.00
This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above.
Great benefits for great people
We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.
This full-time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial wellbeing of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.
Additional details about available benefits are provided during the application process and on .
We anticipate the application window for this opening will close on: 07/31/2026