The Data Scientist is responsible for turning unstructured healthcare data into standardized, analysis-ready assets that power Life Sciences and RWE use cases. Working within the internal data platform team, this role designs and evaluates AI and agent-based methods to extract clinical meaning from free-text sources (for example, deriving patient clinical statuses from notes and mapping free-text diagnoses to standardized code sets such as ICD-10), rigorously measures the quality of those outputs, and builds agentic tooling that lets users query data and generate dashboards through natural language. The role blends applied data science, large language model (LLM) evaluation, and platform engineering to make high-value insights accessible and trustworthy at scale.
Key Responsibilities:
- Design and build methods to standardize unstructured healthcare data, including extracting clinical statuses and other structured attributes from free-text clinical notes and mapping free-text diagnoses to standardized terminologies such as ICD-10.
- Develop, prompt, and orchestrate LLM- and agent-based pipelines to perform extraction, normalization, and enrichment tasks against diverse and messy source documents.
- Define and implement rigorous evaluation frameworks to measure agent and model performance on these tasks, including gold-standard datasets, accuracy and error-mode analysis, and ongoing monitoring of output quality.
- Build and maintain an agentic harness that enables users to query data and generate dashboards through natural language (an internal analog to agentic analysis tools such as Claude Cowork for our data).
- Partner with data platform, engineering, and Life Sciences stakeholders to translate business use cases into technical requirements and analysis-ready data products.
- Continuously monitor and improve the accuracy, quality, and efficiency of data extraction, standardization, and downstream analytics.
- Contribute to responsible AI practices, including appropriate handling and de-identification of sensitive healthcare data.
- Perform other job duties as assigned.
Required Qualifications:
- Bachelor's degree in a related field or equivalent work experience
- 2-4 years of related work experience
Preferred Qualifications:
- Healthcare industry experience, especially with healthcare data in post-acute care.
- Experience applying large language models and/or agentic frameworks to real-world data extraction or transformation tasks.
- Experience with clinical terminologies and coding systems (e.g., ICD-10, SNOMED CT, LOINC) and with natural language processing on clinical or biomedical text.
- Familiarity with Life Sciences or Real-World Data / Real-World Evidence (RWD/RWE) use cases.
- Experience designing evaluation methodologies for AI/ML or LLM systems (accuracy metrics, annotation, error analysis).
- Proficiency in Python and SQL, and experience working within a modern data platform or cloud environment.
- Experience building data applications and pipelines.
Job Expectations:
- Willing to work additional or irregular hours as needed
- Must work in accordance with applicable security policies and procedures to safeguard company and client information
- Must be able to sit and view a computer screen for extended periods of time
#LI-Onsite
Here are some of the exciting benefits full-time teammates are eligible to receive at WellSky:
- Excellent medical with Rx, dental, and vision benefits
- Mental Health support through EAP
- Generous paid time off, plus 13 paid holidays
- 100% vested 401(k) retirement plans
- Educational assistance up to $2,500 per year