We are seeking a highly specialized Applied AI Engineer / Clinical Informatician to lead research at the intersection of completed clinical trial datasets and biobank-linked population data. This is a hands-on research role for an individual contributor, not an operational trial management position. Your core mission is to build the systems and tools that extract, define, and contextualize patient phenotypes from locked trial databases, real-world data, and biobank cohorts. That work turns archived data into translational insight that shapes the next generation of clinical research.
You will work with rich, already-collected datasets: locked trial databases, archived omics profiles, longitudinal electronic health records, and population-scale biobank cohorts. Your mandate is to build the AI and ML systems that make these datasets tractable and ready for in-depth analysis. This role suits someone who thinks like a scientist, builds like an engineer, and communicates like a clinician.
Key Responsibilities
AI & Machine Learning for Translational Discovery
- Develop and deploy agentic AI applications that enable natural language interaction with clinical data
- Ground AI outputs in validated biological knowledge, for example by implementing RAG pipelines anchored in biomedical ontologies (HPO, Gene Ontology, MeSH, DrugBank), clinical trial registries, and curated pathway databases
- Deploy unsupervised and self-supervised learning approaches, such as clustering, representation learning, and contrastive learning, to discover latent patient archetypes and molecular disease subtypes across trial and biobank data
- Deploy survival models and dynamic treatment regime estimators using combined clinical and omics features
- Build AI tooling to harmonize heterogeneous trial and biobank datasets into common data representations
- Evaluate and monitor model performance, safety, and reliability in production environments
- Manage vendors and contractors, and build partnerships with relevant teams across Lilly
Post-Trial Data Research & Analysis
- Build pipelines on locked clinical trial databases (SDTM, ADaM) to support secondary and exploratory research beyond primary endpoints
- Deploy ML workflows to identify trial subgroup effects, treatment heterogeneity, and responder/non-responder signatures from completed trial data
- Use NLP to mine adverse event narratives, clinical notes, and investigator comments in clinical and biobank datasets, surfacing latent safety signals not captured by structured endpoints
- Reconstruct patient-level longitudinal trajectories from trial visit data to model disease progression, drug response kinetics, and time-to-event outcomes
- Architect workflows for meta-analytic and cross-trial integrative analyses across multiple completed studies to identify generalizable biological and clinical patterns
- Connect to large-scale biobank cohorts (UK Biobank, All of Us, etc.) as external validation and enrichment resources for clinical phenotype findings derived from trials
Research Rigor, Reproducibility & Governance
- Establish research data management practices that ensure full reproducibility of analyses, including data versioning, containerized compute environments, and audit-ready analysis logs
- Ensure all research activities follow HIPAA, GDPR, and relevant IRB and ethics committee requirements
Basic Qualifications
- M.S. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 6+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting
- Or a Ph.D. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 3+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting
Additional Skills & Preferences
- Demonstrated use of AI tools in production environments for clinical data analysis
- Expert proficiency in Python and/or R for statistical modeling and ML; strong command of SQL and experience with cloud-based research computing environments (ideally DNAnexus, AWS, GCP, or Azure) or HPC clusters
- Familiarity with advanced generative AI methods such as LLM fine-tuning, building and training foundation models from scratch, and working in high-performance computing environments
- Deep knowledge of CDISC standards (SDTM, ADaM) and experience analyzing clinical trial databases for secondary research purposes
- Demonstrated experience applying ML methods including survival analysis, causal inference, NLP, and deep learning to clinical or genomic research questions
- Thorough understanding of OMOP CDM, HL7 FHIR Genomics, and major biomedical ontologies
- Direct research experience with major public and restricted-access biobank resources (UK Biobank, All of Us, etc.)
- Experience with federated learning, differential privacy, or secure computation frameworks applied to multi-site biomedical research
- Track record of peer-reviewed publications in clinical AI, translational informatics, genomics, or a related field
- Familiarity with the target trial emulation framework and its application to biobank data
- Knowledge of pharmacogenomics, drug response modeling, or PK/PD data analysis from clinical trials
- Experience with knowledge graph construction, graph ML, or ontology-driven reasoning for biomedical discovery
- Hands-on experience with multi-omic data analysis
Actual compensation will depend on a candidate's education, experience, skills, and geographic location. The anticipated wage for this position is $181,500 - $283,800.
Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities).
Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion and Lilly's compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.
#WeAreLilly