Research Scientist - Vision-Language Modeling

Epsilon Health

$130K - $180K *

San Francisco, CA 94112, In-Person

Healthcare

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

6+ years of experience in vision-language modeling or multimodal learning
Expertise in training large VLMs (e.g., LLaVA, Flamingo)
Strong background in post-training techniques like DPO and RLHF
Proven ability to adapt complex models for new applications
Proficiency in PyTorch or JAX, experience with distributed training
Experience with medical imaging for radiology report generation
Solid software engineering skills for production-quality code

Responsibilities

Design and train multimodal foundation models for radiology
Implement advanced post-training strategies to enhance model accuracy
Research inference-time compute scaling techniques for diagnostic performance
Develop capabilities for grounded report generation in medical imaging
Create evaluation frameworks for assessing medical text quality
Engage in all aspects of model development from curation to deployment
Stay updated on research in vision-language modeling and medical AI

Benefits

Opportunity to work with one of the largest medical imaging datasets
Collaborative environment fostering research and technical excellence
Engagement with cutting-edge AI developments in healthcare
Contribution to publications and best practices in the field
Focus on impactful work in clinical settings

Full Job Description

Role Overview

We're seeking a Research Scientist with deep expertise in vision-language modeling to join our ML team. You'll be at the forefront of developing and deploying state-of-the-art multimodal models for clinical use in radiology settings. This role focuses on training and fine-tuning vision-language models (VLMs) that can generate accurate and grounded radiology reports across multiple imaging modalities, including X-ray, CT, and MRI. You'll work with one of the largest and most diverse medical imaging datasets in the industry, advancing the state of the art in grounded medical report generation, model alignment, and inference-time reasoning while maintaining the clinical rigor required for healthcare deployment.

Key Responsibilities

Design, train, and scale vision-language foundation models for radiology applications.
Develop and implement advanced post-training strategies including preference optimization (DPO, IPO, KTO), reinforcement learning from human feedback (RLHF), and other alignment techniques to improve clinical accuracy and reduce hallucinations.
Research and deploy inference-time compute scaling techniques such as chain-of-thought reasoning, self-refinement, and test-time training to enhance model performance on complex diagnostic cases.
Pioneer grounded report generation capabilities, enabling models to spatially localize findings within medical images using bounding boxes or segmentation masks.
Design rigorous evaluation frameworks that assess generated report text for both medical accuracy and writing style.
Contribute hands-on to all stages of model development including dataset curation, architecture design, distributed training, post-training optimization, and production deployment.
Stay current with cutting-edge research in vision-language modeling, medical AI, and model alignment techniques.
Drive research and technical excellence through conference publications and technical blog posts, establishing best practices for training robust medical VLMs at scale.

Qualifications

6+ years of academic or industry experience in vision-language modeling, multimodal learning, or related fields
Deep expertise in training and fine-tuning large vision-language models (e.g., LLaVA, Flamingo, CogVLM, Qwen-VL, or similar architectures)
Strong foundation in modern post-training techniques including:
- Preference optimization methods (DPO, IPO, ORPO, KTO)
- RLHF and reward modeling
- Inference-time compute scaling and reasoning strategies
- Constitutional AI and other alignment techniques
Track record of implementing complex models from research papers and adapting them to new domains
Proficiency in PyTorch or JAX, with experience training large models on multi-GPU/distributed systems
Experience with autoregressive language modeling and instruction tuning
Hands-on experience with medical imaging applications, particularly radiology report generation
Strong software engineering skills and ability to write production-quality code

Preferred Qualifications

Publications at top-tier conferences (NeurIPS, ICML, ICLR, CVPR, ACL, EMNLP, MICCAI)
Experience with grounded generation tasks (visual grounding, referring expression comprehension)
Knowledge of evaluation methodologies for long-form generation, including factuality assessment and hallucination detection
Experience with 3D medical image processing and temporal modeling
Familiarity with clinical NLP and medical knowledge representation
Experience with model interpretability, explainability, and uncertainty quantification in safety-critical applications

* Ladders Estimates
