Senior Data Scientist

Probably Genetic

$130K - $180K *
Healthcare
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years of experience in data science or machine learning engineering
  • Strong proficiency in Python and core data science tools like pandas, NumPy, scikit-learn, PySpark, and SQL
  • Demonstrated experience in end-to-end machine learning processes from problem definition to monitoring
  • Familiarity with NLP techniques and practical application of language models
  • Comfortable with prompt engineering and external AI API performance evaluation
  • Ability to operate with high ownership in fast-paced, lean environments
  • Strong analytical communication skills for translating complex data insights to diverse audiences

Responsibilities

  • Own the complete development and deployment of PG's predictive diagnostic AI models
  • Run prospective testing experiments to continually enhance model performance
  • Build and maintain a synthetic patient data pipeline for research and model development
  • Optimize patient intake experience using NLP and data analysis
  • Manage API usage and cost optimization across PG's AI ecosystem
  • Conduct strategic analyses to inform product and program insights
  • Establish MLOps infrastructure for model monitoring and operational processes
  • Engage in blue-sky initiatives focused on extracting value from data

Benefits

  • An engaging and supportive team focused on improving lives
  • Fair compensation with competitive early-stage equity grants
  • Generous flexible time off policy that is actively utilized
  • 12 weeks of parental leave for all eligible employees
  • Hybrid, flexible work environment promoting autonomy
  • A pet-friendly office located in Downtown SF near transit
  • Work from anywhere policy allowing up to 4 weeks per year
  • Regular team retreats to exciting destinations
  • Comprehensive health benefits including medical, dental, vision, therapy, FSA, and 401k

Full Job Description
About the role

We are looking for a Senior Data Scientist to own some of the most consequential diagnostic AI in rare disease. You will build, validate, and operationalize the models that help us find and diagnose patients who have never had a name for their disease, power the analytical rigor behind our testing programs, and shape how we use data to make smarter product decisions.

What you will do
  • Own the end-to-end development, validation, and operationalization of PG's predictive diagnostic AI models - from feature engineering through production deployment - that power program eligibility decisions and clinical decisions for patients
  • Run prospective testing experiments: apply diagnostic models to undiagnosed patients, coordinate testing, and track outcomes to continuously improve model performance
  • Build and maintain PG's synthetic patient data pipeline, a critical deliverable for our research programs and a key input to our own model development lifecycle
  • Optimize our patient intake experience using NLP and multimodal data analysis to determine which questions to ask, in what order, to maximize data quality and conversion
  • Own API usage and cost optimization across PG's AI stack, including prompt engineering, model evaluation, and ongoing performance monitoring
  • Conduct ad hoc strategic analyses that inform product prioritization, support causality assessment, and generate customer-facing program insights
  • Establish MLOps infrastructure: model monitoring, drift detection, API observability, and lightweight but durable operational processes
  • Have the freedom to conduct blue-sky research initiatives aimed at creating value from our data
  • Work with Data Engineering to build a robust, scalable data foundation that supports all of the above

Who you are

We are looking for a few specific things that will help you succeed in this role:
  • 7+ years of experience in data science, machine learning engineering, or a closely related field
  • Strong Python proficiency and fluency across the core data science stack: pandas, NumPy, scikit-learn, PySpark, and SQL
  • Demonstrated end-to-end ML experience: you have taken models from problem definition through feature engineering, validation, deployment, and monitoring in a production environment
  • Experience with NLP techniques and applying language models to real-world problems
  • Comfort with prompt engineering and evaluating external AI API performance (e.g., OpenAI)
  • A track record of operating with high ownership in lean, fast-moving environments where you have had to build structure as much as execute within it
  • Strong analytical communication skills - you can translate complex model outputs and data findings into clear, actionable narratives for technical and non-technical audiences alike


Some things that are not required but that you will learn on the job:
  • Experience with Databricks or similar lakehouse/ML platform environments
  • Familiarity with synthetic data generation techniques
  • Domain knowledge in healthcare, rare disease, genomics, or clinical research
  • Experience with MLOps tooling and building observability infrastructure from scratch
  • Exposure to biopharma or insurance analytics use cases


What we offer at Probably Genetic:
  • An engaging and supportive team all on a mission to improve lives
  • Fair and equitable compensation with competitive early-stage equity grants
  • Generous flexible time off policy that we actually use
  • Parental leave benefits (12 weeks for both birthing and non-birthing parents)
  • Hybrid, flexible work with high trust and autonomy
  • A bright, inviting, pet-friendly office in Downtown SF near transit
  • A "work from anywhere" policy, up to 4 weeks a year
  • Regular team retreats in exciting destinations
  • Health Benefits including medical, dental, vision, therapy, FSA, and 401k
  • And so much more!
