Member of the Technical Staff, Biological Data

Output • $100K — $150K *

US-Anywhere, Remote in New York, NY

Pharmaceuticals & Biotech

Less than 5 years of experience

3 weeks ago

Job Overview by Ladders

Qualifications

PhD in computational biology, biophysics, structural biology, chemistry, biochemistry, or related field with 2+ years of relevant experience.
Deep understanding of molecular interactions and protein structure.
Experience with large-scale biological datasets, including sourcing and analysis.
Strong programming skills in Python for data processing.
Knowledge of machine learning data requirements: quality, coverage, and evaluation.
Approach data construction as a research problem, analyzing its significance and gaps.

Responsibilities

Own and construct high-quality datasets for model learning based on molecular interactions.
Develop methods to augment training data using biological insights and reasoning.
Design biological benchmarks to evaluate model capabilities meaningfully.
Collaborate with researchers to establish data-driven learning strategies for models.
Integrate diverse data sources across biological scales into coherent training sets.
Implement rigorous evaluation strategies to ensure model generalization and prevent data leakage.
Stay updated on biological data sources and methods to continuously enhance training data.

Benefits

Encouragement of new ideas and contrarian thinking.
Feedback-focused environment promoting growth and development.
Autonomy in day-to-day management with a focus on achieving milestones.
Excellent medical, dental, and vision coverage.

Full Job Description

The Role

You will own the data that our models learn from. This role requires a deep understanding of molecular biology: what a biological data source contains, what it implies, and what is missing. The quality and coverage of training data determine what our models can learn, and the biological insight behind how that data is constructed is the difference between a model that memorizes and one that reasons.

You will construct training datasets that capture how proteins and molecules interact, drawing from diverse biological data sources and extending them with your understanding of molecular principles
You will develop methods to expand training data beyond what exists in public databases, using biological and chemical reasoning to create new training signal where current data is sparse or absent
You will design benchmarks grounded in real molecular phenomena, measuring whether our models have learned biologically meaningful capabilities rather than statistical shortcuts
You will develop data strategies in collaboration with model researchers, determining what the model should learn from, what biological signal to prioritize, and how to sequence learning across modalities
You will design approaches for integrating data across biological scales and modalities, building coherent training data from heterogeneous experimental and computational sources
You will design rigorous splitting and evaluation strategies that prevent leakage and ensure model capabilities generalize to real biological problems
You will stay current with biological data sources, experimental methods, and molecular databases, continuously identifying new sources of training signal

About You

You have a PhD in computational biology, biophysics, structural biology, chemistry, biochemistry, or a related biological field with 2+ years of post-doctoral or industry research experience, or equivalent depth through a combined biology and computational background
You have a deep understanding of molecular interactions, protein structure, and biological data at the molecular level, grounded in first principles rather than surface familiarity
You have experience working with large-scale biological or molecular datasets, including sourcing, cleaning, integrating, and analyzing heterogeneous data
You have strong programming skills in Python and are comfortable building computational pipelines for data processing at scale
You understand what machine learning models require from training data: coverage, quality, balance, and evaluation rigor
You approach data construction as a research problem, not a pipeline task: you think carefully about what data means, what signal it carries, and what is absent

Bonus Points

You have experience with computational biology tools such as structure prediction, molecular docking, or virtual screening
You have experience training or evaluating machine learning models, particularly on molecular or biological data
You have publications in computational biology, bioinformatics, or molecular informatics
You have a background in cheminformatics or molecular data analysis
You have experience working with protein or molecular language models

What We Offer

We encourage new and different ideas, creativity and contrarian thinking
A healthy, feedback-focused environment to help you thrive: leadership will have high expectations, regularly share constructive feedback, support you and help you grow, and welcome feedback and ideas from you
You own your day-to-day management. What we care about is that we all hit our milestones
Competitive salary and equity in a growing, well-funded startup
Excellent medical, dental, and vision coverage

About Output

Industry

Media

* Ladders Estimates
