Senior Biological Data Architect

Rancho BioSciences

$145K — $187K *
US-Anywhere, Remote in United States
Pharmaceuticals & Biotech
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • PhD in Life Sciences or equivalent experience in biomedical data.
  • Expertise in conceptual, logical, and canonical data modeling for complex biomedical domains.
  • Experience with schema modeling frameworks like LinkML.
  • Familiarity with YAML-based schema authoring.
  • Understanding of FAIR principles and persistent identifiers.
  • Knowledge of biomedical ontologies and controlled vocabularies.
  • Experience with semantic web technologies such as RDF and SPARQL.
  • Proficiency in Python, R, or SQL for data validation and conformance testing.

Responsibilities

  • Collaborate with stakeholders to develop comprehensive biomedical data models.
  • Lead harmonization efforts including vocabulary alignment and provenance capture.
  • Define schemas and controlled vocabularies in partnership with engineering teams.
  • Design models to support data pipelines and analytical workflows.
  • Establish data quality validation rules and checks.
  • Manage schema lifecycle including version control and documentation.
  • Facilitate schema review and identify modeling risks early.

Benefits

  • Fully remote work environment with flexible location options.
  • Opportunities to collaborate with leading pharmaceutical organizations.
  • Engagement in impactful projects that contribute to human health.
  • Work within a diverse team of experts across various scientific fields.
Full Job Description
About the role

  • We are seeking a full-time contractor to serve as Senior Biological Data Architect, designing, harmonizing, and governing complex biomedical data models on behalf of our pharmaceutical, academic, and institutional clients. The successful candidate will be an expert problem solver with deep expertise in conceptual, logical, and canonical data modeling for biomedical and scientific domains, including disease biology, genetics, translational research, and drug development. You will play a central role in client initiatives that deliver FAIR-aligned data products enabling rapid query and decision-making by R&D scientists.
  • We are a Data Curation company collaborating with some of the most renowned pharmaceutical organizations in the world. Our team of scientists, curators, computational biologists, data scientists, knowledge engineers, and solution developers is distributed across the country; we support talented people living where they choose, working collaboratively on projects that have real impact on human health.
  • While the role is fully remote, candidates are expected to spend the majority of their working time overlapping with East Coast US or UK working hours.

What you'll do

  • Partner with scientific and technical stakeholders to elicit requirements and propose canonical data models that represent the full breadth of biomedical concepts relevant to target discovery, disease understanding, and translational research, along with the evidence and provenance that support them.
  • Design and lead source-to-canonical harmonization activities, covering vocabulary alignment, persistent identifier assignment, and lineage and provenance capture.
  • Define schemas, controlled vocabularies, identifier strategies, and ontology bindings in collaboration with knowledge engineering, curation, data engineering, and platform teams.
  • Design models that power data pipelines, APIs, knowledge graphs, analytical workflows, and downstream R&D query use cases.
  • Establish validation rules and data quality checks covering ontology term validation, range and cardinality checks, required-field enforcement, ID and label consistency, cross-field consistency, and provenance completeness.
  • Manage the full schema lifecycle: repository management (e.g., GitHub-based), semantic versioning, changelogs, tagged releases, data dictionaries, metadata catalogs, and downstream impact assessments.
  • Drive schema review, approval, and publication processes; identify modeling risks early, such as metadata gaps, ontology conflicts, source data quality issues, lineage gaps, and compatibility risks.
  • Lead modeling strategy spanning harmonization, pipeline validation, knowledge graphs, and FAIR data product delivery.
  • Translate ambiguous scientific requirements into clear, durable canonical models and make defensible, documented decisions on ontology reuse, extension, and mapping.
  • Design modular, reusable, future-proof models aligned with FAIR and enterprise standards, with consistent persistent identifier and provenance conventions across data assets.
  • Communicate strategies, trade-offs, and progress clearly to clients and internal teams.

Qualifications

Required:

  • PhD in Life Sciences (or equivalent demonstrated expertise) with first-hand experience working with biomedical or research data.
  • Strong conceptual, logical, and canonical data modeling experience for complex scientific or biomedical domains.
  • Hands-on experience with LinkML or equivalent schema modeling frameworks, including defining classes, slots, ranges, identifiers, required fields, constraints, cardinality, descriptions, and ontology bindings.
  • Working knowledge of YAML-based schema authoring.
  • Solid grasp of FAIR principles (findability, accessibility, interoperability, reusability), including persistent identifiers, metadata standards, provenance, and schema versioning.
  • Experience with biomedical ontologies and controlled vocabularies, including familiarity with public ontology resources covering genes, diseases, phenotypes, anatomy, cell types, assays, units, and evidence.
  • Familiarity with semantic web technologies such as RDF, OWL, JSON-LD, SHACL, ShEx, and SPARQL, and with knowledge graph modeling.
  • Proven experience designing Entity Relationship Diagrams and Conceptual and Logical Data Models.
  • Experience with schema and model registries, data catalogs, metadata registries, and data dictionary management.
  • Proficiency in Python, R, or SQL for model conformance testing, ontology mapping, or data quality validation (notebook-based workflows a plus).
  • Experience with SDLC methodologies, unit and integration testing, and documentation practices.
  • AI awareness: comfortable evaluating how AI-driven curation and mapping tools can accelerate modeling, harmonization, and validation workflows.

Nice to Have:

  • Experience working with modern cloud data platforms and data lake environments such as Snowflake or Databricks.
  • Hands-on use of AI-powered coding assistants and established collaboration workflows that incorporate them into day-to-day modeling, documentation, or validation work.


The pay range for this role is:

70 - 90 USD per hour (United States)
