Software Engineer, ML Platform

Xaira Therapeutics

$140K — $215K *
Pharmaceuticals & Biotech
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Degree in Computer Science, Machine Learning, Computational Biology, or a related field.
  • 5+ years of experience building and deploying ML systems in production environments.
  • Experience leading technical projects and cross-functional execution.
  • Strong programming skills in Python.
  • Familiarity with infrastructure/ops tools like Terraform and Ansible.
  • Experience with deep learning frameworks such as Torch and JAX.
  • Strong problem-solving skills in a collaborative environment.

Responsibilities

  • Develop and improve the model training system for distributed training jobs across multiple clouds.
  • Deploy storage subsystems to enhance dataset management and throughput.
  • Build evaluation infrastructure for easy execution and tracking.
  • Create tooling for integrating model training with internal systems like telemetry and experiment tracking.
  • Facilitate job dispatching to multiple clusters with robust API development.

Benefits

  • Competitive compensation package including health benefits.
  • Open, flexible, and friendly work environment.
  • Opportunities for career development and growth.
  • Access to biological labs for hands-on experience.

Full Job Description

About the Role
We are seeking a Software Engineer to join our Platform team to design, build, and deploy the AI infrastructure that powers our world-class research team. In this role, you'll collaborate closely with AI Scientists and other engineers to enable the effective use of thousands of GPUs for training cutting-edge biological foundation models and running inference on them.

This role spans a range of problems and skill sets, from MLOps for cutting-edge GPU clusters to backend engineering of control plane APIs. Our ideal candidate has an opinion about Slurm or Kubernetes for model training, cares about maximizing bandwidth from the storage subsystems to the GPUs, and can build the paved-path API for submitting training jobs that dispatch to multiple clusters.

What You Will Do
  • Develop and improve our model training system, responsible for dispatching distributed training jobs to clusters across multiple clouds.
  • Deploy storage subsystems that improve dataset management and throughput for training datasets.
  • Build evaluation infrastructure that enables easy execution and tracking.
  • Build base tooling for integrating model training with other internal infrastructure, such as telemetry, experiment tracking, and checkpointing.
  • Prior experience with biology is not required; we will teach you what you need to know. You'll get to go into the lab, and for our ideal candidate, this should be a perk, not a chore!

Preferred Skills and Qualifications
  • Degree in Computer Science, Machine Learning, Computational Biology, or a related field.
  • 5+ years of industry experience building and deploying ML systems in production environments.
  • Experience leading technical projects and driving cross-functional execution.
  • Strong programming skills in Python.
  • Experience with infrastructure/ops tools such as Terraform and Ansible.
  • Experience with deep learning frameworks such as Torch and JAX.
  • Solid understanding of machine learning.
  • Experience with the infrastructure needs of large-scale model training.
  • Strong problem-solving skills and ability to work in a collaborative, multidisciplinary environment.

Compensation

We offer a competitive compensation and benefits package that includes base salary, bonus, and equity, and we seek to provide an open, flexible, and friendly work environment that empowers employees and gives them a platform to develop their long-term careers. A Summary of Benefits is available for all applicants. The base pay range for this position is expected to be $140,000 - $215,000 annually; however, the base pay offered may vary depending on the market, job-related knowledge, skills and capabilities, and experience.
