About the Role
We are seeking a Software Engineer to join our Platform team to design, build, and deploy the AI infrastructure that powers our world-class research team. In this role, you'll collaborate closely with AI Scientists and other engineers to enable the effective use of thousands of GPUs for training cutting-edge biological foundation models and running inference on them.
This role spans a range of problems and skill sets, from MLOps for cutting-edge GPU clusters to backend engineering of control plane APIs. Our ideal candidate has an opinion about Slurm or Kubernetes for model training, cares about maximizing bandwidth from the storage subsystems to the GPUs, and can build the paved-path API for submitting training jobs that can be dispatched to multiple clusters.
What You Will Do
- Develop and improve our model training system, which dispatches distributed training jobs to clusters across multiple clouds.
- Deploy storage subsystems that improve dataset management and throughput for training datasets.
- Build evaluation infrastructure that makes model evaluations easy to run and track.
- Build base tooling for integrating model training with other internal infrastructure, such as telemetry, experiment tracking, and checkpointing.
- Prior experience with biology is not required; we will teach you what you need to know. You'll get to go into the lab, and for our ideal candidate, this should be a perk, not a chore!
Preferred Skills and Qualifications
- Degree in Computer Science, Machine Learning, Computational Biology, or a related field.
- 5+ years of industry experience building and deploying ML systems in production environments.
- Experience leading technical projects and driving cross-functional execution.
- Strong programming skills in Python.
- Experience with infrastructure/ops tools such as Terraform and Ansible.
- Experience with deep learning frameworks such as PyTorch and JAX.
- Solid understanding of machine learning.
- Experience with the infrastructure needs of large-scale model training.
- Strong problem-solving skills and ability to work in a collaborative, multidisciplinary environment.
Compensation
We offer a competitive compensation and benefits package that includes base salary, bonus, and equity, and we seek to provide an open, flexible, and friendly work environment that empowers employees and gives them a platform to develop their long-term careers. A Summary of Benefits is available for all applicants. The base pay range for this position is expected to be $140,000 - $215,000 annually; however, the base pay offered may vary depending on the market, job-related knowledge, skills and capabilities, and experience.