Position
Manifold's AI research runs on a shared, scaled compute platform built on AWS EKS, Ray, and Kubernetes. Today it supports 25+ users across the company with secure, centralized access to data and democratized GPU access - nearly all of our AI research runs here, along with a large share of our bioinformatics work, including hit calling and data ETL pipelines.
As we scale mBER and our broader model development toward proteome-scale design, we're looking for an engineer to own and evolve this platform. You'll take full ownership of our scaled computational infrastructure - security, uptime, and cost - while developing a deep enough understanding of the models we deploy to drive runtime optimization and quality-of-life features that make our scientists faster. You'll also build stable infrastructure for deploying custom agentic workflows internally, working hands-on with agentic AI tooling for fast iteration.
This is an on-site role and can be based in either Boston, Massachusetts or San Francisco, California. Please only apply if you reside in one of these cities or are open to relocating.
Responsibilities
- Own and develop Manifold's EKS-based compute platform to meet the shifting needs of our computational sub-teams - mBER development and production runs, LLM fine-tuning, novel binder design research, and more
- Monitor AWS compute costs and implement optimizations that reduce spend while supporting continued growth
- Run and optimize production models (mBER, folding models, and other generative models) for fast iteration and a consistent library design cycle
- Improve security, uptime, and cross-region access across the compute stack, hardening infrastructure against external threats
- Establish CI/CD practices (likely GitOps) and clear, comprehensive cost tracking
- Build and maintain platforms for agentic automation and custom internal agentic workflows
- Help define the data handoff from AI generation to Snowflake + Benchling and connect it to experimental readouts
Required Qualifications
- Strong, ML-specific coding skills in PyTorch and/or JAX, with the ability to quickly prototype, test, and debug
- Strong familiarity with AWS, especially EC2, EKS, networking, and storage solutions
- Experience optimizing GPU-heavy computational workloads
- Strong security practices and experience hardening web applications and infrastructure against external attackers
- Deep integration of agentic AI development tools into your day-to-day engineering workflow
- Experience building and working with relational databases
- Ability to move fast - standing up prototypes and iterating in production with a diverse user base
- Strong data science and analysis skills
- Interest in bio-specific ML, with a background in physical or natural sciences
Preferred Qualifications
- Track record of advanced automation using agentic AI tooling
- Experience with transformer architectures or graph neural networks for molecular data
- Published research in ML, computational biology, or protein design
- Knowledge of protein engineering, directed evolution, or structural biology wet lab techniques
- Previous biotech/pharma industry experience
This Role Might Be Perfect For You If
- You want to own the compute backbone that powers an entire AI-driven drug discovery platform
- You like working close to the models - not just keeping infrastructure alive, but making it faster and cheaper so scientists can move
- You're energized by agentic AI tooling and want to build the platforms that let a team deploy it at scale
- You have rich ML infrastructure / MLOps experience and are excited to bring it into biotech
If you're excited to build and scale the infrastructure that powers protein foundation model development, please reach out to [redacted].