Platform Engineer, Model Shaping

Together AI

$200K - $290K *

San Francisco, CA 94112 (In-Person)

Information Technology

Less than 5 years of experience

Reposted Today

Job Overview by Ladders

Qualifications

3+ years of experience in infrastructure or backend services
Familiar with Linux environments and container/orchestration tools (e.g., Docker, Kubernetes)
Proficient in Python or Go programming languages
Experience with automation tools (Terraform, Ansible) and CI/CD pipelines (GitHub Actions, ArgoCD)
Ability to analyze complex software systems and document findings
Experience with cloud environments (AWS/GCP/Azure), particularly hybrid solutions
Strong communication skills for cross-team collaboration and documentation

Responsibilities

Design and develop systems for model customization and internal improvements
Contribute to platform reliability and participate in on-call rotations
Create and enhance internal deployment and observability tools
Build a job orchestration platform across multiple data centers
Collaborate with internal teams to co-design integrated services

Benefits

Competitive compensation and startup equity
Health insurance and other health-related benefits
Flexibility in remote work arrangements

Full Job Description

About the Role

The Model Shaping team at Together AI works on products and research for tailoring open foundation models to downstream applications. We build services that allow machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. We also develop new methods for more efficient model training and evaluation, drawing inspiration from a broad spectrum of ideas across machine learning, natural language processing, and ML systems.

As a Platform Engineer in Model Shaping, you will work at the intersection of backend engineering and infrastructure, building the foundational layers of Together's platform for model customization and evaluation. You will design, develop, and operate both the backend services and the underlying systems that enable us to sustainably and reliably scale production workflows launched by our users, as well as internal research experiments.

You will operate in a cross-functional environment, collaborating with other engineers and researchers on the team to improve the infrastructure based on the needs of their projects. You will also interact with other engineering teams at Together (such as Commerce, Data Engineering, and Cloud Infrastructure) to integrate the services developed by Model Shaping with systems developed by those teams.

Responsibilities

Design and build Together's systems and infrastructure for model customization, including user-facing features and internal improvements
Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
Create and improve internal tooling for deployment, continuous integration, and observability
Build a job orchestration platform spanning multiple datacenters, supporting a highly heterogeneous hardware landscape
Partner with teams developing internal services, co-designing these services and incorporating them in systems built within Together

Requirements

3+ years of experience in building infrastructure or backend components of production services
Extensive experience designing, operating, and troubleshooting production Linux environments and Kubernetes-based platforms
Strong software engineering background in Python or Go
Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Cloud environment (e.g., AWS/GCP/Azure) administration experience, preferably with a hybrid bare-metal/cloud environment
Strong communication skills, with a willingness to document systems and processes and to collaborate with peers of varying technical expertise
Comfortable operating across the stack, from cluster operations and infrastructure automation to backend service development

Experience in any of the following will make you stand out:

Developing large-scale production systems with high reliability requirements
Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
Managing GPU workloads on HPC clusters, ideally with hands-on experience in operating NVIDIA's networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
Deployment of services for AI training or inference
Networking fundamentals, including TCP/IP, DNS, routing, load balancing, TLS, and network debugging tools
Maintaining or contributing to open-source projects

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $200,000 - $290,000. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

* Ladders Estimates
