Member of Technical Staff, Evals

Magic AI Inc.

$200K — $500K+*

San Francisco, CA 94112 (In-Person)

Consumer Technology

Less than 5 years of experience

1 month ago


Job Overview by Ladders

Qualifications

Strong software engineering fundamentals in a relevant field.
Experience in building production systems or developer infrastructure.
Exceptional attention to detail and a commitment to measurement accuracy.
Familiarity with machine learning systems and evaluation frameworks.
Ability to critically evaluate benchmarks and experimental methodologies.
Experience designing and operating large-scale systems.
Strong debugging skills and the ability to investigate complex issues.

Responsibilities

Build and maintain the internal evaluation platform for Magic's teams.
Design and validate evaluation tasks for various system stages.
Develop infrastructure for conducting large-scale evaluations.
Implement systems to measure and improve dataset quality.
Enhance correctness and reliability of evaluation processes.
Audit and refine public benchmarks and evaluation methodologies.
Collaborate with teams to define metrics reflecting model quality.

Benefits

401(k) plan with 6% salary matching.
Comprehensive health, dental, and vision insurance for employees and dependents.
Unlimited paid time off for work-life balance.
Visa sponsorship and relocation assistance offered.
Opportunity to work in a small, dynamic team on advanced AI systems.

Full Job Description

About the role

Evals builds the internal platform that teams across Magic use to evaluate the performance of first-party and third-party models. The team supports pre-training, post-training, data, inference, and product, and sits on the critical path of many of the company's most important decisions.

As a Member of Technical Staff on Evals, you will build both the platform and the evaluations themselves. You'll develop infrastructure for large-scale evaluations, data ablations, and dataset quality analysis, while designing and validating the methodologies used to measure model performance.

Sweating the details matters on this team. Many benchmarks, papers, and open-source evaluation frameworks contain subtle bugs or flawed assumptions that lead to misleading conclusions. We care deeply about correctness, reproducibility, and measurement quality.

Evals are essential to the success of the company. By building trustworthy evaluation systems, you will help Magic make better research decisions, build better datasets, and ship better products.

What you'll work on

Build and maintain the internal evals platform used across Magic
Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems
Develop infrastructure for running large-scale evaluations
Build systems to measure dataset quality and identify opportunities to improve training data
Improve evaluation correctness, reproducibility, and reliability
Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations
Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality
Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements

What we're looking for

Strong software engineering fundamentals
Experience building production systems, internal platforms, or developer infrastructure
Exceptional attention to detail and a high bar for correctness
Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling
Ability to reason critically about benchmarks, metrics, and experimental methodology
Strong intuition for measurement quality and experimental design
Experience designing, implementing, or operating systems that run at scale
Strong debugging and investigative skills
Comfortable navigating ambiguity and determining whether a measurement is actually capturing the behavior it claims to measure
Skepticism toward results that cannot be reproduced, validated, or explained
Track record of owning technical projects end-to-end
Excitement about helping researchers and engineers make better decisions through trustworthy measurements

Compensation, benefits, and perks (US)

Annual salary between $200K and $550K, depending on experience
Equity is a significant part of total compensation, in addition to salary
401(k) plan with 6% salary matching
Generous health, dental, and vision insurance for you and your dependents
Unlimited paid time off
Visa sponsorship and relocation support for candidates moving to San Francisco
A small, fast-moving, highly collaborative team working on frontier AI systems

* Ladders Estimates




