Member of Technical Staff, Model Evaluation

Mirendil

• $350K — $500K *

San Francisco, CA 94112

In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of experience in research engineering or a related field
Proficiency in building evaluation frameworks for AI models
Experience with automated data pipelines and regression testing
Strong background in observability tools and model behavior analysis
Familiarity with reinforcement learning (RL) principles and post-training evaluation methods

Responsibilities

Design and implement evaluation frameworks that capture model capabilities
Develop automated eval pipelines to detect regressions quickly
Create efficient workflows for human inspection of model behavior
Integrate observability tools into training runs for better insights
Collaborate with teams to align evaluation signals with training strategies

Benefits

Meaningful equity grant based on experience and background
Comprehensive health insurance options
Flexible vacation and paid time off policy
Opportunities for professional development and education
Collaborative work environment focused on cutting-edge AI research

Full Job Description

The Role

We are looking for a research engineer to build the evaluation infrastructure that tells us whether our models are getting better in ways we care about. You'll own the frameworks, pipelines, and tooling that measure model behavior across capabilities. Example areas you might work on include, but are not limited to:

Design and build evaluation frameworks that measure model capabilities along realistic axes, beyond standard benchmarks.
Build automated eval pipelines and regression-detection systems that run continuously and surface signal quickly.
Develop agent-assisted workflows for humans to efficiently inspect model behavior.
Instrument training runs with observability tooling so researchers can understand what's changing in model behavior, and why.
Partner with post-training and RL teams to close the loop between eval signal and training decisions.

If you're excited about the hard problem of knowing whether a frontier AI system is actually improving, we'd love to hear from you.

We offer a base salary of $350,000-$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.

* Ladders Estimates

Similar Jobs

Member of Technical Staff, Post-Training, RL
$350K — $500K *
Mirendil
San Francisco, CA 94112 (San Francisco County)
Today
Research Scientist 5 — Content Representation Models (CRM)
$466K — $500K+*
Netflix
Los Gatos, CA 95032 (Santa Clara County)
3 days ago
Multimodal LLM Researcher
$300K — $400K *
DEEPREC.AI
Palo Alto, CA 94303 (Santa Clara County)
3 weeks ago
Researcher, Context - Agent Post-Training
$250K — $380K *
OpenAI
San Francisco, CA 94112 (San Francisco County)
1 month ago
Research Engineer 5 - LLM-Driven Product Understanding
$466K — $500K+*
Netflix
Remote
1 month ago

More Jobs at Mirendil

Member of Technical Staff, Kernels
$350K — $500K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person
Member of Technical Staff, Generalist Systems Engineer
$350K — $500K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person
Member of Technical Staff, Agent Harness
$350K — $500K *
San Francisco, CA 94112 (San Francisco County)
Today
Enterprise Technology
In-Person
Member of Technical Staff, Enterprise Platform Engineer
$350K — $500K *
San Francisco, CA 94112 (San Francisco County)
Today
Enterprise Technology
In-Person
Member of Technical Staff, Security Engineer
$350K — $500K *
San Francisco, CA 94112 (San Francisco County)
Today
Information Technology
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
1 week ago
Senior Cloud Engineer
$100K — $130K *
Teledyne FLIR LLC
Huntsville, AL 35810 (Madison County)
Today
Senior Network Engineer
$100K — $130K *
Providence Equity Partners LLC
Boston, MA 02115 (Suffolk County)
Today
Systems Operations Manager – Data Platforms - Teradata & Hadoop
$100K — $130K *
Wells Fargo
Charlotte, NC 28269 (Mecklenburg County)
Reposted Today
Senior Software Engineer (001996)
$100K — $130K *
Wells Fargo
Chandler, AZ 85225 (Maricopa County)
Today
