Member of Technical Staff, Model Evaluation

Mirendil

$350K to $500K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in research engineering or a related field
  • Proficiency in building evaluation frameworks for AI models
  • Experience with automated data pipelines and regression testing
  • Strong background in observability tools and model behavior analysis
  • Familiarity with reinforcement learning (RL) principles and post-training evaluation methods

Responsibilities

  • Design and implement evaluation frameworks that capture model capabilities
  • Develop automated eval pipelines to detect regressions quickly
  • Create efficient workflows for human inspection of model behavior
  • Integrate observability tools into training runs for better insights
  • Collaborate with teams to align evaluation signals with training strategies

Benefits

  • Meaningful equity grant based on experience and background
  • Comprehensive health insurance options
  • Flexible vacation and paid time off policy
  • Opportunities for professional development and education
  • Collaborative work environment focused on cutting-edge AI research

Full Job Description

The Role

We are looking for a research engineer to build the evaluation infrastructure that tells us whether our models are improving in the ways we care about. You'll own the frameworks, pipelines, and tooling that measure model behavior across capabilities. Examples of what you might do (not an exhaustive list):

  • Design and build evaluation frameworks that measure model capabilities along realistic axes, beyond standard benchmarks.
  • Build automated eval pipelines and regression-detection systems that run continuously and surface signal quickly.
  • Develop agent-assisted workflows for humans to efficiently inspect model behavior.
  • Instrument training runs with observability tooling so researchers can understand what's changing in model behavior, and why.
  • Partner with post-training and RL teams to close the loop between eval signal and training decisions.


If you're excited about the hard problem of knowing whether a frontier AI system is actually improving, we'd love to hear from you.

We offer a base salary of $350,000 to $500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.
