Engineering Manager, Evals

Anysphere, Inc

• $130K — $180K *

New York, NY 10025

In-Person

Enterprise Technology

Less than 5 years of experience

1 month ago

Job Overview by Ladders

Qualifications

5+ years of experience leading engineering teams in a production environment.
Proven skills in people leadership and team coaching.
Ability to integrate metrics and processes across research, product, and infrastructure.
In-depth knowledge of model and agent behaviors with a keen eye on industry trends.
Experience building evaluation or measurement systems, especially in AI or related fields.

Responsibilities

Set and own the evaluation roadmap: what to measure and why it matters.
Lead a high-impact team in developing evaluation datasets and tools for engineers.
Enhance CursorBench to mirror real developer workflows and introduce new evaluations.
Define and implement online quality signals to guard against regressions.
Integrate evaluation metrics into decision-making for launches and model training.

Benefits

Choice of office location in San Francisco or New York.
Opportunity to lead impactful projects that affect every Cursor product.
Collaboration with a diverse team of engineers and researchers.
Access to emerging research and trends in AI and engineering evaluation.
Professional growth opportunities within a supportive team structure.

Full Job Description

Engineering · Full-time · San Francisco; New York

About the Role

As an Engineering Manager on the Evals team at Cursor, you'll lead the group responsible for creating high-signal evaluation datasets for coding agents and building the tools engineers use to write and run them. The team also owns the online evaluation systems that track agent quality in production, as well as the tight integration between online and offline evaluations.

The evaluation systems this team builds, including CursorBench, are critical to the development of our coding models and the quality of our Cursor agents. Your impact will compound across every Cursor product and every Cursor model by making quality measurable, comparable, and easy to improve.

What you'll do

Set the eval roadmap end to end: what we measure, why it matters, and how signals turn into shipping and training decisions.
Lead and grow a high-impact team of engineers and researchers building eval datasets and developer-friendly tools to write and run evals.
Guide the next generation of CursorBench so it continues to reflect real developer workflows at Cursor, and expand it with new evals that measure other properties developers value.
Define crisp online quality signals and turn regressions into robust guardrails.
Integrate evals into the decision-making cadence for launches, deploys, and model training loops.

You may be a fit if

You've led engineering teams shipping production systems and have strong people leadership and coaching skills.
You can align research, product, data, and infrastructure on what "good" means, and turn that into durable metrics, processes, and release/training rituals.
You have good taste and strong opinions on model and agent behaviors, and you stay up-to-date on emerging research and industry trends.
You have strong data acumen and can collaborate effectively with data scientists and researchers.
You've built and operated evaluation or measurement systems (e.g., AI evals, experimentation platforms, ranking/relevance, search quality, or reliability instrumentation).

* Ladders Estimates

Similar Jobs

MANAGER
$110K — $130K *
Naval Sea Systems Command
Newport, RI 02840 (Newport County)
Reposted Today
SUPERVISORY PROGRAM MANAGER
$110K — $130K *
Strategic Systems Programs
Washington, DC 20011 (District Of Columbia County)
Today
SUPERVISORY PROGRAM MANAGER
$110K — $130K *
Strategic Systems Programs
Washington Navy Yard, DC 20374 (District Of Columbia County)
Today
SUPERVISORY GENERAL ENGINEER
$100K — $130K *
Prescient Edge
Patuxent River, MD 20670 (Saint Marys County)
Today
Hardware Configuration Manager
$70K — $148K *
CACI International
Fort Belvoir, VA 22060 (Fairfax County)
Reposted Today
Configuration Manager
$70K — $148K *
CACI International
Dulles, VA 20101 (Loudoun County)
Reposted Today

More Jobs at Anysphere, Inc

Forward Deployed Engineer
$120K — $160K *
Remote
Yesterday
Enterprise Technology
Remote in New York, NY
Marketing Manager, Startup Events & Community
$90K — $130K *
San Francisco, CA 94112 (San Francisco County)
Yesterday
Business Services
In-Person
Regional Security Manager
$90K — $130K *
New York, NY 10025 (New York County)
3 days ago
Information Technology
In-Person
RVP, Customer Success, Strategic & Geo Enterprise (AMER)
$150K — $200K *
New York, NY 10025 (New York County)
3 days ago
Enterprise Technology
In-Person
Regional Director, Geo Enterprise
$150K — $200K *
San Francisco, CA 94112 (San Francisco County)
4 days ago
Enterprise Technology
In-Person

More Enterprise Technology Jobs

Manager, Enterprise Solutions Engineering
$206K — $269K *
Atlassian
Remote
Reposted Today
Account Associate, Mid-Market (Renewals)
$98K — $128K *
Atlassian
Remote
Today
Manager, Account Executives, Mid-Market West
$177K — $278K *
Atlassian
Remote
Reposted Today
Account Manager
$140K — $165K *
Applied Systems
Chicago, IL 60629 (Cook County)
Today
Director, Renewal Operations
$120K — $150K *
ConvergeOne
Atlanta, GA 30349 (Fulton County)
Today
