Role Overview
We're looking for a technical, systems-minded operator to build and scale the evaluation engine behind Harvey's platform. As we expand globally, ensuring that our models behave reliably, accurately, and correctly across jurisdictions is mission-critical, and evaluation complexity is increasing 10x.
As a member of our Product Operations team, you'll work closely with Applied Legal Researchers, Product, Engineering, AI Research, and human data providers to operationalize evaluation methodologies and embed them into our product development lifecycle. You'll create the workflows, systems, and tooling that make evaluation a first-class product capability at Harvey.
This is a high-ownership role for someone who thrives in ambiguity, loves building structure, and wants to help scale the evaluation infrastructure of a global AI company.
What You'll Do
- Build and scale the systems that power model and product evaluations across Harvey
- Run intake, triage, and prioritization for the evaluation request queue, routing capacity to the highest-value coverage gaps
- Embed evaluation workflows and readiness checkpoints into the product development lifecycle
- Create the single source of truth for evaluation status, results, history, and launch readiness
- Turn expert-designed evaluation methodologies into scalable, repeatable operational processes
- Manage human data providers and stand up our internal contract-attorney pipeline, ensuring evaluation quality meets legal standards
- Work with Engineering and Research to improve evaluation tooling, automation, and dashboards
- Drive evaluation readiness for major product and model launches across geographies and jurisdictions
- Document and operationalize evaluation governance as complexity increases
- Help define how Harvey ensures model accuracy, reliability, and trust at global scale
What You Have
- 4–7+ years in technical program management, product operations, research operations, or evaluation/benchmarking roles
- Experience working with ML/AI evaluations, benchmarking frameworks, or scientific workflows
- Comfort with statistical methods and with SQL, Python, or similar tools for interpreting evaluation data (whether hands-on or with AI tool support)
- Strong business acumen and the ability to apply an ROI-focused mindset to scaling decisions
- Ability to work deeply with legal experts and operationalize complex evaluation methodologies
- Strong cross-functional coordination skills across Product, Engineering, Research, and data providers/vendors
- High attention to detail and a bias toward clarity, rigor, and reproducibility
- Ability to navigate an evolving landscape and bring order to complex systems
- Strong communication skills and comfort translating technical nuance for diverse stakeholders
- Desire to do whatever it takes to make evaluation systems successful, from writing documentation to diagnosing pipeline issues
Depending on your location, an Applicant Privacy Notice may apply to you. You can find all of our Applicant Privacy Notices [here].