Staff Applied Scientist - AI Evaluation & Trust

Sayari

• $195K — $205K *

US-Anywhere, Remote in United States

Information Technology

8 - 10 years of experience

1 month ago


Job Overview by Ladders

Qualifications

10+ years of Machine Learning experience, focusing on Deep Neural Networks and model evaluation.
1-2 years of experience in post-training activities.
1+ year experience creating benchmarks for evaluating LLMs.
Deep expertise in LLM-as-judge architectures, multi-turn evaluation, and Reinforcement Learning.
Mastery of statistics and experimental design, including significance testing and inter-rater reliability.
Experience with Mixture-of-Experts (MoE) systems and expert specialization.
Builder mindset with a proven ability to take projects from data collection to production deployment.
Understanding of Graph RAG and challenges in evaluating agentic workflows.

Responsibilities

Lead the development of specialized judge models for evaluation and failure mode detection.
Design and execute scoring pipelines and calibrations for agentic systems.
Establish evaluation frameworks to assess performance against human expert standards.
Own the lifecycle of evaluation data and deploy services into production.
Research advanced techniques in Mixture-of-Experts routing and ensemble calibration.
Collaborate with cross-functional teams to translate statistical uncertainty into product signals.
Act as a technical leader and advocate for rigorous empirical validation in the AI pod.

Benefits

100% fully paid medical, vision, and dental for employees and their dependents.
Generous time off, including 18 PTO days, 10 sick days, and observation of all US federal holidays.
Strong commitment to diversity, equity, and inclusion.
Eligibility for additional benefits, including 401k matching and paid life insurance.
Collaborative and positive workplace culture with high-caliber colleagues.
Limitless growth and learning opportunities.

Full Job Description

POSITION DESCRIPTION

Sayari builds AI systems for high-consequence analytical work where being "wrong" carries real-world weight. We are looking for a Staff or Principal Applied Scientist to join our AI Innovation Group as the trusted expert on AI Evaluation and Trust. You will own the "Judgment Layer" of our system: building the specialized judge models, statistical benchmarks, and multi-turn frameworks that ensure our agents act with the high bar of trustworthiness required by our national security and enterprise customers.

JOB RESPONSIBILITIES

Lead the development of specialized "judge models," moving from general-purpose frontier models to architectures purpose-built for evaluation and failure mode detection.
Design and execute rigorous scoring pipelines and empirical threshold calibrations for agentic systems, including multi-turn conversation and Graph RAG reasoning.
Establish domain-specific evaluation frameworks that measure whether a system can perform the work of human experts rather than just passing general capability benchmarks.
Own the full lifecycle of evaluation data, from designing annotation infrastructure and protocols to deploying evaluation services into production.
Research and implement advanced techniques in Mixture-of-Experts (MoE) routing, expert specialization evaluation, and ensemble calibration.
Collaborate cross-functionally with Product, Data Engineering, and the SVP of AI to translate complex statistical uncertainty into clear, actionable product signals.
Act as a technical leader and "Scientific Conscience" within the AI pod, ensuring every AI-driven risk signal is backed by an empirical derivation story.

SKILLS & EXPERIENCE

Required:

10+ years of Machine Learning experience, with a focus on deep neural networks and on evaluating model performance and trust.
1-2+ years of experience focused on post-training activities.
1+ year of experience creating benchmarks to evaluate LLMs.
Technical Mastery: Deep expertise in LLM-as-judge architectures, multi-turn evaluation, and Reinforcement Learning (RL/RLHF/RLAIF).
Statistical Rigor: Mastery of statistics and experimental design, including significance testing, distribution analysis, and inter-rater reliability.
Architectural Depth: Experience with Mixture-of-Experts (MoE) systems, routing behavior, and expert specialization.
Builder Mindset: Proven ability to own the path from data collection to production deployment; we are a small team and every role is "hands-on."
Domain Fluency: Understanding of Graph RAG and the unique challenges of evaluating non-deterministic, agentic workflows.

Preferred:

Judgment Task Models: Experience building, fine-tuning (LoRA, etc.), or pre-training models specifically for judgment, preference modeling, or classification tasks.
Domain Context: Background in cognitive science, intelligence community tradecraft, or research literature on expert judgment under uncertainty.
Infrastructure at Scale: Experience building or managing large-scale annotation infrastructure and quality assurance protocols.
Academic/Research Track Record: A record of published research or recognized work in preference modeling or AI alignment.

The target base salary for this position is $195,000-$205,000 plus company bonus and equity. Final offer amounts are determined by multiple factors including location, local market variances, candidate experience and expertise, internal peer equity, and may vary from the amounts listed above.

Benefits:

100% fully paid medical, vision, and dental for employees and their dependents
Generous time off: we observe all US federal holidays and close our office for a winter break (12/24-12/31), in addition to granting 18 PTO days and 10 sick days
Outstanding compensation package; competitive commissions for revenue roles and bonuses for non-revenue positions
A strong commitment to diversity, equity, and inclusion
Eligibility to participate in additional benefits such as 401k match up to 5%, 100% paid life insurance (up to $100,000 coverage), and parental leave
A collaborative and positive culture - your team will be as smart and driven as you
Limitless growth and learning opportunities

Pay Range

$195,000-$205,000 USD

* Ladders Estimates
