Senior AI Agent & Evaluations Engineer

Vacatia

$120K - $150K *

Portland, OR 97229, In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in developing AI agents or LLM-driven systems in production environments.
In-depth knowledge of prompt engineering, including system prompts and tool integration.
Experience in creating evaluation frameworks with golden datasets and regression testing methodologies.
Familiarity with AI development tools like Claude Code, Codex, or similar agents.
Experience with evaluation platforms such as LangSmith or Arize.
Strong analytical skills to differentiate between prompt issues and model failures.
Excellent communication skills for collaboration across technical and business teams.

Responsibilities

Design and optimize AI agent prompts and decision-making behaviors.
Develop and maintain comprehensive evaluation frameworks and testing pipelines.
Create safety mechanisms to ensure responsible agent operations.
Monitor production performance and implement data-driven improvements.
Translate business policies into measurable agent behaviors in collaboration with stakeholders.
Work with engineering teams to establish context and integration requirements for agents.
Develop reusable frameworks for deploying AI agents across various workflows.

Benefits

Hybrid work model with three days in-office attendance.
Remote work considered for exceptional candidates.
Opportunity to work at the forefront of applied AI innovation.
Collaborative work environment with talented professionals.
Chance to impact business outcomes with AI-driven solutions.

Full Job Description

Location: Portland, OR (Hybrid - Three Days In Office)
Remote considered for exceptional candidates.

We're looking for a hands-on Senior AI Agent & Evals Engineer to own the intelligence layer behind Vacatia's AI agents. You'll be responsible for designing agent behavior, building evaluation frameworks, creating guardrails, and continuously improving agent performance as our AI footprint expands across the organization.

If you're passionate about prompt engineering, agent reliability, and creating measurable AI systems that solve meaningful business problems, we'd love to meet you.

Your Impact
Design, refine, and optimize prompts, tool definitions, routing logic, and decision-making behavior across Vacatia's AI agent ecosystem

Build and maintain evaluation frameworks, golden datasets, grading systems, and regression testing pipelines that measure agent quality and reliability

Develop guardrails and safe-failure mechanisms that ensure agents operate responsibly in customer-facing and financially sensitive workflows

Monitor production performance, investigate failures, identify edge cases, and continuously improve agent outcomes through data-driven iteration

Partner with business stakeholders to translate policies, operational requirements, and domain expertise into measurable agent behavior

Collaborate with engineering teams to define context requirements, tool contracts, and integration specifications that support agent success

Create scalable frameworks and reusable patterns for deploying AI agents across new business workflows and use cases

Establish best practices for prompt engineering, evaluation methodologies, observability, and agent operations

What You Bring
Proven experience shipping and owning production AI agents or LLM-powered systems beyond proof-of-concept environments

Deep expertise in prompt engineering, including system prompts, tool usage, context management, output constraints, and agent behavior design

Hands-on experience building evaluation frameworks using golden datasets, scoring rubrics, LLM-as-judge methodologies, and regression testing

Strong familiarity with modern AI development tools such as Claude Code, Codex, or similar coding agents

Experience with agent observability and evaluation platforms such as LangSmith, Langfuse, Arize, Galileo, or comparable solutions

Ability to distinguish prompt issues from data, tooling, model, or evaluation failures and systematically improve agent performance

Strong written and verbal communication skills with the ability to work effectively across engineering and business teams

Demonstrated ownership mindset with a passion for building reliable, measurable, and continuously improving AI systems

Strongly Preferred
Experience building agents that process communication-based workflows including emails, support tickets, chat interactions, or transcripts

Experience with multiple agent frameworks and a practical understanding of their tradeoffs

Familiarity with the evolving LLM landscape and model selection strategies

Experience designing and implementing end-to-end evaluation pipelines and agent operations workflows

Production experience with online evaluation systems and automated scoring of live traffic

Nice to Have
Experience integrating AI systems with Salesforce, AWS Connect, or customer engagement platforms

Background in customer-facing industries where accuracy, compliance, and communication quality are critical

Contributions to open-source projects, technical writing, or public thought leadership in AI, prompt engineering, or agent development

Join Us
Join us at the forefront of applied AI innovation. If you're excited about building intelligent systems that solve complex business problems, improving agent behavior through rigorous evaluation, and helping shape the future of vacation ownership, we'd love to hear from you.

At Vacatia, you'll have the opportunity to build AI solutions that matter, work alongside talented teammates, and create technology that drives real business impact.

* Ladders Estimates




