Principal AI Researcher (Agentic Systems & AI Infrastructure)

Trase Systems

• $250K — $300K *

McLean, VA 22101 (In-Person)

Enterprise Technology

11 - 15 years of experience

4 weeks ago

Job Overview by Ladders

Qualifications

12-15+ years in machine learning, AI systems, or applied AI research at a high technical level.
Strong research and publication track record with contributions to frontier AI research.
Experience with large-scale experimentation and iterative model/system analysis.
Deep expertise in agentic systems, LLMs, multi-agent systems, and AI infrastructure reliability.
Hands-on experience with agent-based systems, prompt engineering, and tool integration.
Proven experience operating enterprise-grade AI systems at scale.
Strong programming skills in Python; experience in Java or related languages preferred.

Responsibilities

Define and evolve the long-term AI/ML research strategy for Trase OS.
Lead large-scale experimentation and prototyping to translate AI research into scalable systems.
Drive breakthroughs in agentic systems and application infrastructure.
Design operational frameworks for models in long-lived execution environments.
Establish evaluation methodologies for autonomous systems' reliability.
Drive architecture decisions related to orchestration and infrastructure governance.
Collaborate with engineering and product teams to operationalize research.

Benefits

Rapid advancement opportunities with strong performance as Trase grows.
Comprehensive healthcare fully covered for employees and their families.
14 weeks of paid maternity and paternity leave at normal pay.
Unlimited PTO with management approval.
Professional development and continued learning opportunities.
Optional 401K, FSA, and equity incentives available.
Mental health benefits through Tara Mind.

Full Job Description

About the Role

As a Principal AI Researcher, you will define and drive the long-term research direction for the Trase operating system, the agentic execution platform powering autonomous systems in regulated environments. This role sits at the intersection of frontier AI research, agentic systems, orchestration infrastructure, and production deployment, with a focus on how models behave inside real-world execution environments rather than solely on offline benchmark performance.

You will lead research across areas such as agent workflows, tool use, long-lived execution, orchestration, and autonomous system reliability, while conducting large-scale experimentation and advancing novel approaches in applied AI systems.

This is a hands-on technical leadership role operating across research, systems, and product. You will drive technical breakthroughs in agentic infrastructure and applied AI systems, own the end-to-end research-to-production lifecycle, and work closely with engineering and product teams to translate frontier research into scalable, production-grade systems deployed across Trase.

Why This Role Exists

Trase OS coordinates long-lived agents, tool-augmented LLMs, multi-agent workflows, and execution in regulated enterprise environments. As these systems scale, the core challenge shifts from raw model capability to system correctness, orchestration reliability, infrastructure governance, and safe autonomous execution.

We are particularly interested in candidates with expertise or research interest in areas such as:

agent-to-agent learning,
orchestration and harness engineering,
infrastructure governance for AI operating systems,
long-lived execution and memory systems,
SLMs (small language models), model optimization, and fine-tuning recipes,
post-training adaptation techniques and model behavior shaping,
and evaluation frameworks for autonomous agents.

This role will help define how next-generation AI systems are researched, evaluated, and safely operated in production.

Responsibilities

Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction.
Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact.
Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure.
Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls.
Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis.
Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization.
Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows.
Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments.
Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership.
Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.

Requirements

12-15+ years of experience in machine learning, AI systems, or applied AI research, including experience operating at a Principal, Distinguished, or equivalent technical level.
Strong research and publication track record, including authored papers, major technical contributions, or active participation in frontier AI research.
Experience publishing at top-tier conferences or contributing influential open-source, research, or AI infrastructure systems.
Experience conducting large-scale experimentation requiring significant compute infrastructure, evaluation workflows, and iterative model/system analysis.
Deep expertise in one or more areas including agentic systems, LLMs and generative AI, multi-agent systems, reasoning systems, reinforcement learning, orchestration infrastructure, AI systems reliability, NLP, multimodal systems, or deep learning.
Hands-on experience with agent-based systems, prompt engineering, RAG, RLHF, SLMs, fine-tuning/post-training techniques, tool integration, memory systems, and human-in-the-loop orchestration.
Proven experience building, deploying, and operating enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, observability, evaluation frameworks, regression testing, and failure modes.
Strong systems thinking and demonstrated ability to partner cross-functionally with engineering and product organizations to move research into production systems.
Strong programming and prototyping skills in Python and modern ML infrastructure stacks, with experience in Java or related systems languages preferred.
Experience deploying AI/ML systems in regulated, constrained, or enterprise environments, and demonstrated ability to lead technical direction from research through production impact.

Preferred Qualifications

PhD in Computer Science, Machine Learning, AI, Systems, or a related field.
Experience building and operating AI/ML platforms supporting the full model lifecycle, including training, evaluation, deployment, and monitoring.
Experience optimizing ML inference or orchestration systems in real-time, distributed, or resource-constrained environments.

Trase-Specific Benefits:

Career track opportunity with potential for rapid advancement with strong performance as the firm grows.
100% employer-paid, comprehensive health care including medical, dental, and vision for you and your family.
Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
Unlimited PTO, with management approval.
Opportunities for professional development and continued learning.
Optional 401K, FSA, and equity incentives available.
Mental health benefits are available through Tara Mind.

Salary Range: $250,000-$300,000. This represents the typical salary range for this position based on experience, skills, and other factors.

The benefits listed above apply to full-time roles only.

* Ladders Estimates