Principal AI Researcher (Agentic Systems & AI Infrastructure)

Trase Systems

$250K - $300K
Enterprise Technology
11 - 15 years of experience
Job Overview by Ladders

Qualifications

  • 12-15+ years in machine learning, AI systems, or applied AI research at a high technical level.
  • Strong research and publication track record with contributions to frontier AI research.
  • Experience with large-scale experimentation and iterative model/system analysis.
  • Deep expertise in agentic systems, LLMs, multi-agent systems, and AI infrastructure reliability.
  • Hands-on experience with agent-based systems, prompt engineering, and tool integration.
  • Proven experience operating enterprise-grade AI systems at scale.
  • Strong programming skills in Python; experience in Java or related languages preferred.

Responsibilities

  • Define and evolve the long-term AI/ML research strategy for Trase OS.
  • Lead large-scale experimentation and prototyping to translate AI research into scalable systems.
  • Drive breakthroughs in agentic systems and application infrastructure.
  • Design operational frameworks for models in long-lived execution environments.
  • Establish evaluation methodologies for autonomous systems' reliability.
  • Drive architecture decisions related to orchestration and infrastructure governance.
  • Collaborate with engineering and product teams to operationalize research.

Benefits

  • Rapid advancement opportunities with strong performance as Trase grows.
  • Comprehensive healthcare fully covered for employees and their families.
  • 14 weeks of paid maternity and paternity leave at normal pay.
  • Unlimited PTO with management approval.
  • Professional development and continued learning opportunities.
  • Optional 401K, FSA, and equity incentives available.
  • Mental health benefits through Tara Mind.

Full Job Description

About the Role

As a Principal AI Researcher, you will define and drive the long-term research direction for the Trase operating system, the agentic execution platform powering autonomous systems in regulated environments. This role sits at the intersection of frontier AI research, agentic systems, orchestration infrastructure, and production deployment, with a focus on how models behave inside real-world execution environments rather than solely on offline benchmark performance.

You will lead research across areas such as agent workflows, tool use, long-lived execution, orchestration, and autonomous system reliability, while conducting large-scale experimentation and advancing novel approaches in applied AI systems.

This is a hands-on technical leadership role operating across research, systems, and product. You will drive technical breakthroughs in agentic infrastructure and applied AI systems, own the end-to-end research-to-production lifecycle, and work closely with engineering and product teams to translate frontier research into scalable, production-grade systems deployed across Trase.

Why This Role Exists

Trase OS coordinates long-lived agents, tool-augmented LLMs, multi-agent workflows, and execution in regulated enterprise environments. As these systems scale, the core challenge shifts from raw model capability to system correctness, orchestration reliability, infrastructure governance, and safe autonomous execution.

We are particularly interested in candidates with expertise or research interest in areas such as:
  • agent-to-agent learning,
  • orchestration and harness engineering,
  • infrastructure governance for AI operating systems,
  • long-lived execution and memory systems,
  • SLMs (small language models), model optimization, and fine-tuning recipes,
  • post-training adaptation techniques and model behavior shaping,
  • and evaluation frameworks for autonomous agents.

This role will help define how next-generation AI systems are researched, evaluated, and safely operated in production.
Responsibilities
  • Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction.
  • Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact.
  • Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure.
  • Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls.
  • Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis.
  • Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization.
  • Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows.
  • Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments.
  • Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership.
  • Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.
Requirements
  • 12-15+ years of experience in machine learning, AI systems, or applied AI research, including experience operating at a Principal, Distinguished, or equivalent technical level.
  • Strong research and publication track record, including authored papers, major technical contributions, or active participation in frontier AI research.
  • Experience publishing at top-tier conferences or contributing influential open-source, research, or AI infrastructure systems.
  • Experience conducting large-scale experimentation requiring significant compute infrastructure, evaluation workflows, and iterative model/system analysis.
  • Deep expertise in one or more areas including agentic systems, LLMs and generative AI, multi-agent systems, reasoning systems, reinforcement learning, orchestration infrastructure, AI systems reliability, NLP, multimodal systems, or deep learning.
  • Hands-on experience with agent-based systems, prompt engineering, RAG, RLHF, SLMs, fine-tuning/post-training techniques, tool integration, memory systems, and human-in-the-loop orchestration.
  • Proven experience building, deploying, and operating enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
  • Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, observability, evaluation frameworks, regression testing, and failure modes.
  • Strong systems thinking and demonstrated ability to partner cross-functionally with engineering and product organizations to move research into production systems.
  • Strong programming and prototyping skills in Python and modern ML infrastructure stacks, with experience in Java or related systems languages preferred.
  • Experience deploying AI/ML systems in regulated, constrained, or enterprise environments, and demonstrated ability to lead technical direction from research through production impact.
Preferred Qualifications
  • PhD in Computer Science, Machine Learning, AI, Systems, or a related field.
  • Experience building and operating AI/ML platforms supporting the full model lifecycle, including training, evaluation, deployment, and monitoring.
  • Experience optimizing ML inference or orchestration systems in real-time, distributed, or resource-constrained environments.


Trase Specific Benefits (for full-time roles only):
  • Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
  • 100% employer-paid comprehensive health care, including medical, dental, and vision, for you and your family.
  • Paid maternity and paternity leave for 14 weeks at employees' normal pay.
  • Unlimited PTO, with management approval.
  • Opportunities for professional development and continued learning.
  • Optional 401K, FSA, and equity incentives available.
  • Mental health benefits are available through Tara Mind.

Salary Range: $250,000-$300,000. This represents the typical salary range for this position, based on experience, skills, and other factors.
