About the Role
A well-funded, early-stage B2B SaaS company building AI agent infrastructure for mechanical engineering workflows is hiring a Staff Engineer - Agentic AI to own the core agent intelligence layer. This is a high-impact, senior technical leadership role reporting directly to the CTO. You'll sit at the intersection of applied agentic AI, user research, and product delivery - determining real-world value for Fortune 100 enterprise customers in the CAD, CAE, and PLM space.
You'll lead a small team of AI engineers, a user researcher, and domain expert contractors, acting as a player-coach who writes production code and sets technical direction.
What You'll Do
- Lead development of the core agent intelligence layer that executes multi-step workflows across complex desktop engineering software.
- Own the full product loop: define agent capabilities from user stories, build implementations, and benchmark against real workflows.
- Drive agent task success rate - define the evaluation framework, establish baselines, and systematically improve completion metrics.
- Set and enforce per-task token budgets; track cost per completed workflow to ensure commercial viability.
- Build rigorous, reproducible evaluation infrastructure grounded in validated user stories (SWE-bench-level rigor applied to engineering workflows).
- Lead user story mapping and validation through interviews and close collaboration with domain experts.
- Translate validated user stories into testable evals and close the loop between research and benchmarking.
- Own agent architecture decisions: tool-calling strategies, state management, error recovery, model routing, and context management.
- Set technical direction, review architecture decisions, unblock the team, and raise the engineering bar across a team of 3-6 engineers.
- Collaborate cross-functionally with integrations, product, and customers during POCs to align agent behavior with real-world usage.
What We're Looking For
Must-haves:
- 7+ years in software engineering, including at least 2 years building agentic LLM-based systems that act in the real world (multi-step workflows, tool-calling, failure handling, cost constraints).
- Deep experience with LLM application architecture: model selection, context/window management, retrieval strategies, tool-calling frameworks, and orchestration patterns.
- Strong evaluation and benchmarking instincts for agentic systems - task completion, cost efficiency, and failure mode analysis; familiarity with SWE-bench, GAIA, or τ-bench.
- Proven track record of shipping AI systems with measurable outcomes, not just demos.
- Proficiency in Python and the LLM tooling ecosystem (function calling, tool use APIs, tracing/observability tools such as Logfire or LangSmith, evaluation frameworks).
- Experience leading a small technical team (3-6 engineers): setting direction, performing code reviews, and driving architecture decisions.
Nice-to-haves:
- Experience with desktop automation, COM, or programmatic control of applications (beyond web APIs).
- Background in mechanical engineering, CAD/CAE, PLM, or adjacent industries.
- Familiarity with enterprise deployment constraints on locked-down corporate workstations.
- Published work or open-source contributions in agentic AI systems.
- Experience building or contributing to public benchmarks for AI agents.
Note: Visa sponsorship is not available for this role.
Compensation & Benefits
- Salary: $160,000 - $250,000 USD annually
- Early-stage equity
- Direct line to executive leadership and outsized scope of impact
Location
This is an on-site role based in San Francisco, CA. Candidates must be willing to work from the office.