The AI Software Development team is charged with transforming how our engineering organization designs, builds, and ships software. The team drives the adoption of AI-assisted tooling across the full software development lifecycle, deploys autonomous agents that expand engineering capacity, and ensures that every new generation of AI capability is evaluated, integrated, and operationalized with rigor.
As the AI Agents Lead, you are responsible for identifying, evaluating, building, and operating autonomous agents that run alongside our engineering organization - executing discrete software tasks end to end. This role spans both the discovery of existing agent solutions available in the market and the engineering of custom agents where gaps exist. You are a practitioner who moves from prototype to production, and a researcher who tracks the frontier to ensure we are adopting the best available capabilities before we build our own.
KEY RESPONSIBILITIES
- Continuously survey the AI agents market - evaluating commercially available agents and frameworks for suitability across software engineering tasks including bug remediation, dependency upgrades, test generation, documentation synthesis, and code refactoring.
- Make build-vs-buy decisions for agent capabilities: source and integrate existing solutions where they meet quality and security standards; engineer custom agents where the market falls short.
- Design, build, and operate bespoke autonomous agents for high-value tasks not adequately served by available tooling.
- Own the agent capability roadmap - mapping current model capability to production-ready task horizons and sequencing deployment based on complexity and risk.
- Architect agent scaffolding: orchestration layers, tool-use frameworks, context management, retrieval strategies, and human-in-the-loop review gates.
- Define agent evaluation frameworks - measuring task completion rates, rework frequency, review rejection rates, and economic contribution against human baselines.
- Partner with DevOps to establish safe execution environments and rollback capabilities for agent workloads.
- Partner with InfoSec to enforce access control, secret handling, and output review policies for agent-generated artifacts.
- Track the frontier - monitoring model releases, autonomous capability research, and emerging agent frameworks to continuously advance the deployment roadmap.
- Support agent platform capability planning and cost modeling as autonomous agent deployment scales across the engineering organization.
REQUIRED QUALIFICATIONS
- 7+ years of software engineering experience, with at least 2 years building or operating AI agents or LLM-powered automation systems in production.
- Deep expertise in agent frameworks - LangGraph, AutoGen, CrewAI, or equivalents - and LLM orchestration patterns.
- Demonstrated experience sourcing and evaluating commercially available AI agents and automation tools against structured criteria.
- Direct experience with prompt engineering and LLM-based developer tools at production engineering scale.
- Familiarity with benchmarks of autonomous AI task performance - including SWE-bench Verified, METR autonomous task horizon research, RE-Bench, and similar evaluation frameworks.
- Strong software engineering fundamentals - capable of shipping production-quality agent infrastructure, not just prototypes.
- Rigorous approach to evaluation: experience designing evals, benchmarks, and A/B frameworks for agent performance measurement.
- Ability to reason clearly about autonomy risk, failure modes, and appropriate human oversight checkpoints.
NICE TO HAVE
- Prior vendor assessment experience at an enterprise organization: RFPs, capability scorecards, or proof-of-concept evaluation programs for AI tooling.
- Experience operating agents in regulated or security-sensitive enterprise environments.
- Contributions to open-source agent frameworks or LLM tooling ecosystems.
- Prior experience translating agent ROI into executive-level investment narratives.
- Understanding of model capability progression curves and how to sequence agent scope as model generations advance.