Job Description
WHAT IS THE OPPORTUNITY?
We're building the engine that judges how good our agents actually are. Claims
have to be data-driven: you can't build on what you can't see, so how can you
honestly say one version is 10% better than the last? Evaluation runs both before
we ship and after; this role owns the runtime side - judging agents live in
production, from the traces they generate serving real traffic.
The hard part is the data. Agent behaviour generates verbose, high-cardinality
traces, and we need a system that can analyze them in real time and deliver
actionable insights with low latency. Join us to build it: the engineering looks
a lot like site reliability engineering meeting user analytics, combining
high-throughput, low-latency data processing with the evaluation of user
behaviour and outcomes.
WHAT WILL YOU DO?
- Build the ingestion path that takes agent traces at production volume and keeps up with it.
- Score agent behaviour live - judge quality straight from the trace as it happens, not in a batch job hours later.
- Enforce quality and safety guardrails in the request path, stopping an unsafe or low-quality response before it reaches the user, within a fixed latency budget and at predictable cost.
- Correlate spans across services so one request reads as one trace.
- Own the experience of turning production traces back into datasets and test cases the next version is measured against.
- Set the technical direction for this burgeoning field, and push it into the open through open source contributions and conference talks.
WHAT DO YOU NEED TO SUCCEED?
Must have
- 8+ years in software or platform engineering, with 5+ in SRE, real-time data infrastructure, observability, or large-scale stream processing.
- A track record of running high-volume telemetry in production, with hands-on work on ingestion, storage, and query at scale.
- Distributed tracing and OpenTelemetry: semantic conventions, collector configuration, span correlation across services.
- Familiarity with routing traffic on live signal, whether that's weighted load balancing, canary rollouts, or multi-armed-bandit routing.
- Turning telemetry into decisions in real time - scoring, anomaly detection, or rule/threshold evaluation on streaming data.
- Experience with an LLM observability platform (`Langfuse`, `MLflow`, or equivalent) and the trace-to-evaluation feedback loop.
Nice-to-have
- A feel for the latency and backpressure trade-offs of doing work in the live request path - collectors, proxies, sidecars.
- Experience in a regulated industry (financial services, healthcare) and its constraints on AI infrastructure.
- AI security controls in the request path: prompt-injection mitigation, output filtering, PII detection.
- AI governance, model audit logging, and runtime drift detection.
- Open-source contributions or published work in observability, tracing, or LLM evaluation.
WHAT'S IN IT FOR YOU?
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Opportunities to do challenging work
Job Skills
Big Data Management, Data Mining, Data Science, Deep Learning, Machine Learning (ML), Predictive Analytics, Programming Languages
Additional Job Details
Address: RBC WATERPARK PLACE, 88 QUEENS QUAY W, TORONTO
City: Toronto
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: TECHNOLOGY AND OPERATIONS
Job Type: Regular
Pay Type: Salaried
Posted Date: 2026-06-22
Application Deadline: 2026-07-15
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above