Job Summary:
JOB DESCRIPTION - LEAD DEVELOPER, CSRE
Location: Toronto, ON (Remote)
Division: Ticketmaster NA
Line Manager: Principal Platform Engineer, CSRE
Contract Terms: Permanent, 37.5 hours per week
Position Details: This job advertisement is for an existing, immediate vacancy.
You will be part of the Platform Enablement team, which partners with engineering teams throughout Ticketmaster to improve how systems are built, observed, deployed, and operated. Our work often combines hands-on engineering with capability building so teams can sustain improvements independently. The team's remit is to raise the standard of resilience and operational practice across Ticketmaster and help teams get more value out of the developer platform they already depend on.
We support teams across the globe. Many of your teammates operate in North America and the United Kingdom, and we are adding people in other time zones.
THE JOB
As a Lead Developer in Platform Enablement, you will lead enablement work across multiple teams or a domain, meeting customers where they are and helping them get more out of the platform than they would on their own. You will turn rough problems into sequenced workstreams, sort out dependencies, and leave teams with mechanisms they can run themselves.
You will mentor other engineers on the team, codify reusable patterns, and feed what you learn back into the developer platform so the next team benefits without having to ask.
WHAT YOU WILL BE DOING
- Help engineering teams get more out of the developer platform by finding where they're stuck or underusing something, and working with them to fix it.
- Coach teams on resilience and operational practice, including SLOs, error budgets, alerting philosophy, and production readiness, in the context of their own services.
- Build self-service workflows, templates, components, and documentation so common problems stop being tickets and start being solved by the platform.
- Improve how teams use observability and logging products by raising signal quality, tightening alerting, and helping them build dashboards that answer the questions they actually ask during incidents.
- Help teams improve their CI/CD and safely shorten the path from commit to production.
- Support teams adopting LLM and AI-assisted workflows in their daily engineering work, sharing patterns that work and building organizational skill as the landscape changes.
- Pair with teams on resilience-focused design and code reviews, guiding them toward simpler, safer architectures.
- Support incident analyses with partner teams, focusing on reducing the impact of contributing factors and implementing durable fixes.
- Mentor engineers through pairing, reviews, and coaching.
- Bring actionable feedback to the CSRE Platform to improve our products and integrations.
- Improve Enablement's own procedures and operating practices based on lessons learned.
WHAT YOU NEED TO KNOW (TECHNICAL SKILLS)
- Deep practical understanding of SRE principles, including defining SLIs and SLOs and applying error budgets in practice.
- Proven ability to lead cross-team technical work and influence with situational authority.
- Strong experience designing and troubleshooting distributed systems with cross-service failure modes.
- Experience improving observability and alerting in production, including signal quality and useful dashboards.
- Comfortable working with systems running in on-premises data centers.
- Strong cloud native experience, including governance and cost trade-offs.
- Ability to design reusable resilience and operational automation and tooling that multiple teams adopt.
- Experience with production readiness and resilience practices, including DR validation and controlled testing.
- Strong software engineering fundamentals with the ability to deliver and review high-quality changes in enterprise codebases.
- Strong incident analysis skills focused on contributing factors and impact reduction.
- Experience working with LLM and AI-assisted tooling for engineering work, with judgment about where it helps and where it does not.
- Excellent written and verbal communication, including clear procedures, useful design docs, and exec-ready summaries.
YOU (BEHAVIOURAL SKILLS)
- Lead with service and humility, creating momentum without leaning on authority.
- Default to curiosity, asking how a team got where they are before prescribing a solution.
- Hold a point of view: ready to say "this is how we do it" and explain why, and just as ready to take a better idea back to platform leadership.
- Persuade with evidence and empathy, adapting your message for engineers, product, and senior stakeholders.
- Mentor deliberately, helping others grow as engineers and as people who support other engineers well.
- Give direct feedback with respect, and protect psychological safety while raising the bar.
- Stay patient in a complex organization, keeping work moving when dependencies are slow.
- Turn messy inputs into clear next steps without needing every detail resolved first.
- Prefer simple mechanisms that scale over bespoke fixes that only you can maintain.
- Work sustainably and design systems that do not require heroics to keep running.
- Take pride in the documentation and decisions that let teams sustain the work.
- Remain adaptable, switching among hands-on debugging, engineering, customer support, and planning as needed.
The expected compensation for this position in Ontario is $120,000 to $150,000 CAD annually.