Manager, Platform & Site Reliability

Canadian Internet Registration Authority

• $100K — $130K *

Ottawa, ON K1G 3J6 (In-Person)

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

7+ years in Site Reliability Engineering, platform engineering, DevOps or cloud operations, with public cloud expertise, preferably AWS.
3+ years of leadership experience in managing technical teams within SRE or platform engineering.
Proven success in mentoring and building high-performing engineering teams, fostering continuous learning and accountability.
Skilled in defining technical strategies relating to reliability, security, and operational excellence.
Strong understanding of public cloud operations including architecture and resilience strategies.
Experience with DevOps practices such as infrastructure as code, GitOps, and CI/CD principles.
Proficient in containerization technologies, incident management, and observability frameworks.

Responsibilities

Lead and develop a team of Site Reliability Engineers and Platform Specialists to enhance reliability and operational excellence.
Define and execute platform strategies aligned with organizational goals and customer needs.
Establish and mature SRE practices, including SLOs and operational acceptance criteria.
Drive continuous improvement of scalable cloud-native platforms using AWS or similar.
Champion automation practices like infrastructure as code to minimize operational toil.
Enhance monitoring and observability to ensure platform reliability and customer satisfaction.
Manage high-severity incidents, ensuring effective response and follow-up actions to improve platform resilience.

Benefits

Blended remote and in-office work arrangements to foster team connection.
Regular events and social activities to encourage community engagement.
Focus on a people-centered recruitment process that values human judgment over AI in hiring.

Full Job Description

By working with the CIRA registry team, you'll play a part in advancing the CIRA Registry Platform, which supports a wide range of domains globally. Help us drive innovation and maintain the high standards of stability and security that our platform is known for. Join us in advancing digital identity and technology in Canada and beyond.

Who You Are:

You are a people-first technology leader who thrives at the intersection of reliability, platform engineering, and operational excellence. You enjoy building high-performing teams, creating clarity in complex environments, and empowering engineers to do their best work. You balance strategic thinking with technical depth, helping teams deliver resilient, scalable services while continuously improving processes, tooling, and ways of working. Most importantly, you're motivated by solving meaningful challenges and contributing to infrastructure that Canadians and organizations around the world rely on every day.

What You'll Do:

Lead, coach, and develop a high-performing team of SRE and Platform Specialists responsible for the reliability, scalability, security, and operational excellence of CIRA's registry platforms and supporting technology services.
Define and execute the platform and site reliability strategy, aligning priorities and investments with organizational objectives and customer needs.
Define and mature SRE practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, production readiness standards, and operational acceptance criteria for mission-critical registry services.
Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms using public cloud technologies such as AWS.
Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities to reduce manual effort, operational toil, and engineering bottlenecks.
Establish and continuously improve observability, monitoring, alerting, and dashboarding practices to provide clear visibility into platform health, service reliability, and customer-impacting issues.
Lead incident management for high-severity events, providing incident command, stakeholder communication, root cause analysis, and driving follow-up actions that strengthen long-term platform resilience.
Collaborate with engineering, security, support, compliance, and business stakeholders to establish priorities, balance risk, and deliver platform improvements that support registry operations and organizational goals.
Drive performance engineering, capacity planning, disaster recovery testing, and resilience validation to ensure the ongoing reliability and availability of critical registry platforms and related services.
Foster a culture of ownership, accountability, continuous learning, operational excellence, and psychological safety that empowers the team to innovate and perform at their best.

What You Bring:

7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations, including hands-on experience with public cloud platforms such as AWS.
3+ years of experience leading, coaching, and developing technical teams in SRE, platform engineering, DevOps, infrastructure, or cloud operations.
Demonstrated success building and developing high-performing engineering teams through mentoring, coaching, performance management, and fostering a culture of continuous learning and accountability.
Experience defining technical strategy, influencing cross-functional stakeholders, and balancing reliability, security, operational excellence, and business priorities.
Strong hands-on background with public cloud platforms, preferably AWS, including cloud-native architecture, networking, security, resilience, scalability, and cost-aware operations.
Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices to manage cloud infrastructure, platform services, and deployment workflows.
Strong understanding of CI/CD principles, release automation, and modern software delivery practices.
Experience with containerization and orchestration technologies such as Docker and Kubernetes.
Experience with observability platforms, monitoring frameworks, incident management practices, and operational analytics tools.
Demonstrated experience defining and implementing SLOs, SLIs, error budgets, production readiness standards, and incident response processes.
Strong understanding of disaster recovery, business continuity, backup and recovery strategies, and resilience testing.
Experience supporting highly available, mission-critical, or regulated technology platforms where reliability, security, and operational discipline are essential.
Exceptional communication, collaboration, and stakeholder management skills, with the ability to translate complex technical concepts into clear business outcomes for both technical and non-technical audiences.

CIRA embraces a blend of remote and in-office work to keep our team connected and engaged. Our Ottawa headquarters is a hub for regular events and social activities that bring our team together, encouraging a strong sense of community within our organization. No matter where you work from, you'll always feel part of our vibrant team and our shared mission.

At CIRA, people remain at the centre of our recruitment process. While CIRA uses recruitment platforms that include artificial intelligence-enabled features, which may be used to support administrative processes or skills-based assessments, these features are intended to assist our recruitment activities and do not replace human judgment. All applicant screenings, interviews, evaluations and selection decisions are conducted by our staff. Artificial intelligence is not used to make autonomous or final hiring decisions.

This posting is for an existing vacancy.

* Ladders Estimates
