Staff Site Reliability Engineer

ScalePad

• $100K — $130K *

Vancouver, BC V5K 5J9 (In-Person)

Information Technology

8 - 10 years of experience

2 weeks ago


Job Overview by Ladders

Qualifications

8+ years in software engineering or related fields, with at least 5 years in SRE, DevOps, or Platform Engineering
Strong background in cloud infrastructure and distributed systems
Proficient in infrastructure as code, CI/CD, and operational best practices
Experience with SLO/SLI frameworks and reliability engineering
Knowledge of incident management and troubleshooting in production environments
Ability to lead cross-team technical initiatives and influence stakeholders
Passionate about mentoring engineers and improving company culture
Experience integrating AI-assisted tools into operational workflows

Responsibilities

Own production infrastructure across AWS and Azure, focusing on networking and cost management
Build and operate Terraform modules for clean, reviewable infrastructure as code
Run and improve Kubernetes in production environments
Operate and enhance CI/CD pipelines for the engineering organization
Operationalize SLO/SLI frameworks alongside the SRE team
Own incident response practices, including on-call tooling and incident reviews
Mentor engineers and lead post-mortem initiatives for reliability practices

Benefits

Employee Stock Ownership Plan (ESOP) and RRSP matching
Parental leave programs for family support
Mentorship opportunities with industry leaders and founders
Annual professional development budget for skill enhancement
Work with top-of-the-line hardware and equipment
Flexible remote or hybrid work structure
Monthly stipend for hybrid work environment support
100% employer-paid health benefits

Full Job Description

About the role

We're looking for a Staff Site Reliability Engineer (SRE) to be the senior technical anchor across our multi-cloud platform and developer experience. This is a hands-on senior individual contributor role for an engineer who wants to own real systems, unblock teams day to day, and raise the bar on how engineering ships and operates at ScalePad.

You'll work directly with engineering leadership and alongside SREs across product domains. Reliability, infrastructure as code, internal tooling, and developer productivity all sit inside your scope. You'll spend your time building, operating, and improving the systems the rest of engineering depends on.

What you'll do

Get ready to go beyond order-taking. Your strategic responsibilities include:

Platform & Infrastructure

Own production infrastructure across AWS and Azure, including networking, IAM, and cost
Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable
Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements
Operate and improve CI/CD pipelines that the entire engineering org depends on

Reliability & Operational Excellence

Operationalize SLO/SLI frameworks and observability practices alongside the SRE team
Own incident response practice, on-call tooling, and incident review follow-through
Reduce operational toil through automation across secret rotation, access management, and environment provisioning
Execute on capacity planning, disaster recovery, and resilience work across critical systems

Developer Experience & Technical Influence

Build and maintain internal developer tooling that removes friction across engineering
Lead rollouts of AI-native tooling for code review, testing, and engineering productivity, e.g., CodeRabbit, Copilot-class assistants, and internal AI workflows
Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems
Partner with engineering and product leadership to identify and remove the biggest DX bottlenecks, and align infrastructure and reliability investments with business goals
Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization
Lead post-mortems and continuous improvement initiatives to strengthen reliability practices

Innovation & Continuous Improvement

Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency
Drive standardization and modernization efforts across infrastructure and operational practices
Lead proof-of-concept and experimentation initiatives to validate new reliability solutions

What we're looking for

We care about what you can do more than where you've done it. However, experience in the following areas will help you hit the ground running in this role:

Must-haves

8+ years of experience in software engineering, infrastructure, or related technical disciplines, with at least 5 years focused on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles
Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices
Experience designing and operating highly available, scalable production systems
Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices
Experience implementing SLO/SLI frameworks and reliability engineering methodologies
Incident management, troubleshooting, and on-call experience in complex production environments
Proven ability to lead large-scale technical initiatives across multiple teams
Track record of cross-team technical influence without formal authority, with excellent communication and collaboration skills across technical and non-technical stakeholders
Passion for mentoring engineers and improving engineering culture
Demonstrated ability to thoughtfully integrate AI-assisted tooling into engineering and operational workflows to improve efficiency, reliability, and developer experience

Nice-to-haves

Experience rolling out AI tooling in an engineering organization
Experience leading tooling and platform migrations such as Jira, Confluence, or observability stacks
Experience with chaos engineering practices and reliability testing
Experience optimizing large-scale cloud infrastructure costs

Perks

ScalePad offers our employees a blend of purpose, growth, and genuinely great perks.

Everyone's an owner. Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
Support for growing families. Parental leave programs are in place to support you and your family when it matters most.
Structured mentorship with builders. Join opt-in mentorship programs and learn directly from founders and senior leaders who've scaled multiple SaaS ventures and spent decades in the MSP industry.
Invest in your growth every year. Access an annual professional development budget to level up your skills, your career, and your impact.
Set yourself up with great tools. Work with brand new, top-of-the-line hardware and equipment so you can do your best work, whether you're at home or in one of our hubs.
Modern ways of working. Roles at ScalePad are structured as remote or hybrid, with hub locations in Vancouver, Toronto, Montreal, and Phoenix. Specific work models are outlined in each posting.
Support for hybrid life. Receive a monthly stipend to help you create an effective hybrid or remote work environment.
Well-being and time to recharge. Take care of yourself with 100% employer-paid benefits.

Before You Apply

This is a full-time role for those who are eligible to work in Canada. We thank all applicants for taking the time to apply, but only candidates who make it to the next stage will be contacted.

Note on AI Use: ScalePad uses AI technology to support certain administrative aspects of our hiring process, such as transcription, note-taking, and interview documentation. These tools are strictly used to assist our team and have no influence on candidate evaluation or hiring decisions.

No recruiters, please.

* Ladders Estimates
