Sr. Site Reliability Engineer


Toronto, ON

Industry: Financial Services


Experience: Not specified

Posted 328 days ago

Department Overview

Building a World-Class, Diverse and Inclusive Technology Team at TD

We can’t afford to be boring. Neither can you. The scale and scope of what TD does may surprise you. The rapid pace of change makes it a business imperative for us to be smart and open-minded in the way we think about technology. TD’s technology and business teams become more intertwined as new opportunities present themselves. This new era in banking does not equal boring. Not at TD, anyway.

TD Engineering covers a broad range of exercises and initiatives, including requirements gathering, design specification, industry analysis, vendor engagement and analysis, software development, project management, financial management, test plans and execution, and operational standards implementation. Our highly coveted Engineers are spread across many areas of focus: innovation, design, execution, maintenance, strategy and portfolio management. We call all of these things incredible learning opportunities, and no two assignments are ever the same.

There’s room to grow in all of it.

Job Description

About This Role

The Site Reliability Engineer reports to the Sr. Manager Cloud Level 3 Operations. A candidate for this role will be a competent developer who is keen on automation and will go the extra mile to get everything fully automated. You will have great empathy for the customer and always be customer-focused in your choices. You will believe that deploying more frequently and getting better at introducing change, rather than slowing change down, brings about a more stable, reliable and secure environment for our customers. You will practise ego-less development and critique your own work, because constantly honing your skills and deepening your abilities is needed to keep pace. You believe in applying DevOps principles and practices throughout the lifecycle of the application and all the services it uses. You embody the agile software development principle “Simplicity--the art of maximizing the amount of work not done--is essential” and you always make your solutions complete, simple, repeatable, stable and secure.

The team you will join is responsible for handling engineering escalations from TD Cloud Operations. It is also responsible for new Cloud Service Layer code deployments and for engineering the cloud monitoring and backup systems themselves, such as Splunk. The team works with other Cloud Engineering teams to integrate these services into TD’s Cloud Services Layer (CSL). Responsibilities include software deployment and upgrades, system setup, monitoring, incident resolution, problem management, bug tracking, and routine updates of services.



  • Provide L3 engineering operations support for TD’s private and public cloud services.
  • Define and report Key Performance Indicators to monitor service and platform health.
  • Define and report customer-facing service metrics, and create both operations and customer-facing dashboards.
  • Implement and oversee monitoring and backup guidelines for TD private cloud services.
  • Assist operations in conducting system outage analysis to prevent the recurrence of incidents.
  • Perform root cause analysis of existing environmental problems and implement long-term fixes.
  • Continuously improve all services and processes.
  • Other tasks as required.


Strong hands-on experience in:

  • Incident management and troubleshooting
  • Public Cloud expertise (Azure, AWS)
  • Private Cloud expertise (OpenStack, VMware)
  • Cloud backup strategies and systems (Object storage, CommVault, etc.)
  • Cloud Monitoring tools and systems (Sensu, Splunk, etc.)
  • Understanding of Infrastructure Automation (SaltStack, Chef, Puppet, Ansible or equivalent)
  • Significant experience with build and release engineering tooling (Jenkins, Nexus, Git, Maven)
  • Python, including building applications with it
  • Advanced understanding of complex Linux environments and applications (Red Hat, Ubuntu, etc.)
  • Strong understanding of Windows administration and tooling in an enterprise environment

Education in, or equivalent experience with, agile software development methodologies is an asset, as is formal software development education or a system administration background.

Additional Information

Join in on what others in TD Technology Solutions are doing:

  • Inspire a positive work environment and help champion quality, innovation, teamwork and service to the business.
  • Learn voraciously, stretch your thinking, share your knowledge and educate others.
  • Communicate and collaborate with both technical and non-technical professionals.
  • Cultivate winning relationships by building trust with business and technology partners.
  • Share our commitment to productivity, effectiveness and operational efficiency.
  • Embrace change and witness amazing things happen – from the inside.