Scotiabank has begun the journey to modernize both development practices and tools. One of the areas being explored is the public cloud and the various platform technologies that support both development and operations on the cloud.
The aim is to reduce costs by providing a streamlined process and framework that allows development teams to focus on building business logic.
We are looking to build our development team with influencers, makers, creators and industry leaders who will drive us forward and enhance the experience of our customers.
Are you passionate about reliability engineering and applying best practices to ensure the availability of applications in production?
Combining aspects of DevOps, system administration, and test engineering, the Site Reliability Engineer role gives you the opportunity to pair your technical ability and strategic thinking with detail-oriented execution in a fast-paced, dynamic environment.
You will join a new team whose purpose is to transform both development and operations. You will have full access to fix and strengthen code, design systems that validate and run code from other teams, and design tools that monitor the state of our systems.
The team puts people first and promotes support and training at both the individual and the team level.
THINGS YOU WILL DO:
• Manage a critical platform at the bank that is expected to run hundreds of applications
• Improve and maintain site availability, scalability, service and system performance
• Investigate system errors and problems, and analyze bottlenecks in systems at scale
• Set up monitoring systems and application metrics, and supervise them to predict and detect failures
• Design and develop software for test automation and code deployment
• Provide solutions for performance management, disaster recovery, monitoring and access management
• Provide training programs for both internal and external team members
• Take part in the on-call rotation
YOUR BACKGROUND AND SKILLS INCLUDE:
• Excellent knowledge and experience in Software Engineering, System Administration, and Operations
• Understanding of Unix/Linux systems from kernel to shell and beyond, including Unix internals and networking (DNS, TCP/IP, UDP, etc.)
• Experience designing and implementing tasks in Continuous Integration systems (Jenkins, Travis, CircleCI, etc.)
• Strong grasp of security, privacy and monitoring concepts
• Experience supporting containers and container orchestration platforms
• Experience operating applications on public and private cloud solutions
• Experience running large-scale systems and meeting SLA expectations
• Strong sense of project ownership and team responsibility
• 5+ years of relevant working experience and at least 3 years in a DevOps / Site Reliability Engineer role
• Excellent project management skills and the ability to work in a fast-paced and hectic work environment
• Solid verbal and written communication skills
IT WOULD BE GREAT IF YOU ALSO HAVE:
• Experience with Azure, AWS, or GCP
• Excellent knowledge of network engineering
• Well versed in database management
• Experience with cloud security: intrusion detection, penetration testing, and vulnerability scanning
Requisition ID: 22194