We are looking for a Site Reliability Engineer to work with us to create exceptional new products for airlines and travelers. The team of site reliability engineers is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning. The team spends at least 50% of its time on preventive work and at most 50% on supporting Plusgrade's SaaS products. This role will be based in Montreal and will be part of our product engineering organization.
We firmly believe that there are many exciting opportunities in technology and in the travel industry. If you are enthusiastic about meeting these challenges with us, we would like to meet you. As an SRE, your typical week will include the following tasks:
- Analyze current applications to identify weaknesses in quality.
- Partner with engineers to lead the necessary architectural changes.
- Take part in the support rotation that provides 24/7 coverage within the engineering organization.
- Help influence the budget for engineering technical debt in order to comply with SRE best practices.
- Collaborate with the platform engineering team to make improvements to our internal platform.
- Help improve the entire life cycle of operational readiness.
All our SREs work within a multidisciplinary team of software engineers and platform engineers who are committed to improving Plusgrade products. The ideal candidate meets the following requirements:
- Have good knowledge of distributed systems.
- Have in-depth knowledge of monitoring and alerting best practices.
- Have experience working with a public cloud, namely AWS.
- Have experience working with infrastructure tools such as Terraform and Docker.
- Have experience working with a monitoring tool that includes APM, namely Datadog.
- Have at least 3 years of experience building web applications with Java.
- Have in-depth knowledge of the JVM and JVM diagnostics.
- Have over 2 years of experience on projects including deep technical dives and production troubleshooting in the areas of distributed systems, code, networking, storage and operating systems.
- Have over 2 years of experience in software support, reliability or operations engineering in a highly customer-focused environment.
- Hold a bachelor's degree in computer science or engineering, or have equivalent experience.