Site Reliability Engineer (SRE)

Toronto, ON

Industry: Information Services

5 - 7 years

Posted 381 days ago

  by    Joshua David

This job is no longer available.

Skills for SRE Specializations (SRE-SWE and SRE-SE)

SREs come from diverse backgrounds such as software development and systems administration, so their experience is often weighted towards either software engineering (SRE-SWE) or systems engineering (SRE-SE). We strongly value the breadth and depth of skills and the diversity of thinking this brings to our team.

While it is likely that no SRE will possess all of the skills on this page, we seek candidates who have the required core skills, will master some, and are willing to learn the vast majority.

SRE-SWE

· Object-oriented design, design patterns and programming that follows clean coding practices.

· Agile/lean development practices such as Scrum, XP and agile design.

· Data structures and algorithms.

· Software testing frameworks that support TDD and BDD.

· Automating software build and testing using tools such as Jenkins.

· Database programming, schema design and query optimization (relational and NoSQL).

SRE-SE

· Writing code to drive system engineering activity such as system testing, load generation, instrumentation, log analysis, performance monitoring, error simulation and deep discovery of system properties.

· Conducting investigations across any system component and related systems to discover and rectify performance bottlenecks and sources of unreliability.

· Applying scientific principles of experimentation and measurement to system components to identify configuration and architecture changes that improve reliability, performance and operability.

· Network flow analysis and troubleshooting.

· Selecting, designing and tuning storage systems for reliability and performance.

· Configuring, analyzing and tuning database systems (relational and NoSQL) to improve reliability and performance.

· Configuring and tuning web servers, application containers, message queueing systems and other middleware to improve reliability and performance.