Site Reliability Software Developer

Replicon

Calgary, AB

Position Overview 

Site Reliability Developers/Engineers (SREs) are responsible for creating and improving the tools and processes that power the building, validation, deployment, and monitoring of Replicon's globally distributed, multi-tenant SaaS systems. SREs spread and evangelize DevOps culture throughout Replicon. Working at Replicon is an opportunity to take on the unique challenges of a successful, large-scale SaaS business application that predates "web applications" as we know them today. We face unusual challenges: extreme data consistency and reliability (nobody puts up with being paid incorrectly), large-scale and complex legacy systems, and enterprise-scale customizability, all combined with a friendly user experience for non-technical users. It's a "never-stop-learning" environment, where you'll be working with a strong technical team.

Responsibilities include:

  • Remove impediments in technical development processes to support high-frequency, low-lead-time software deployments
  • Isolate the risk of deployment or software failures to minimize the scope and impact of production incidents
  • Collaborate effectively with multiple product teams, each with its own internal processes and technologies
  • Identify where development and operational practices fall outside industry norms; rather than accepting "we've always done it this way," work to do it better
  • Implement automation effectively to augment or replace manual processes, preferring to adopt rather than build software where appropriate
  • Participate in the on-call rotation to efficiently triage critical production incidents, reduce incident severity, and document incidents in detail for product-team follow-up

Qualifications include:

  • Deep understanding of web technologies (HTTP, DNS, TLS/SSL, web services, load balancing)
  • Experience with AWS (e.g., S3, EC2, RDS); experience with Azure and/or Google Cloud is also acceptable
  • Experience with automation tools (e.g., Docker, Terraform, Jenkins, CodePipeline)
  • Familiarity with agile processes (e.g., Scrum) and TDD
  • Experience building or supporting CI/CD processes
  • Experience with post-mortems and failure analysis: identifying the root cause and contributing factors of incidents
  • Experience with a globally distributed, multi-tenant SaaS system is preferred
  • Experience with a microservices architecture, or with a transition toward one