Founded in 2003, we pace the vanguard of directional retail with a mix of luxury, streetwear, and avant-garde labels. We produce cutting-edge original content and take pride in building our own technology solutions and systems from scratch. Our field of focus has grown beyond that of a typical e-commerce entity as we explore the nexus of content, commerce, and culture.
Our team is currently 1100 strong, serving 150 countries, generating an average of 76 million monthly page views, and achieving high double-digit annual growth since inception. Our work is varied and ambitious. We have a roadmap full of rewarding projects to keep you motivated and engaged, with a leadership ethos based on transparency, collaboration and enabling performance.
SSENSE is looking for a Senior Site Reliability Engineer (SRE) - Technical Lead to join our rapidly growing technology team. The Senior SRE - Technical Lead will join the SRE squad and will be responsible for keeping all user-facing services and other production systems running smoothly. They will be accountable for the reliability, scalability and resilience of complex infrastructure components in terms of quality assurance and production. The ideal candidate will actively contribute to knowledge dissemination within the organization and participate in the recruiting and onboarding of new employees.
Team leadership, knowledge sharing & coaching - 25%
- Enforce an effective and efficient scrum process where all team members work in the same direction
- Guide SRE engineers, when needed, to break down user stories into manageable tasks
- Propose and drive a development process that emphasizes quality through code reviews, automated testing, continuous integration pipelines and documentation
- Develop a deep understanding of the team’s roadmap and influence it with fact-based technical arguments
- Ensure proper documentation of team activities
- Ensure demos of developed features are well prepared and presented to stakeholders
- Review pull requests and documentation with the objective of guiding and upskilling junior developers on various technical/SRE topics
- Provide fact-based technical feedback on each squad member to managers as part of the evaluation cycle
- Actively contribute to SSENSE University, the internal peer learning platform, to promote continuous learning
- Participate in the onboarding of new developers
- Mentor junior SREs in all areas, and other SREs in your own areas of deep knowledge.
- Set an example for a team of SREs through positive, inclusive leadership and discussion of the work.
- Be trusted to de-escalate conflicts within the team.
Production Operations - 20%
- Handle emergency response, either while on call or by reacting to symptoms surfaced by monitoring, and escalate when needed.
- Be accountable for maintaining and improving documentation on site reliability measures, whether in application documentation or in runbooks, explaining the issues encountered and the solutions implemented.
- Actively identify and act on opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these issues.
- Improve the SSENSE codebase by resolving issues.
- Optimize cloud cost and reduce system resource usage by setting clear requirements through efficiency and capacity planning
Maintain Service Level Objectives (SLOs) / Service Level Indicators (SLIs) - 20%
- Plan, design and execute solutions within the infrastructure team to reach specific goals agreed within the team.
- Share the learnings publicly, either by creating issues that provide enough context for anyone to understand them or by writing blog posts.
- Propose ideas and solutions within the infrastructure team to reduce the workload through automation.
- Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
- Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent each incident from happening again.
Product delivery - 15%
- Anticipate the technical challenges the squad will face when delivering solutions and propose and implement technical solutions to those issues
- Write testable, efficient, and reusable code suitable for continuous integration and automated deployments, that respects best practices and SSENSE development standards
- Raise the bar for professional SRE engineers, lead by example, and help others learn the craft through rigorous code reviews and coaching
Ownership and accountability - 10%
- Be accountable for performance, reliability, scalability and resilience of complex and critical infrastructure components (web servers, data stores, hosted services, load balancers, etc.) through the proper use of replication, sharding, load balancing, monitoring, SLAs, alerting, and auto-scaling
- Be an active participant in the incident escalation chain and drive prompt resolution
- Upgrade and patch systems as required while ensuring availability of service
- Contribute to cross-squad initiatives, acting as a change agent amongst peers to foster adoption of new processes or technical solutions
Recruiting - 10%
- Contribute to the hiring process by reviewing applications or joining the interview team to qualify SRE candidates.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related technical field; a Master’s degree is an asset
- Minimum 8 years of experience working as an SRE.
- A minimum of 8 years of experience administering Linux-based environments (Red Hat, CentOS, Debian or Ubuntu)
- A minimum of 8 years of experience with service-oriented architectures and microservices.
- At least 2 years of experience working in an Agile development life cycle
- A minimum of 8 years of experience practicing continuous integration and continuous delivery.
- Minimum 5 years of experience with infrastructure automation frameworks in at least two of these technologies: SaltStack, Terraform, or Cloud Foundation engine.
- Expertise in infrastructure to support a microservice architecture.
- A minimum of 4 years of experience with infrastructure as code, specifically with Terraform.
- Strong knowledge of caching technologies (Fastly, Redis) with the ability to identify opportunities for improvement.
- Expertise with RDBMS (MySQL, PostgreSQL) and NoSQL (DynamoDB, DocumentDB, MongoDB) databases at scale
- Proficiency with cloud resources (AWS), with the ability to operate them for the components owned; certification preferred.
- Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
- Proficiency in Git
- At least 4 years of experience with Kubernetes; experience with Amazon EKS or ECS is a plus.
- Excellent written and verbal communication skills in both French and English
- Willingness and ability to learn fast.
- Strong work ethic and results orientation.
- High sense of accountability and ownership
- Solution-oriented mindset and can-do attitude to overcome challenges.
- Team player with a natural ability to build relationships.
- Ability to thrive in a fast-paced environment and master frequently changing web technologies and techniques.