Manager, Site Reliability Engineer

Snap • Seattle, WA

Industry: Technology • 5 - 7 years

Posted 40 days ago

Snap Inc. is a camera company. We believe that reinventing the camera represents our greatest opportunity to improve the way people live and communicate. Our products empower people to express themselves, live in the moment, learn about the world, and have fun together.

We’re looking for a Manager to lead our Site Reliability Engineers at Snap Inc! This is a ground-floor opportunity to grow a new Site Reliability Engineering team to define the future of reliability at Snap. As a member of the Infrastructure Engineering Team, you will help design and operate the next generation of Snap’s multi-cloud architecture. Working from our Seattle, WA office, you’ll collaborate across teams to establish engineering strategies to improve Snapchat's reliability and scalability. You’ll lead a team to build operational tools and deliver automation that will be used by SRE as well as the rest of Snap engineering. In addition to improving Snap services, this is also an opportunity to contribute to the overall culture and strategies around service operations and reliability here at Snap (incident response, post-mortems, trend analysis, availability standards). This is a high-visibility role that will greatly impact the quality of our service used by millions around the world.

What you’ll do:

  • Design, operate, and improve our most critical services
  • Participate in operations along with engineering team on-calls, helping to debug, improve, and optimize critical backend services
  • Work across teams to understand system requirements, evaluate trade-offs, and deliver the solutions needed to build reliable services
  • Identify scaling bottlenecks and help Snap services scale to meet user demand
  • Perform design, code, and process reviews to improve individual systems as well as Engineering-wide practices
  • Help make our team better by contributing to design and launch reviews for new services
  • Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and efficiency

Minimum qualifications:

  • Bachelor’s degree in a technical field such as computer science or equivalent experience
  • 5+ years of software development experience
  • Experience with or proficiency in at least one of Java, Go, or C++
  • Experience with backend services, distributed systems, or Linux internals

Preferred qualifications:

  • Interest in operational excellence, availability, and automating away manual tasks
  • Passion for problem-solving, strong technical communication skills, and a desire to collaborate with others
  • Experience operating large-scale distributed systems, microservice architectures, or multi-tenant systems
  • Hands-on experience using AWS or Google Cloud services
  • Experience with NoSQL storage solutions and Memcache/Redis
  • Experience with Kubernetes, Envoy, and related software is a plus