We are looking for a talented Applications Site Reliability Engineer (ASRE) to work alongside our applications developers to help them architect, deploy and operate the next generation of Fastly’s production service platform. Site Reliability Engineering (SRE) drives the resiliency, speed, efficiency, scaling and security of our backend systems.
Each ASRE is embedded in an application team for a period of 2-12 months, where they consult with and assist that team on all axes of reliability and production readiness. This embedding model allows the ASRE to gain depth in a particular Fastly application and cultivate ongoing relationships with that application engineering team. In this role you'll have the opportunity to gain technical breadth while sharing your expertise.
You will be part of the main line of the product, not a sideline.
- Design, build and operate infrastructure to enable reliable and rapid deployment, effective monitoring, and resilient operation in a large-scale Linux environment.
- Diagnose and resolve performance and reliability issues across the entire stack: application, operating system, network, kernel, firmware, hardware, including cross-application dependencies.
- Educate developer teams.
- Write tools to automate maintenance and deployment of machines, services, and applications.
- Work closely with your development teams to ensure that services are designed with scale, operability, performance, and ease-of-use in mind.
- Build and maintain a robust Continuous Integration environment.
Keys to Success
We value a variety of voices, so this is not a laundry list. If you have experience and/or interest in SOME of the following, you should apply!
- 4+ years of experience running high availability systems and supporting distributed infrastructure.
- High regard for distributed teams. Our SRE team is one of the most distributed in the company, so working collaboratively with folks who aren't in the same room as you is a must.
- Strong understanding of Linux systems, high and low level.
- Working knowledge of shell scripting and one or more scripting languages (e.g. Python, Ruby, Perl).
- Experience with compiled languages (Go, C/C++, Java).
- Good understanding of configuration management best practices and standards.
- Experience with cloud providers such as Amazon Web Services and Google Cloud Platform.
- Adaptable to a wide variety of technologies and people.