What you would do:
- Design, build, and operate the shared platform foundations engineers ship on every day: GCP infrastructure, Kubernetes, networking, routing, CI/CD, and observability.
- Diagnose and troubleshoot complex distributed systems running at high request volume.
- Instrument our stack for observability and analyze its behavior in production.
- Contribute to in-flight work like modernizing our edge, caching, and gateway layers onto Fastly and tightening observability across the platform.
- Raise the reliability bar through better dashboards, alert severity, paging standards, on-call readiness, and incident response.
- Make deployment boring in the best way: build golden paths, production readiness checks, safe rollouts, and useful automation so engineers have fewer places to look before they ship.
- Mentor engineers and raise the technical bar through code review, design review, and pairing.
- Participate in our on-call rotation and help our developer on-call rollout land well.
About you:
- Based in the United States, with reasonable overlap with European engineering hours.
- Experience with SRE/DevOps tools, processes, and culture.
- 5+ years of experience as part of an SRE on-call rotation.
- Analytical approach to designing, diagnosing, and optimizing infrastructure.
- Experience managing scalable, highly available, cloud-based applications, ideally with high request volume and customer-facing uptime expectations.
- Experience with Kubernetes for orchestrating, scaling, and managing containerized applications in cloud-based environments.
- Experience building CI/CD pipelines.
- Experience with an observability stack (e.g., Prometheus).
- Comfortable working across CDNs, edge, gateways, and caching layers, or eager to go deep there.
- You improve on-call and reliability by building systems, standards, and feedback loops that make production healthier over time.
- You are comfortable dealing with incidents and outages and have built a practical, thoughtful communication style for handling high-pressure situations.
- An open but considered approach to new technologies.
There are many roads to becoming an SRE. Our team is already a mix of self-taught and formally educated people. Don't self-select out!
What we can offer:
- A highly skilled, inspiring, and supportive team
- Real infrastructure scale and meaningful, hands-on work changing how it runs
- Positive, flexible, and trust-based work environment that encourages long-term professional and personal growth
- A global, culturally diverse group of colleagues and customers
- Comprehensive health plans and perks
- A healthy work-life balance that accommodates individual and family needs
- Competitive stock options program and location-based salary