Site Reliability Engineer (SRE) in Bothell, WA

$80K - $100K (Ladders Estimates)

Evernote

Bothell, WA 98011

Industry: Consumer Technology


Experience: Not specified

Posted 38 days ago

Our SRE team is responsible for the overall performance and reliability of Evernote's service and products, which serve over 200 million passionate and engaged users around the world and hold billions of notes and files. We are looking for a Site Reliability Engineer to help us in the ongoing mission of delivering an outstanding service to our users.

We participate in all aspects of running our platform at scale, focusing both on the service as it runs today and on ensuring we can deliver new and exciting features rapidly to users. We have a real passion for automation and we continually seek to improve. We work hand-in-hand with product teams to help them ship production-ready services and get new features into our users' hands. We define Service Level Objectives (SLOs) based on Key Performance Indicators (KPIs) for each of our services, and they let us move quickly while maintaining the quality of service our users expect.

What you'll do

  • Work closely with engineering teams to maintain and scale our existing production platform
  • Help us evolve what it means to be an SRE at Evernote
  • Evolve and implement production readiness standards for new services
  • Champion our SLOs and look to continuously improve them
  • Develop and maintain automation to reduce operational toil for the team
  • Participate in an on-call rotation for our production services

What we're looking for

  • You possess a contagious sense of ownership and the tenacity to always find a way
  • You focus on quality to build manageable, scalable, and maintainable systems
  • You know that perfection is the enemy of done and when to make trade-offs
  • You emphasize the importance of making decisions based on data
  • You enjoy solving tough technical problems
  • You exercise judgment in a way that reduces risk
  • You share enthusiastically to reduce disconnects and communication breakdowns
  • You always want to understand the why in order to better see patterns and improve quality

What you've done

  • You know Linux systems like the back of your hand
  • You've managed production environments at scale in a public cloud environment (AWS or GCP)
  • You have a strong familiarity with web application stacks, including MySQL, Java, and Apache
  • You've attained a deep understanding of networking protocols (e.g., TCP/IP, HTTP, DNS)
  • You've implemented and used third-party metrics and monitoring platforms such as DataDog and PagerDuty
  • You can wrangle problems quickly using the tools at your disposal
  • You've used configuration management and orchestration tools and you understand why they're important
  • You've built extensible and maintainable automation (Shell, Python, or Go preferred)
  • You've run containerized microservices using Kubernetes

Skills that are particularly meaningful to us

  • Google Cloud Platform: GLB, Pub/Sub, Spanner, GCS, App Engine, and GKE
  • Monitoring: PagerDuty, DataDog, Splunk
  • Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform
  • Infrastructure: HAProxy, Envoy, ElasticSearch, Consul
  • Languages/Libraries: Go, Python, Java, Thrift, gRPC

Valid Through: 2019-10-31