Senior Software Engineer - Site Reliability (Bellevue, WA)
The New York Times describes Thunder as "an ad engine to put Mad Men out of business." We're changing how digital ads are created, personalized, and optimized with our intelligent Creative Management Platform.
Why is Site Reliability Engineering important to Thunder?
Site reliability engineers (SREs) are responsible for the overall reliability of Thunder infrastructure and products. SREs design and implement the tools that automate building reliable systems.
What does a Site Reliability Engineer do?
- Advocate for reliable design patterns (circuit breakers, graceful degradation, etc.)
- Automate as much as humanly possible
- Figure out what is going to break and when
- Work with software engineering teams on design and implementation choices for large-scale distributed systems
- Always configure as code
- Bring ideas to life (i.e., production)
What is something you might work on?
- Standardizing core infrastructure components so they have best practices (monitoring, alerting, etc.) built in for free
- Building out monitoring and alerting systems
- Assisting in architectural updates to scale services
- Investigating traffic and load spikes, then making the changes needed to handle or eliminate them
The following experience is relevant to us:
- Strong focus on correctness, simplicity and maintainability
- A knack for writing clean, readable, maintainable code
- Knowledge of AWS tools and services
- An eye for automation and instrumentation
- The ability to decompose complex systems and find failure scenarios
- Experience bringing software to production at high scale
- Very strong written and verbal communication skills
- Contributions to open source software