Site Reliability Engineer
at ThousandEyes, San Francisco, CA
The name ThousandEyes was born from two big ideas: the power to see things not ordinarily possible and the ability to collect insights from a multitude of vantage points. As organizations rely more on cloud services and the Internet, the network has become a "black box" outside of their control. ThousandEyes gives organizations visibility and insight into the now borderless network. It arms them with an accurate understanding of how the network impacts their applications, users and customers. ThousandEyes is used by some of the world's largest and fastest growing brands, including 4 of the top 5 SaaS companies, 4 of the top 5 US banks and 3 of the Fortune 5. ThousandEyes is backed by Sequoia Capital, Google Ventures, Tenaya Capital and Sutter Hill Ventures, with headquarters in San Francisco, CA.
Engineering at ThousandEyes
At ThousandEyes, we use cutting-edge technologies and innovative techniques to study and visualize networks on a global scale.
ThousandEyes engineers are focused on continuous improvement — of our product, our codebase, our knowledge, and our skills. We believe in innovation, simplicity, and elegance. We work in small, cross-functional teams where everyone has a voice.
About the Role
Are you ready to put your hands in production? To manage clusters that receive hundreds of millions of messages per hour? Are you excited to manage thousands of containers?
ThousandEyes Site Reliability Engineers are responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The job is to handle the company’s core infrastructure services, maintaining a constantly growing infrastructure capable of handling a very high volume of incoming data per day.
We believe in operations, infrastructure, and everything else as code, which keeps our small, distributed team efficient and effective.
We’re looking for talented engineers with a software or operations background who are experienced in designing, analyzing, and troubleshooting large-scale, highly available distributed systems. You must be willing to work closely with our application development teams to ensure the reliability and performance of our infrastructure.
Apply for this role if you:
- Are a fast learner
- Are comfortable working with new technologies
- Are exceptional with Go or Python and familiar with algorithms, data structures, and complexity analysis
- Are exceptional with Unix/Linux systems, with experience working with the shell, the kernel, system libraries, file systems, and client-server protocols
- Have experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
- Have experience with configuration management systems
- Are willing to contribute to open source projects
Bonus points if you have experience with or are knowledgeable about stream processing, complex event processing, microservices, or storage solutions, or if you have worked with any of the other technologies in our stack.
Our infrastructure technology stack: