Senior Site Reliability Engineer - Cloud Infrastructure Team

GrubHub

New York, NY

5 - 7 years

Posted 240 days ago

This job is no longer available.

About the Opportunity: 

The Cloud Infrastructure (CI) team at Grubhub is responsible for building the frameworks and platform on which all services are built and operate, along with a number of key services. This team is at the core of implementing distributed, scalable system foundations and designing for five-nines (99.999%) uptime and horizontal scalability. The team is made up of Software Engineers and Site Reliability Engineers (SREs). The SREs focus on helping the team build reliable distributed systems that are easy to run, maintain, and operate. The CI SRE team works across all teams at Grubhub to assist with design and architecture and to provide guidance on the platform we have built. Along with our Platform SRE team, the CI SRE team also builds and maintains our deployment and self-service tooling and services.

Some Challenges You’ll Tackle


• Own the platform services, which include service discovery (Eureka), application routing, and service configuration management

• Help build multi-datacenter, performant, and highly available services, and the frameworks to support them

• Build solutions that empower Software Engineers to make safe changes across our production environment. This includes our continuous delivery pipeline, command-line tools, and services

• Actively contribute to the adoption of strong software architecture, development best practices, and new technologies. We are always improving how we build software, and we need you to contribute


Tools we work with:

• Python and Java

• Cassandra

• Docker

• Jenkins and Spinnaker

• Datadog and Splunk

You Should Have

• 4+ years of experience building complex distributed systems. In this role, you are the one who gravitates toward the team's operational concerns: reliability, performance, capacity planning, and automation of everything.

• 4+ years of experience building applications in Python (or a similar language)

• Deep knowledge of distributed systems, including distributed algorithms, service discovery, and consistency models

• Experience with the JVM

• Deep understanding of Linux systems

• Experience with NoSQL databases such as Cassandra

• Exceptional communication and troubleshooting skills