Through our publishers, Sovrn Holdings reaches over 200 million people every day, generating over 10 billion HTTP requests daily. These requests must be processed in low milliseconds and result in several petabytes of data on a monthly basis. The exciting technical problems we solve require a world-class team and engineering culture to deliver against our mission.
Sovrn is currently seeking a Sr. Site Reliability Engineer to play an integral role in building and maintaining our low-latency, high-performance, scalable infrastructure. As a Site Reliability Engineer at Sovrn, you will perform hands-on engineering for full stack provisioning “DevOps” automation. Reliability engineering includes deployment pipelines, software configuration management, infrastructure configuration management, containerization frameworks, monitoring, and performance engineering. The goal of reliability engineering is to streamline full stack delivery through automation, increasing the velocity of new features while meeting production service-level commitments. This role is a key participant in building a collaborative partnership across software, data, reliability, and quality engineering to enable the delivery and service management flows. The engineer should have experience aiding teams through DevOps transformation, implementation, and sustaining activities.
- Participate in planning and execution for full stack provisioning automation (i.e., continuous integration/deployment, software configuration, infrastructure configuration, and monitoring)
- Design and implement automation and tools to speed incident resolution, reduce production issues, and manage production change with minimal business disruption
- Lead problem definition and solution design, and define implementation work plans
- Consult with teams on best practices for container management tools such as Kubernetes, Mesos, Docker
- Understanding of DevOps trends; participation in community activities is a plus
- Support operational delivery and execution of key Sovrn applications
- Develop critical software, integration, and tools needed to achieve reliability goals
- Partner with feature team engineering, data platform, data science, and external data partners to enable business deliverables
- Troubleshoot systemic issues and lead improvements
- 3+ years' experience in an automation engineering role
- Understanding of DevOps and reliability engineering
- Experience with cloud technologies (AWS, GCP, Azure)
- Experience with container management tools (Kubernetes, Mesosphere)
- An analytical mindset with problem-solving skills
- Excellent communication and collaboration skills
- Ability to understand the business domain and translate it into reliability services
Position Reports to: Sr. Director, Reliability Engineering