Director of Site Reliability Engineering
We are looking for a Director who exemplifies the attributes of a leader, mentor, decision maker, and engineer. Site Reliability Engineers (SREs) work at the intersection of software and systems engineering to design and build large-scale distributed systems that are reliable, operable, secure, highly available, disaster-ready, and performant. Join a world-class engineering team and use your programming and operations talents to apply the latest patterns in continuous delivery, application/kernel tuning, distributed architecture, and real-time data analysis to help us build EdgeCast, the next-generation Content Delivery Network.
• Implements a top-notch continuous improvement process that includes root-cause analysis, solution identification and implementation, and an ongoing emphasis on auto-remediation.
• Fully responsible for meeting the defined SLOs and SLAs.
• Responsible for establishing best practices and presenting proposals for SRE strategy and direction.
• Proactively defines changes to system architecture to improve system performance and scale.
• Collaborate with Product Management and Engineering leaders to ensure that operational and reliability requirements are clearly articulated and integrated into the product roadmap.
• Foster the engineering culture on existing teams, emphasizing empathy, collaboration, innovation, and (you guessed it) reliability!
• Lead the department that has technical responsibility for the foundational platforms powering
• Responsible for identifying points of failure and working with the engineering team to make resiliency table stakes in our services.
• Previous experience running production-grade software at scale and an appreciation for the complex and emergent behaviors inherent to distributed systems.
• Experience leading, motivating, and mentoring fast-moving, highly skilled infrastructure engineering teams; adept at navigating and improving both social and technical systems.
• A creative problem-solver who thinks about how changes impact systems holistically and who will demonstrate persistence and resourcefulness when obstacles arise.
• Someone comfortable identifying technical and process-related shortcomings, who can lay out a vision to fix them and isn't afraid to drive change through experimentation.
• Prior experience managing people strongly preferred.