Reporting to the VP of Infrastructure, as Manager of Incident Response and Remediation Programs you will lead the team that creates, maintains, and optimizes Datadog’s global, company-wide incident response and remediation programs and processes. You will build your team from scratch: Datadog has long handled incident response and remediation as a side function of Reliability Engineering management. However, with our ongoing rapid growth in both new customers and new products, we are now seeking to build a specialized team, starting with you as its leader. The aspects of incident response and remediation your team will own are:
- Our 24/7 Incident Commander program, especially training our company’s engineers and management to be effective in the program, and working out an effective communication mechanism through to our customers, both big and small
- Our Incident Post-Mortem review and dissemination programs
- Our Reliability Risk Review and Prioritization programs
- Work with leadership and stakeholders to determine how best to staff your team to deliver its mission, then work with recruiting to hire your initial team of 3-5
- Identify and remove obstacles that slow down or prevent programs from delivering
- Establish credibility and rapport with senior technical and non-technical team members alike
- Take a hands-on approach
- Communicate confidently and comfortably with senior-level executives
- Be comfortable in an ambiguous environment and respond well to frequent change
- 3-5 years of experience at a SaaS company leading teams in technical/network/site operations, technical program management, or equivalent roles
- 7-10 years of experience in technical/network/site operations, technical program management, or equivalent roles
- You can navigate technical conversations regarding cloud-based infrastructure
- Thorough understanding of the software development lifecycle, and the ability to adjust and apply this knowledge in a dynamic environment using agile methodologies
- Strong quantitative and analytical skills, and a proven ability to track and successfully complete complex programs
- Degree in Computer Science, another engineering discipline, or Information Systems
- You’re familiar with problems that are solved by monitoring tools
- Experience working in a company in a hyper-growth stage
- Experience growing a team (identifying roles, partnering with recruiting to hire)