The Senior Manager of Incident and Escalation Management will be responsible for leading a world-class incident and escalation management program that is part of the Global Support Organization. You will lead and grow the Splunk Support Incident and Escalation Management (ICEM) Team, advocating on behalf of our Splunk customers during incidents and outages by partnering with other Splunk teams such as engineering, operations, and product organizations. In addition to RTB activities, we are looking for the right person to take the ICEM team to the next level as Splunk continues to grow and focus on delivering best-in-class customer experiences.
This role requires an understanding of industry standards such as ITIL and ITSM, experience with various methodologies and tools, and a keen ability to rally partners and proactively problem-solve. The successful individual will also be key to establishing new opportunities for engagement and working to establish a shared problem management function.
Responsibilities:
- Strategy, Scale, and Process
- Own and manage the incident and escalation processes within the Support Organization to drive the restoration of services quickly for customers while minimizing the impact.
- Partner with leaders of Cloud Operations and Product Engineering teams to ensure end-to-end workflows and process alignment.
- Establish and implement relevant metrics and measures for success
- Own, continually evaluate, and improve processes and mechanisms globally to ensure a flawless experience.
- Develop a 3-year vision for this team, not necessarily limited to Support Incident and Escalations as defined today
- Build future-proof processes and automation to enable scale across teams and products
- Work closely with other team members and departments to establish a closed-loop system (problem management) that eliminates, at the source, the defects that caused the customer concern, preventing future issues. A key measure will be the elimination of repeat issues of the same type.
- Drive root cause analysis and corrective action completion to help eliminate disruption of services and consequently improve the day-to-day operations of the organization, using a validated problem analysis methodology and tracking all elements of the RCA to closure.
- Ensure the quality of work products such as customer-facing Root Cause Analysis (RCA) documents, senior executive readouts, and After Action Reports (AARs).
- Act as a leader with vision within the company and remain up-to-date on industry standard methodologies
- Leadership, customer focus, influence and communication
- Lead and grow a diverse team responsible for responding to, investigating, managing, and resolving high-impact incidents and escalations 24x7x365.
- Coordinate efforts across multiple teams in order to ensure an effective incident response capability
- Build effective reporting for multiple audiences, delivered with the appropriate level of detail and on a predetermined schedule
- Develop processes, partnerships, and resource plans with internal teams (e.g. Development) to ensure immediate action is taken on escalated issues
- Ensure global teams act as one, delivering seamless handoffs, including over holidays
- Forge tight partnerships (and processes) with internal leaders (and their respective teams) to ensure delivery of a seamless process with customers.
- Strong executive presence
- Clearly explain highly technical issues to a non-technical audience
- Insight and Action
- Identify new insights from incident and problem data to help focus future efforts on product and service stability
- Deliver consistent, best-in-class metrics for incidents/escalations across response time, resolution time, customer satisfaction, and process or technology fixes from root causes
- Design and deploy a mechanism for tracking and reporting status of escalations that can be shared with customers and internal groups. Automate process and alerting.
- Build a new mechanism for early detection of potential escalations before they occur, including an intervention process that will reduce escalations as a percentage of cases over time
Requirements:
- 10+ years of demonstrated experience supporting and troubleshooting commercial end-user software applications, preferably supporting enterprise-level, mission-critical applications. 5+ years in crisis management preferred
- 3+ years supporting enterprise software and/or cloud-based technologies
- Customer focus
- 5+ years leading and developing teams responsible for critical issues and/or customer concerns
- Demonstrated record of delivering consistent, best-in-class results in terms of responsiveness, resolution, and customer experience
- Proven, tenured experience leading global incident management teams in a management role
- Demonstrated strategic and tactical thinking, and quantitative and analytical skills, under pressure
- Ability to influence and persuade without formal authority
- Demonstrated ability to quickly adapt in fast-paced, changing environments
- Exceptional bias for action – willing to move rapidly and decisively to resolve customer issues
- Ability to work global hours including weekends and holidays as needed
- B.S. in Computer Science or an equivalent technology field required
Preferred Qualifications:
- Expertise in incident management industry-standard best practices, including ITIL
- Experience working across geographies and functional teams (e.g. Engineering, Product Management, Operations, Customer Success, Global Support)
- Experience with, or a solid understanding of, operating systems such as Linux, UNIX, or Windows
- Prior experience as a Technical Support or Service Engineer
- Splunk
- Tools such as JIRA, SFDC, ServiceNow, Slack, and incident orchestration and automation tools (e.g. VictorOps)