Incident Manager

Atlanta, GA

Industry: Software

8 - 10 years

Posted 292 days ago

  by    Scott Hall

This job is no longer available.

The Incident Response Manager establishes and manages the processes that restore normal service operation as quickly as possible, supports ongoing data- or system-related Incidents, and drives long-term remediation of their root causes. The role also maintains detailed records of all Incidents, captures root causes, and ensures problem resolution in order to minimize adverse impact on business operations, identify the causes of service issues, and commission corrective work that prevents recurrence.

Responsibilities

  • Incident resolution and problem management: conduct Root Cause Analysis (RCA), capture action items, and conduct remediation follow-up meetings
  • Proactively monitor applications and systems in order to capture, triage, and escalate Incidents to technical teams prior to outages or end-user escalation
  • Review Incident trends; identify repeating issues; perform root cause analysis and implement process changes to reduce and ultimately eliminate recurrence
  • Direct and manage escalation and resolution calls with members from various teams
  • Communicate progress and resolution messages to appropriate stakeholders, including client communications about the root cause of major Incidents and follow-up
  • Design and present detailed reports of Incidents from detection through resolution
  • Track and maintain service availability and performance metrics and use them to prioritize and isolate issues
  • Track process efficacy using established Key Performance Indicators (KPIs)
  • Collaborate with team members to improve the Incident management and problem management processes, establishing new KPIs as needed
  • Develop communication templates and ensure the timely dissemination of system-impacting events and information to the Enterprise
  • Manage relationships with 3rd party vendors

Qualifications

  • 7+ years' experience leading analysts and engineers in an enterprise IT incident management and response environment, preferably supporting multi-site retail systems
  • 5+ years' experience working with and improving processes within a standard ITSM framework, and good knowledge of general operations related to enterprise applications, servers, networks, databases, etc.
  • 4-year college degree or significant relevant experience
  • 5+ years of experience with Java and SQL/Oracle Databases (preferred)
  • 5+ years of experience with ETL and data integration environments
  • 3+ years of application performance monitoring and tuning (e.g., AppDynamics preferred)
  • Ability to identify data-related issues, categorize them as application-related gaps, and drive both a quick fix and a permanent fix by raising problem tickets
  • Ability to effectively manage relationships with business and technology team partners including: end-users, developers, enterprise architecture, quality assurance, network & infrastructure
  • Hands-on experience and expertise with enterprise-level monitoring and troubleshooting software
  • Successful track record of proactively monitoring applications and networks in order to identify issues before they are experienced by end-users
  • Strong analytical, organizational, and problem-solving skills with a passion for root cause analysis and process improvement
  • Excellent communication skills with the ability to communicate clearly and effectively via writing and speaking
  • Strong customer service orientation with a focus on managing and exceeding customer expectations
  • Advanced planning and organizational experience within fast-paced/dynamic business environments
  • Must have excellent professional references from recent supervisors and team members

$80K - $100K