Incident Recovery Manager

Perficient • $81K — $149K *

Charlotte, NC 28269 • In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in Production Services, Recovery, and Problem Management with a background in development.
Strong knowledge of infrastructure, cloud, and application technology components.
Hands-on experience with observability tools and logs for troubleshooting.
Familiarity with cloud platforms (Azure, AWS, GCP) and disaster recovery architectures.
Experience with ITSM tools like ServiceNow and automated incident management workflows.
Proven success in large-scale enterprise production support.
Understanding of CI/CD pipelines and their impact on production stability.

Responsibilities

Lead cross-functional teams to address major incidents and conduct post-mortem reviews.
Conduct root cause analysis of incidents using various techniques like Blameless RCA.
Develop and test incident recovery plans, creating SOPs and knowledge bases.
Create playbooks to enhance self-sufficiency in diagnosing issues.
Identify process improvements for IT service reliability and operational risk reduction.
Provide feedback to teams on enhancing observability and addressing failure points.
Mentor team members and promote a culture of ongoing learning and professional growth.

Benefits

Opportunities for continuous learning and professional development.
Collaborative work environment with cross-functional teams.
Engagement in cutting-edge projects involving cloud modernization.
Involvement in driving innovation and operational excellence.
Potential for career advancement within a dynamic organization.

Full Job Description

We currently have a career opportunity for an Incident Recovery Manager to join our team located in Charlotte, NC.

Job Overview:

The Incident Recovery Manager will lead the recovery of critical incidents following major disruptions and drive strategies to ensure effective, timely restoration of services. This role is responsible for managing major incident recovery, leading post-mortem reviews, developing SOPs and runbooks, and implementing proactive measures to prevent future failures through the adoption of SRE principles.

As a dynamic and motivated leader, you will play a key role in shaping and enhancing production support and operational capabilities. You will help identify and implement best-fit solutions that improve stability, reliability, and efficiency across the organization.

This is an opportunity to drive meaningful change and enhance the end user experience. If you are passionate about production operations, stability, SRE, and observability, and have a proven track record of success, we invite you to join us in advancing resilience and operational excellence.

Responsibilities

Major Incident Support: Drive cross-functional teams to resolve critical incidents and attend post-mortem/post-incident reviews.
Root Cause Analysis (RCA): Investigate underlying causes of major incidents using techniques such as 5-Why, Fishbone, and Blameless RCA.
Recovery Strategy & Planning: Develop and test incident recovery plans, establish SOPs and a knowledge base, and run mock drills.
Self Sufficiency: Develop playbooks in coordination with domain owners to improve self-sufficiency and diagnostic accuracy.
Process Improvement: Identify opportunities to improve IT service reliability and reduce operational risks related to people, process and technology
Feedback Loop: Provide continuous feedback to Observability, Automation, Resiliency and Domain teams on improving observability posture and automation, and on addressing single points of failure and architectural and design gaps.
Training and Development: Mentor and develop other team members, providing training. Stay current with industry best practices and technologies, fostering a culture of continuous learning and professional growth.
Performance Monitoring & Analytics: Utilize analytical and technical skills to assess system performance, monitor incident trends, and drive continuous improvement initiatives.
Cross-Functional Collaboration: Collaborate closely with engineering teams and third-party vendors during major incidents, and on system design, feasibility, and architecture, to improve stability and meet resilience objectives.

Qualifications

Progressive, proven experience and expertise in Production Services, Recovery and Problem Management, SRE, DevOps, or related fields; a development background is preferred
Strong understanding of foundational technology components across infrastructure, cloud, and applications, sufficient to diagnose issues, ask the right questions, and effectively lead recovery of a critical incident
Hands-on experience with observability tools, logs, and diagnostics, sufficient to troubleshoot issues and coach others
Experience with cloud platforms (e.g., Azure, AWS, GCP) including high availability and disaster recovery architectures
Experience with incident management and ITSM tools (e.g., ServiceNow, PagerDuty, Opsgenie) and automated workflows
Demonstrated success in complex, large-scale enterprise production support and operations environments, including experience working with large geographically distributed teams
Understanding of CI/CD pipelines and deployment strategies (e.g., blue-green, canary) and their impact on production stability
Excellent communication and interpersonal skills, with a focus on collaboration and relationship-building
Able to communicate effectively with CXOs and translate complex technical details into business terms
Ability to influence and drive change across the organization
Analytical mindset with the ability to translate data into actionable insights
Experience in analyzing incident trends and implementing process improvements to enhance operational efficiency
Strong decision-making skills under high-pressure incident scenarios with the ability to balance speed and risk
Proactive mindset with a focus on prevention, continuous improvement, and operational excellence
Strong ownership and accountability with a bias toward action and results
Ability to mentor, coach, and elevate technical and operational capabilities across teams

ABOUT THE TEAM

Our App Modernization team helps businesses transform legacy systems and build future-ready applications. We deliver end-to-end solutions that combine cloud migration, custom application development, multi-cloud strategies, and modern UI and API integration. With expertise in DevSecOps, modern frameworks, and enterprise platforms, our team of engineers, architects, and project leaders partners with leading brands to drive innovation, accelerate delivery, and create lasting business impact. We also integrate AI-driven capabilities, such as intelligent automation, predictive analytics, and generative development tools, to enhance scalability, performance, and user experience.

ADDITIONAL INFORMATION

Applications will be accepted until the position is filled or the posting is removed.

The salary range for this position takes into consideration a variety of factors, including but not limited to skill sets, level of experience, applicable office location, training, licensure and certifications, and other business and organizational needs. The new hire salary range displays the minimum and maximum salary targets for this position across all US locations, and the range has not been adjusted for any specific state differentials. It is not typical for a candidate to be hired at or near the top of the range for their role, and compensation decisions are dependent on the unique facts and circumstances regarding each candidate. A reasonable estimate of the current salary range for this position is $81,978 to $149,880. Please note that the salary range posted reflects the base salary only and does not include benefits or any potential variable compensation programs. Information regarding the benefits available for this position is in our benefits overview.

About Perficient

Perficient is a leading digital consultancy that helps companies transform their businesses and operations through technology. They deliver solutions to clients that range from Fortune 500 companies to emerging businesses. Perficient has a broad range of capabilities, including strategy, design, technology, and operations. They have expertise in a variety of industries, including healthcare, financial services, retail, and energy. Perficient has been recognized as a top employer and a top company for women technologists. They are committed to giving back to their communities through philanthropy and volunteerism.

Size: 6,079 employees
Market Cap: $2.4 billion
Industry: Information Technology
Net Income: $30.1 million
Founded: 1998
5 Year Trend: +9.3%
Revenue: $612.1 million
NASDAQ: PRFT

* Ladders Estimates
