Position Overview
As a Software Engineering Manager for PNC's Site Reliability Engineering Center, you will work within PNC's Information Technology Group, manage the twilight shift, and be located at one of our IT Hubs: Cleveland, Ohio; Birmingham, Alabama; Pittsburgh, Pennsylvania; Dallas, Texas; Denver, Colorado; or Phoenix, Arizona.
The Site Reliability Center (SRC) is focused on establishing a culture of operational excellence by ensuring infrastructure, platforms, and applications adhere to SRC onboarding standards that improve reliability, enable proactive issue resolution, and reduce customer impact. This role supports the vision of building a collaborative technology organization across application, infrastructure, and security teams to deliver a stable, reliable, and secure environment. Key responsibilities include driving customer-centric service improvements, implementing proactive and preventative reliability practices, fostering cross-functional collaboration, enhancing monitoring and observability capabilities, promoting a blameless culture of continuous learning, and reducing operational toil through automation. The ideal candidate will help improve service performance, strengthen operational resiliency, and advance automation and observability initiatives that enhance the overall customer experience.
As a Software Engineering Manager – Site Reliability Engineering (SRE), you will lead a team responsible for ensuring the reliability, scalability, and operational excellence of mission-critical platforms that power PNC’s digital experiences. This role blends technical leadership, hands-on problem solving, and people management, driving both production stability and continuous improvement across complex distributed systems. You will:
• Manage SRE and related teams; lead, coach, and develop a team of SRE engineers; set clear goals, drive accountability, and foster a culture of ownership and excellence; partner with cross-functional stakeholders to align technology and business objectives; support talent development, performance management, and succession planning; encourage innovation, continuous learning, and DevOps/SRE best practices.
• Lead incident management & remediation; manage and actively participate in end-to-end incident response for major (P1/P2) incidents; guide real-time triage, diagnostics, and troubleshooting across application, infrastructure, and network layers; ensure rapid execution of remediation actions and service restoration; provide clear, timely communication to stakeholders during incidents; oversee post-incident analysis, reporting, and documentation to drive improvements.
• Provide technical leadership in production support; serve as an escalation point for complex production issues; guide troubleshooting across applications, infrastructure (Linux/Windows), databases (Oracle, SQL), middleware, and integrations; ensure efficient log, metric, and system analysis; oversee batch/ETL monitoring and recovery processes; foster strong collaboration across engineering, infrastructure, and vendor teams.
• Drive problem management & root cause resolution; lead root cause analysis (RCA) efforts for major and recurring incidents; ensure ownership and resolution of problem records; drive permanent fixes and systemic improvements to eliminate repeat issues; identify trends and patterns to reduce risk and improve stability; partner with engineering teams to resolve code defects and system gaps; promote knowledge sharing via runbooks, knowledge articles, and error catalogs.
• Oversee change management & release execution; ensure safe and compliant execution of production changes and releases; validate change readiness, testing, rollback strategies, and risk assessments; represent the team in CAB reviews, providing technical risk evaluation; oversee post-implementation reviews (CPIR) and ensure follow-through; drive improvements in change success rate and reductions in production defects.
• Advance monitoring, alerting & observability; lead efforts to build and optimize monitoring, dashboards, and alerting frameworks; champion use of tools such as Dynatrace, BigPanda, Logscale, and enterprise platforms; improve signal-to-noise ratio through alert tuning; enable proactive issue detection before customer impact; strengthen event management and observability practices.
• Champion resiliency, stability & availability; lead efforts to ensure high availability of critical systems; oversee disaster recovery, failover, and continuity testing; identify and eliminate single points of failure; drive improvements in MTTR, uptime, and service reliability.
• Enable scalability & performance optimization; guide capacity planning and performance tuning strategies; ensure systems scale effectively under peak demand; partner with development teams for performance-driven design improvements; optimize system configurations to improve efficiency and throughput.
• Lead a 24x7 production support model; manage team participation in a 24x7 on-call rotation; oversee engagement in incident bridges, war rooms, and escalations; support pod-based operating models aligned to key applications; ensure seamless handoffs and global support continuity.
• Drive automation & operational efficiency; identify and prioritize opportunities to reduce manual effort through automation; implement automation across incident remediation, monitoring and alerting, and deployment and validation; promote standardized runbooks and automation frameworks; improve operational metrics and reduce toil.
• Ensure governance, risk & compliance; maintain adherence to enterprise policies and regulatory standards; support audits, vulnerability remediation, and risk controls; ensure accurate documentation and operational procedures; champion security, access management, and data governance practices.
Qualifications:
• 5+ years of related experience and 3+ years of management experience.
• Strong experience in Site Reliability Engineering, Production Support, or DevOps.
• Proven ability to lead teams in high-availability, enterprise environments.
• Deep understanding of incident, problem, and change management frameworks.
• Hands-on knowledge of monitoring tools, cloud/infrastructure platforms, and automation.
• Experience improving system reliability, observability, and operational maturity.
• Strong communication skills with the ability to lead during high-pressure situations.
• Experience with OCP (infrastructure) and with MongoDB and Cassandra (databases), plus working knowledge of Elasticsearch, Redis, MQ, and Kafka, is a plus.
PNC is an in-office company that fosters a supportive culture where employees can thrive and achieve balance. We encourage candidates to connect with their recruiter and hiring manager to understand workplace expectations and ensure the role aligns with their goals.
PNC will not provide sponsorship for employment visas or participate in STEM OPT for this position.
Job Description
- Manages development projects, development teams and application support functions.
- Oversees multiple application programming and analysis projects which include development, installation and maintenance of application programs.
- Monitors and maintains adherence and compliance to quality standards on an ongoing basis.
- Maximizes staff contribution through professional growth and development, to increase teamwork and more effectively meet business needs.
- Analyzes applications to ensure that all systems developed meet business needs and specifications.
PNC employees take pride in our reputation, and to continue building upon it, we expect our employees to be:
- Customer Focused - Knowledgeable of the values and practices that align customer needs and satisfaction as primary considerations in all business decisions and able to leverage that information in creating customized customer solutions.
- Managing Risk - Assessing and effectively managing all of the risks associated with their business objectives and activities to ensure they adhere to and support PNC's Enterprise Risk Management Framework.
PNC also has fundamental expectations of our people managers. As a manager of talent in PNC, you will be expected to:
- Include Intentionally - Cultivates diverse teams and inclusive workplaces to expand thinking.
- Live the Values - Role models our values with transparency and courage.
- Enable Change - Takes action to drive change and innovation that will transform our business.
- Achieve Results - Takes personal ownership to deliver results. Empowers and trusts others in decision making.
- Develop the Best - Raises the bar with every talent decision and guides the achievement of all employees and customers.
Qualifications
Successful candidates must demonstrate appropriate knowledge, skills, and abilities for a role. Listed below are skills, competencies, work experience, education, and required certifications/licensures needed to be successful in this position.
Preferred Skills
Application Development, Business Management, Customer Solutions, Design, Group Problem Solving, People Management, Process Improvements, Release Management, ShiftPlanning, Site Reliability Engineering, Software Solutions, User Experience (UX) Design
Competencies
Agile Development, Application Delivery Process, Application Development Tools, Coaching Others, Design Thinking, IT Environment, Software Process Improvement (SPI), System Testing
Work Experience
Roles at this level typically require a university / college degree, with 5+ years of industry-relevant experience. At least 3 years of prior management experience is typically required. In lieu of a degree, a comparable combination of education, job specific certification(s), and experience (including military service) may be considered.
Education