Job Description
The impact you will have in this role:
As a Senior Application Support Engineer (SRE), you will play a critical role in ensuring the stability, reliability, and performance of mission-critical applications at DTCC.
This role goes beyond traditional support, focusing on Site Reliability Engineering principles, proactive system improvement, and operational excellence. You will partner closely with development, infrastructure, and global operations teams to enhance system resilience, reduce operational toil, and drive continuous improvement across the platform.
Your Primary Responsibilities:
- Act as a Lead Application Support Engineer with SRE responsibilities, partnering with engineering and infrastructure teams to improve system reliability, resilience, and observability
- Lead the resolution of critical production incidents, providing clear impact analysis, root cause identification, and preventive actions
- Own and drive incident, problem, and major incident management, including post-incident reviews and continuous improvement
- Proactively identify reliability risks and implement solutions to prevent recurrence and reduce operational toil
- Develop, maintain, and enhance runbooks, knowledge articles, and operational documentation
- Execute and support release, change, and deployment activities, including production releases and vendor upgrades
- Support and participate in Disaster Recovery (DR) testing, execution, and audit readiness
- Drive automation and alert optimization initiatives to improve efficiency and reduce noise
- Embed risk, control, and reliability best practices into day-to-day operations
- Collaborate with global teams to ensure high availability and operational excellence across systems
**NOTE: The Primary Responsibilities of this role are not limited to the details above.**
Qualifications:
- 6+ years of experience in application support, SRE, or production engineering
- Bachelor's degree preferred, or equivalent experience
Required Skills:
- Strong understanding of SRE principles, including reliability engineering, observability, and incident prevention
- Experience working in Linux and Windows environments, with strong troubleshooting and log analysis skills
- Hands-on experience with monitoring and observability tools (e.g., Splunk, Grafana)
- Working knowledge of SQL for analysis and troubleshooting
- Experience with ITSM tools (e.g., ServiceNow) for incident, problem, and change management
- Familiarity with job scheduling and modern platforms (e.g., Autosys, OpenShift, containers)
- Exposure to mainframe technologies, including job processing, scheduling, and legacy system interactions
- Understanding of AI/ML concepts in production support (e.g., automation, AIOps, anomaly detection, incident reduction)
- Understanding of security fundamentals (certificates, access, credentials)
- Experience supporting AWS-based applications and services
- Strong communication, ownership, and problem-solving skills in high-pressure environments
- Experience working with global, distributed teams
The salary range is indicative for roles at the same level within DTCC across all US locations. Actual salary is determined based on the role, location, individual experience, skills, and other considerations.