DTCC

Senior Site Reliability Engineer (Application Support)

$120K - $150K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 6+ years of experience in application support, SRE, or production engineering
  • Bachelor's degree preferred or equivalent experience
  • Strong understanding of SRE principles including reliability engineering and incident prevention
  • Experience in Linux and Windows environments with troubleshooting skills
  • Hands-on experience with monitoring tools such as Splunk and Grafana
  • Working knowledge of SQL for analysis and troubleshooting
  • Familiarity with ITSM tools like ServiceNow for incident, problem, and change management

Responsibilities

  • Act as Lead Application Support Engineer focusing on reliability and observability
  • Lead resolution of critical production incidents including impact analysis
  • Own and manage incident, problem, and major incident reviews for continuous improvement
  • Identify reliability risks and implement proactive solutions
  • Develop and enhance operational documentation and runbooks
  • Support release, change, and deployment activities, including vendor upgrades
  • Participate in Disaster Recovery testing and ensure audit readiness

Benefits

  • Opportunity to work at a leading financial services organization
  • Collaborative environment with global teams
  • Focus on continuous improvement and innovation
  • Exposure to modern monitoring and automation technologies
  • Professional development and training opportunities

Full Job Description

The impact you will have in this role:

As a Senior Application Support Engineer (SRE), you will play a critical role in ensuring the stability, reliability, and performance of mission-critical applications at DTCC.

This role goes beyond traditional support, focusing on Site Reliability Engineering principles, proactive system improvement, and operational excellence. You will partner closely with development, infrastructure, and global operations teams to enhance system resilience, reduce operational toil, and drive continuous improvement across the platform.

Your Primary Responsibilities:
  • Act as a Lead Application Support Engineer with SRE responsibilities, partnering with engineering and infrastructure teams to improve system reliability, resilience, and observability
  • Lead the resolution of critical production incidents, providing clear impact analysis, root cause identification, and preventive actions
  • Own and drive incident, problem, and major incident management, including post-incident reviews and continuous improvement
  • Proactively identify reliability risks and implement solutions to prevent recurrence and reduce operational toil
  • Develop, maintain, and enhance runbooks, knowledge articles, and operational documentation
  • Execute and support release, change, and deployment activities, including production releases and vendor upgrades
  • Support and participate in Disaster Recovery (DR) testing, execution, and audit readiness
  • Drive automation and alert optimization initiatives to improve efficiency and reduce noise
  • Embed risk, control, and reliability best practices into day-to-day operations
  • Collaborate with global teams to ensure high availability and operational excellence across systems

**NOTE: The Primary Responsibilities of this role are not limited to the details above.**

Qualifications:
  • 6+ years of experience in application support, SRE, or production engineering
  • Bachelor's degree preferred or equivalent experience

Required Skills:
  • Strong understanding of SRE principles, including reliability engineering, observability, and incident prevention
  • Experience working in Linux and Windows environments, with strong troubleshooting and log analysis skills
  • Hands-on experience with monitoring and observability tools (e.g., Splunk, Grafana)
  • Working knowledge of SQL for analysis and troubleshooting
  • Experience with ITSM tools (e.g., ServiceNow) for incident, problem, and change management
  • Familiarity with job scheduling and modern platforms (e.g., Autosys, OpenShift, containers)
  • Exposure to mainframe technologies, including job processing, scheduling, and legacy system interactions
  • Understanding of AI/ML concepts in production support (e.g., automation, AIOps, anomaly detection, incident reduction)
  • Understanding of security fundamentals (certificates, access, credentials)
  • Experience supporting AWS-based applications and services
  • Strong communication, ownership, and problem-solving skills in high-pressure environments
  • Experience working with global, distributed teams

The salary range is indicative for roles at the same level within DTCC across all US locations. Actual salary is determined based on the role, location, individual experience, skills, and other considerations.

About DTCC

The Depository Trust & Clearing Corporation (DTCC) is a financial services company that provides clearing, settlement, and information services for the global financial industry. DTCC was established in 1999 as a holding company for The Depository Trust Company, which was founded in 1973, and the National Securities Clearing Corporation. It is headquartered in New York City. The company operates through subsidiaries that provide services such as trade matching, risk management, and asset servicing. DTCC is owned by its users, which include broker-dealers, banks, and other financial institutions. The company is committed to reducing risk and increasing efficiency in the financial markets.
Size: 4,000 employees
Founded: 1973
