DTCC

Lead Observability Engineer (Grafana Cloud)

DTCC
$100K - $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Minimum of 6 years of relevant experience in IT infrastructure and application monitoring.
  • Bachelor's degree in Computer Science or related field, or equivalent experience.
  • 5+ years of experience with Splunk engineering/support in a production environment.
  • Proficiency with monitoring tools like Grafana Cloud, Dynatrace, and OpenText Operations Bridge Manager.
  • Experience with cloud platforms (AWS, Azure, Google Cloud) and their monitoring services.
  • Knowledge of scripting languages such as Python, Bash, or PowerShell for automation tasks.
  • Strong analytical skills with attention to detail.

Responsibilities

  • Develop and implement monitoring solutions for applications and infrastructure.
  • Continuously monitor system performance and analyze trends to address issues.
  • Manage incidents and perform root cause analysis to prevent recurrence.
  • Collaborate with IT teams to align monitoring solutions with business needs.
  • Select, configure, and maintain monitoring tools and platforms.
  • Generate and present reports on system performance and incidents for stakeholders.
  • Improve monitoring processes and tools for better efficiency and effectiveness.

Benefits

  • Participate in user training to enhance awareness of observability solutions.
  • Take on on-call L3 support responsibilities on a rotational schedule.
  • Engage in administrative functions to maintain and document monitoring tools.
  • Work on engineering and development focused projects with minimal supervision.
  • Opportunity to develop risk management functions related to security and compliance.
Full Job Description

The Impact you will have in this role:

As a member of the IT FinSight Delivery team, you will serve as an Observability Engineer on the Observability Engineering team. The team maintains the firm's application and infrastructure monitoring, reporting, and analytics tools. This position focuses primarily on monitoring tools such as Grafana Cloud (SaaS), Dynatrace, OpenText Operations Bridge Manager, and Splunk Cloud.

Your Primary Responsibilities:
  • Design and Implementation: Developing and implementing monitoring solutions for applications and infrastructure to ensure high availability and performance.
  • Monitoring and Analysis: Continuously monitoring system performance, identifying bottlenecks, and analyzing trends to proactively address potential issues.
  • Incident Management: Responding to and managing incidents, performing root cause analysis, and implementing corrective actions to prevent recurrence.
  • Collaboration: Working closely with development, operations, and other IT teams to ensure monitoring solutions are integrated and aligned with business needs.
  • Tool Management: Selecting, configuring, and maintaining monitoring tools and platforms, ensuring they are up-to-date and effective.
  • Reporting: Generating and presenting reports on system performance, incidents, and trends for stakeholders.
  • Optimization: Continuously improving monitoring processes and tools to enhance efficiency and effectiveness.
  • Compliance and Security: Ensuring monitoring solutions comply with security policies and regulatory requirements.
  • Working on engineering- and development-focused projects from start to finish with minimal supervision.
  • Providing technical and operational support for our customer base as well as other technical areas within the company that use our tools.
  • Performing risk management functions such as reconciliation of vulnerabilities and security baselines, as well as other risk- and audit-related objectives.
  • Performing administrative functions for our tools, such as keeping the tool documentation current and handling service requests.
  • Providing 24x7 on-call L3 support on a rotational schedule with other team members.
  • Participating in user training to increase awareness of observability solutions.
  • Ensuring incident, problem, and change tickets are addressed in a timely fashion, and escalating technical and managerial issues.
  • Following DTCC's ITIL process for incident, change, and problem resolution.

**NOTE: The responsibilities of this role are not limited to the details above.**

Qualifications:
  • Minimum of 6 years of relevant experience
  • Bachelor's degree in Computer Science or any technical field and/or equivalent experience

Talents Needed for Success:
  • Minimum of 6 years of experience in IT infrastructure, application monitoring, and performance management.
  • 5+ years of experience with Splunk engineering/support in a production environment, covering all phases of lifecycle management: planning, design, deployment, upkeep, and retirement.
  • Well-developed competency with monitoring solutions in a production environment.
  • Monitoring Tools: Proficiency in using monitoring tools such as Grafana Cloud, Dynatrace, OpenText Operations Bridge Manager, Splunk, and others.
  • Scripting and Automation: Knowledge of scripting languages like Python, Bash, or PowerShell to automate monitoring tasks and processes.
  • Cloud Platforms: Experience with cloud platforms such as AWS, Azure, or Google Cloud, including their monitoring and management services.
  • Networking: Understanding of network protocols, configurations, and troubleshooting.
  • System Administration: Strong background in system administration for both Windows and Linux environments.
  • Database Management: Familiarity with database monitoring and performance tuning.
  • Problem-Solving: Strong analytical and problem-solving skills to identify and resolve issues quickly.
  • Communication: Excellent communication skills to collaborate with different teams and present findings to stakeholders.
  • Attention to Detail: Keen attention to detail to ensure accurate monitoring and reporting.
  • Adaptability: Ability to adapt to new technologies and methodologies in a fast-paced environment.
  • Self-starter, continually striving to improve the team's service offerings and one's own skillset.


The salary range is indicative for roles at the same level within DTCC across all US locations. Actual salary is determined based on the role, location, individual experience, skills, and other considerations.

About the Team

The IT SIFMU Delivery Department supports core Clearing and Settlement application delivery for DTC, NSCC and FICC. The department also develops and supports Asset Services, Wealth Management & Insurance Services and Master Reference Data applications.

About DTCC

The Depository Trust & Clearing Corporation (DTCC) is a financial services company that provides clearing, settlement, and information services for the global financial industry. DTCC was established in 1999 as the holding company for DTC (founded in 1973) and NSCC, and is headquartered in New York City. The company operates through subsidiaries that provide services such as trade matching, risk management, and asset servicing. DTCC is owned by its users, which include broker-dealers, banks, and other financial institutions. The company is committed to reducing risk and increasing efficiency in the financial markets.
Size: 4,000 employees
Founded: 1973
