What success looks like in this role:
• Responds to and resolves higher-level, more complex client and user operational issues.
• Works with Enterprise Applications team members to define end-user requirements, functionality specifications and deliverables.
• Designs new and modifies existing solution strategies to ensure achievement of SLA requirements.
• Maintains all technical deliverables including technical design specifications and configuration changes.
• Performs application configuration, extension, integration and performance tuning.
• Handles issues escalated from less experienced team members, clients and other stakeholders.
• Helps design, plan and implement enhancements to standard operating procedures.
Observability Responsibilities
- Design, implement, and manage monitoring and observability solutions for servers, applications, databases, network devices, and cloud environments.
- Configure and maintain observability platforms for metrics, logs, traces, dashboards, and alerts.
- Develop dashboards and reports to provide visibility into system health, service performance, availability, and operational KPIs.
- Set up and fine-tune alerting mechanisms, thresholds, escalation rules, and notification workflows to ensure timely incident detection and response.
- Monitor infrastructure and application performance to ensure adherence to SLA, SLO, and availability targets.
- Perform analysis of logs, alerts, and telemetry data to identify trends, anomalies, and potential service risks.
- Support incident management, problem management, and root cause analysis by providing actionable monitoring insights.
- Integrate monitoring tools with ITSM, event management, automation, and ticketing platforms.
- Reduce alert noise through alert optimization, event correlation, and dependency mapping.
- Support cloud and platform migrations by ensuring observability coverage during transformation initiatives.
- Collaborate with DevOps, SRE, and engineering teams to embed observability into system architecture and deployment pipelines.
- Perform capacity planning, utilization analysis, and trend forecasting for infrastructure and application environments.
- Automate monitoring configuration, maintenance, and deployment using scripting or Infrastructure as Code practices.
- Maintain operational documentation, runbooks, dashboard standards, and monitoring governance processes.
- Ensure compliance with logging, monitoring, audit, and operational security requirements.
- Contribute to continuous improvement initiatives including AIOps, self-healing, automation, and proactive event management.
You will be successful in this role if you have:
- BA/BS degree and 4-6 years of relevant experience, or equivalent.
- Experience supporting and configuring enterprise applications in complex environments.
- Proficiency with observability/monitoring platforms (e.g., Datadog, Dynatrace, New Relic, Splunk, Elastic, Prometheus, Grafana).
- Hands-on experience with metrics, logs, traces, dashboards, and alerting systems.
- Strong scripting/automation skills (Python, PowerShell, Bash, or similar).
- Experience with Infrastructure as Code tools (Terraform, Ansible, CloudFormation).
- Working knowledge of AWS, Azure, or GCP monitoring and logging services.
- Experience integrating monitoring tools with ITSM and event management platforms.
- Ability to perform root cause analysis, trend analysis, and capacity planning using telemetry data.
- Experience embedding observability into CI/CD pipelines and deployment workflows.
- Ability to design and optimize alerting rules, thresholds, and correlation logic.
- Understanding of logging/monitoring compliance, audit, and security requirements.
- Experience with automation, AIOps, and self-healing or event-driven operations.
This role may require access to export-controlled commodities and technology. Therefore, to conform to U.S. export control regulations, applicants should be eligible for any required authorizations from the U.S. Government.