SRE with Production support

Info Way Solutions

$90K — $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in Site Reliability Engineering or Production support.
  • Strong SRE mindset focusing on proactive issue identification using observability tools.
  • Expertise in monitoring tools like Splunk, AppDynamics, and Grafana.
  • Knowledge of virtualization, networking, and database technologies.
  • Experience with containerization technologies such as Docker and Kubernetes.
  • Proficient in ServiceNow and modern automated incident response tools.
  • Excellent communication skills, capable of engaging with senior leadership.

Responsibilities

  • Conduct proactive issue identification to minimize Mean Time to Detection (MTTD) and Mean Time to Recovery (MTTR).
  • Lead incident triage sessions and coordinate cross-functional activities per established SLAs and OLAs.
  • Utilize observability tools to correlate data from various dashboards for effective problem resolution.
  • Serve as the incident commander, driving critical escalated issues through resolution phases.
  • Ensure readiness and flexibility for a 24/7 operational environment.

Benefits

  • Opportunity to work with cutting-edge observability and monitoring tools.
  • Collaborative team environment with exposure to cross-functional teams.
  • Professional development opportunities and growth within the organization.

Full Job Description
Role: SRE with Production Support

Location: Bellevue, WA

Skills

SRE mindset in production support: Proactive issue identification using observability tools, with skill in using a range of monitoring and observability tools to track system performance.

Incident commander: Ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.

Communication: Excellent communicator, able to interact with Directors, Senior Directors, and above.

Technical expertise

Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RED metrics, ThousandEyes

Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, and Linux/Unix

Knowledge of containerization (Docker, Kubernetes) and cloud platforms (AWS, PCF, GCP)

ServiceNow (including AIOps, self-healing tools, and automated playbooks)

Usage and analysis of APM tools, NMON, and Wireshark

Experience with UEM and synthetic monitoring tools

Responsibilities

Production support activities, including proactive identification of issues using observability tools, with the aim of reducing MTTD and MTTR.

Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs. Correlate inputs from various dashboards and tools to drive resolution.

Flexibility to work in a 24x7 environment.