Identify tools needed to monitor custom solutions put in place by Solution Architects and Implementation Engineers.
Implement monitoring tools and customize them to meet business needs for monitoring and alerting.
Analyze logs and error messages to determine the source of an issue. Debug code and troubleshoot to try to resolve issues. Deliver troubleshooting steps and debugging information to engineering teams for resolution.
Create documentation and processes that allow Engineers to restore service by performing triage, recovery, and validation steps for application, network, system, and database events.
Establish procedures for alarm handling and escalation.
Create reporting and analysis for health of monitored solutions and identify areas of risk and customer exposure.
Establish framework for NOC duties and responsibilities. Identify coverage and response needs for monitored assets.
Identify hand-offs between departments, SLAs, and responsibilities between groups.
Track incidents and requests from initial identification through to resolution, ensuring that appropriate categories are used for logging and escalating them.
Establish a feedback loop between technical solutions and stakeholders for ongoing improvements.
Work closely with Solution Consultants and Implementation Engineers to scope monitoring and alerting needs for customer solutions in advance of implementation.
Some knowledge (and experience) of programming languages, enough to follow code execution, detect code errors, debug, and make minor fixes.
Experience working with Microsoft Azure environments.
Excellent (advanced) knowledge of SQL.
Troubleshooting skills (patience and determination to find and solve problems).
Strong priority management skills and proven problem-solving skills.