The Site Reliability Engineering Manager is a critical role on our Internal IT team. In this role, the candidate will manage technical staff, plan and design the structure of technology solutions, develop processes and ensure their accurate implementation, oversee incident management, and facilitate communication and teamwork between departments.
Essential Duties and Responsibilities:
- Manage technical staff
- Training and Mentoring
- Manage root cause analysis on technical escalations.
- Help manage technical escalations.
- Handle escalations and lead the most complex issues.
- Work with third-party vendors when needed to resolve technical issues.
- Lead the team in handling a new project or technology.
- Technical Management
- Plan and design the structure of a technology solution.
- Evaluate and select appropriate solutions and suggest integration methods.
- Identify areas requiring training and ensure appropriate training is scheduled and delivered.
- Coach individual employees to improve skill sets and job performance.
- Create and monitor development plans to ensure consistent and continual improvement in the skills and abilities of the team.
- Identify product-related issues and work with the Engineering team to resolve them.
- Process Development and Implementation
- Work closely with End User Computing to develop solutions for self-service management and automation.
- Work closely with department leads to develop and implement new technical solutions.
- Develop, document, and implement the necessary technical processes to ensure consistently high performance in all areas.
- Develop and maintain a knowledge base to assist in resolving issues.
- Provide suggestions to the Engineering and Research teams on alerts, conditions, and scripts to reduce resolution time.
- Work with the Problem Management team to ensure that problem tickets are resolved promptly.
- Incident Management
- Ensure appropriate resources are engaged in incident resolution.
- Oversee all aspects of the incident resolution.
- Ensure appropriate tickets are opened.
- Ensure timely and appropriate internal and external communication occurs throughout the incident.
- Serve as an escalation point for third-party issues.
- Lead reviews of incidents to identify deficiencies and foster continual improvement in technical skills.
- Develop relationships and work closely with other departments to:
- Ensure superior communication between teams.
- Maximize the efficiency of the process.
- Contribute to and embody the Company culture by adopting its core values and encouraging the team to do the same.
Knowledge, Skills, and/or Abilities Required: To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Expertise in multiple technologies, with subject matter expert (SME) depth in 2-3 of them.
- General knowledge of IT networks, ISP Services, Routing Protocols, VoIP, Network Management, Routers, and Switches
- Experience with Software Defined Infrastructure and Provisioning Automation
- Cloud-scale Web Infrastructure Engineering Experience
- Excellent communication skills
- Strong sense of ownership and accountability
- Results- and detail-oriented
- Experience developing the team's technical skills and ramping the team up on new technologies.
- Excellent collaboration skills. Should be able to work with multiple teams across domains to resolve a complex issue.
- Ability to resolve conflict
- Passion for providing exceptional service to customers
- Strong work ethic and capable of handling stress
- Excellent process and people management skills
- Good leadership skills
Educational/Vocational/Previous Experience Recommendations:
- Bachelor's degree preferred
- 8-10 years of experience in handling complex technical issues.
- Experience in a professional work environment