The Site Reliability Engineer (SRE) bridges development and operations by applying an engineering mindset to system administration, analyzing complex data systems, anticipating problems, and finding ways to mitigate risk. The SRE works with software development teams on the design, implementation, and ongoing operation of the tools that automate building, provisioning, and monitoring of Choice Hotels infrastructure and applications. Our SRE group is responsible for reliability, automation, and scalability, while remaining vigilant about capacity and performance.
This position will help drive platform reliability best practices, influence design decisions, and get hands-on to tackle key infrastructure, cluster, automation, and scaling challenges. The role calls for technical depth, strong interpersonal skills, and comfort in a fast-paced setting, as well as the ability to work with diverse stakeholders in an agile environment.
Reports to: Manager of Network Operations
- Address critical and major incidents as they arise and drive each incident through to resolution, keeping accurate and comprehensive notes for use in the “Root Cause Analysis and Problem Management” phase of the incident.
- Ability to use the suite of monitoring and APM applications deployed in the enterprise to recognize application trends, quickly resolve potential incidents and outages, and ensure the high availability of systems.
- Automate the build, operation, and monitoring of the Choice Hotels International ecosystem.
- Be responsible for and drive the continuous improvement process within the Operations group and its members.
- Ensure all runbooks, tools, documentation, and systems are kept up to date so that both the team and its equipment remain highly available.
- Ability to provide updates, control incidents, and take effective meeting notes during incidents of all severities.
- Perform any additional responsibilities that help ensure high uptime of systems and applications, as well as support Choice associates.
- Ability to understand and use SQL Server commands and tables.
- Ability to use AWS tools such as CloudFront and CloudWatch, as well as other currently integrated tools such as AppDynamics and Prometheus/Grafana.
- Implement, test, deploy and support continuous integration with various teams that build and deploy to development, testing, and production environments.
- Automate the jobs, operation, and monitoring of the Choice Hotels ecosystem.
- Support Choice Hotels cloud provisioning and migration efforts into AWS.
- Perform and automate Windows, Linux, Unix, and database administration.
- Investigate, analyze and make recommendations regarding technology improvements, development best practices, standardization, scaling, upgrades, and modifications to our products and the services utilized to deploy those products.
- Partner closely with the software, hardware, systems, and quality groups to ensure components of the products meet or exceed applicable specifications and reliability objectives.
- Work with other teams within the Enterprise organization to monitor and deploy hardware and equipment orders.
- Work with vendors to ensure proper equipment installation and server decommissioning.
- Work with vendors to help implement standard hardware configurations.
- Work with management to evaluate new hardware technologies, trends, tools, and process improvement procedures.
- Partner with the Manager or Director of Operations to seek out professional development opportunities that improve performance and knowledge of operations.
- Work with other teams within the organization to deploy changes and monitor releases, hotfixes, and maintenance.
- Provide after-hours on-call coverage for Service Desk emergency requests, in accordance with Choice Hotels on-call policies.
SKILLS, EDUCATIONAL BACKGROUND AND EXPERIENCE:
- A 2- or 4-year degree in Computer Science or a technology-related field and 4+ years of related experience, or an equivalent combination of education and experience.
- Familiarity with distributed systems and/or data pipeline design patterns.
- Required 2-4 years of experience with one or more programming or scripting languages, such as Bash, .NET Core, PowerShell, C#, Java, Python, C++, and/or React JS.
- Server and network hardware experience
- Experience working in a SaaS environment and/or with microservices
- Familiarity with Site Reliability Engineering best practices.
- Experience with at least some of the following: Kubernetes, Kafka, Terraform, EKS, Harness, and/or high-scale datastores or backups.
- Experience working with multiple teams across a diverse mix of technologies.
- Critical thinking and application of analytical skills.
- An abiding intellectual curiosity and passion for creating solutions.
- Willingness and ability to work shifts and flexible hours, including nights and weekends.
- Direct experience with Inventory Management, Incident Management, and Change Management tools/processes. ITIL Foundation certificate preferred.
- Demonstrated strong analytical and troubleshooting skills.
- Excellent verbal and written communication skills, including proficiency with Word, Excel, PowerPoint, and SharePoint.
- Strong knowledge of Incident Management Tools, Service Tracking Management Tools, Trending and Reporting, and Change Management Tools.
- Preferred 4+ years of application, hardware, and server support experience.
- Required 4+ years of customer service experience.
- Required 2-4 years of incident response and incident management experience.
- Required 4+ years of SQL Server experience.
- Required 2+ years of AWS cloud migration or implementation experience.
- Must provide overtime coverage when necessary to support the overall function of the department.
- Professionalism and discretion
- Ability to work in a high-pressure environment
- Ability to prioritize work, including that of other team members.
- Ability to work collaboratively with teammates as well as organizational leaders.
- Must be able to uphold Choice’s Value and Performance Principles of Being Bold, Being Quick, Listening, Being Curious, and showing integrity, customer focus, and respect.