As a Site Reliability Engineer, you will manage production services and work with Engineering and Operations teams to ensure the reliability, scalability, and performance of services in the Barracuda Cloud.
- Ensure service reliability and uptime for Barracuda Cloud services
- Troubleshoot issues across the entire stack: hardware, software, application and network
- Collaborate with internal groups to identify, develop, and deploy manageable, scalable and robust services
- Ensure consistent application of operational standards across cloud services
- Represent Cloud Engineering in design reviews and operational readiness exercises for new and existing services
- 5+ years of proficiency with the Linux/Unix command line and an understanding of package management on Linux systems
- Demonstrated programming skills with scripting or programming languages such as Python, PHP, Bash, Ruby, or Java
- Experience with configuration management systems such as Puppet
- Experience implementing service monitoring and alerting using tools such as Nagios
- Understanding of the OSI network model
- Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills
- CS/Engineering Degree or equivalent work experience
- Preferred work locations are Ann Arbor, MI or Fresno, CA; remote work will be considered for the right candidate