$80K - $100K (Ladders Estimates)
The role of the Systems Reliability Engineer is to build solutions that enhance the availability, performance and stability of OpenText services, and to automate away repetitive work. You'll also respond to pings, pages and alerts, investigating product issues that you can really sink your teeth into. You'll work in non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. Your mission will be to use cutting-edge technology to monitor and maintain the day-to-day operations of the entire production infrastructure for OpenText Discovery on our AWS platform. The best person for this role is someone who has a collaborative spirit - in our world, it's not about being a hero and having all the answers; it's about sometimes saying "I don't know" and working to find solutions rather than starting from an assumption. The team needs someone who can ask questions, learn from others and turn chaos into order.
This role would be a great fit for someone with creative and innovative problem-solving skills. You will develop and implement solutions that operate at scale. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers.
You are great at:
• Respond to incidents in accordance with Service Level Agreements.
• Take ownership and accountability for the incident resolution process.
• Provide a quality and timely response.
• Act as a technical liaison with other teams to evaluate and report bugs.
• Establish and maintain a good relationship with team members, Product Development, Customer Service and Sales.
• Participate in training and information sharing activities.
• Act as backup for other team members when necessary.
• Rotating shift work may be required.
• On-call rotation is required to provide 24x7x365 support.
What it takes:
• The ability to understand and maintain scripting software
• Deep understanding of Windows systems
• Good working knowledge of Linux
• Minimum 4 years of hands-on experience with cloud infrastructure; AWS a plus
• Experience with AWS services such as EC2, RDS, VPC, IAM, Route 53 and S3
• Strong working knowledge of Cloud operational best practices.
• Experience with installing and configuring Apache and Tomcat.
• Deep expertise in monitoring distributed systems and application architectures
• Exposure to & maintenance of configuration management tools at scale
• Strong understanding of ITIL principles; certification is a plus.
• Diagnosing & troubleshooting user-facing service incidents & outages
• Exposure to system & application-level telemetry for large distributed cloud architectures
• Diagnosing and resolving problems in high-throughput web applications & network services
• Expert-level troubleshooting skills across different levels of the solution stack
• Customer-service oriented.
• Proven problem solving and analytical ability.
• Excellent organizational/time management skills.
• Ability to handle multiple tasks concurrently.
• Ability to lead, drive and implement highly scalable and complex solutions
• A strong understanding of Security best practices.
• A proven record of being able to work independently and collaboratively.
Other Desired Qualifications
• Experience with container management tools such as Docker and with microservices architectures
• Application clustering/load balancing concepts and technologies
• Understanding of network topologies and common network protocols and services (DNS, HTTP(S), SSH, FTP, SMTP, DHCP, TCP, IP, etc.)
• Experience monitoring cloud services with Dynatrace, New Relic, Icinga, Nagios, BMC or any HPE tools
• Experience migrating existing on-premise applications and services to AWS
• Awareness and insight into industry trends (technology, methods and tooling)
Valid Through: 2019-09-13