The Monitoring Engineer will be part of the DevOps group at Qualys and will be responsible for deploying, maintaining and managing an array of open source tools to effectively monitor the Qualys shared and private cloud platforms. The ideal candidate will have hands-on Operations or DevOps experience and be proficient in scripting languages such as Bash, Perl or Python. This candidate will also need to be meticulous, thorough, and proactive to ensure every area of every environment is covered. The candidate must be self-sufficient and have good collaborative skills to drive successful implementations of the monitoring solutions with all of the other Operations teams. This person will be part of a distributed team in the US and India.
DUTIES AND RESPONSIBILITIES:
* Implement, configure and manage new monitoring solutions to ensure end-to-end functionality for monitoring and alerting
* Evaluate off-the-shelf and open source solutions
* Collaborate with Operations teams to ensure full monitoring of all existing infrastructure and ensure that monitoring is automatically enabled when new infrastructure is spun up
* Ensure thorough and complete monitoring of all environments and layers – network, server, storage and applications
* Provide a gap analysis of missing features and implement them across the array of monitoring tools
* Collaborate with Engineering where necessary to enhance monitoring of applications and set alert thresholds
* Set up monitoring for shared co-location datacenter deployments, on-premises private cloud deployments, and deployments in public clouds
* Prioritize tasks, projects, and deployments
* Design monitoring dashboards for KPIs and provide weekly, monthly and quarterly uptime reports based on synthetic monitoring
* Automate deployment of monitoring agents and servers using configuration management tools like Ansible or Puppet
* Serve as a point of escalation for projects and issues
* Be on-call as part of the monitoring team rotation
KNOWLEDGE, SKILLS, AND ABILITIES REQUIRED:
* 1+ years administering Nagios, Sensu, Zabbix or another comprehensive open source monitoring solution
* 3+ years of experience in Production Operations
* 2+ years of Bash, Perl or Python experience
* 2+ years administering Linux
* Experience managing and using APM tools like AppDynamics or New Relic
* Experience using time series databases like Graphite and InfluxDB
* Experience with graphing tools like Graphite and Grafana, and the ability to quickly create useful dashboards for the engineering and operations teams
* Experience managing and using Splunk and ELK (Elasticsearch, Logstash, Kibana) for log aggregation and operational intelligence
* Experience monitoring infrastructure services and applications in public clouds like AWS, Azure and GCP
* SolarWinds administration experience a plus
* Experience working in a SaaS environment a plus
* Must have excellent organizational, communication and presentation skills
* Ability to handle tight deadlines and drive timely completion of tasks
* Ability to work under pressure in a 24/7 environment
* Ability to quickly prioritize tasks and projects