Use open-source tools to build a scalable logging, monitoring and alerting solution for our shared multi-tenant private and public cloud platforms.
Build tools that help Operations teams quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications.
Continually maintain and improve software build methodology, procedures, and environment
Design, develop, and maintain product packaging, installation, upgrades, management and administration scripts and utilities
Manage and maintain configuration management infrastructure, as well as source code, RPM and Docker image repositories
Deploy and run integrated validation tests, security tests and code analysis tools as part of the DevSecOps tool chain.
Deploy, manage, upgrade systems, services and containers using automated configuration management and service orchestration tools
Monitor and alert based on system metrics, log file analysis and custom alert rules
Ensure the uptime SLA for SaaS infrastructure, services and applications as part of the global DevOps team
Produce weekly, monthly and quarterly uptime and status reports for production and critical internal infrastructure
3+ years of proven development, operations and/or DevOps experience deploying and maintaining global multi-tiered infrastructure and web applications
BS or MS in Computer Science, Engineering, or a related technical discipline or equivalent experience
Experience installing, configuring and managing log collection and aggregation tools such as Splunk and ELK, creating dashboards, and configuring alerts with tools such as PagerDuty, JIRA, HipChat and Slack.
Experience deploying platform services such as Kafka and Elasticsearch, and databases such as Oracle and Cassandra
Hands-on scripting and coding in Python, shell, Perl, Ruby and PHP
Good Linux system administration skills and TCP/IP networking fundamentals
Strong analytical and problem-solving skills along with good communication and documentation skills
Experience with CI/CD tools such as Git/GitHub/Stash, SVN, Jenkins, Bamboo, Nexus, Maven, Ant and Artifactory
Strong command of configuration management tools such as Puppet, Chef, Ansible and CloudFormation in large-scale environments.
Experience with system and application monitoring using tools such as Nagios, AppDynamics, New Relic and SolarWinds
Experience with metrics collection and charting using tools such as collectd, StatsD, Graphite and Grafana
Experience with Docker, microservices, container deployment and service orchestration using Docker Swarm, Kubernetes or Mesos/Marathon.