What you'll be doing
- Use and improve existing tools for effective administration and monitoring of a large-scale web service on AWS.
- Troubleshoot and resolve live production issues by analyzing logs/errors from different sources.
- Design, develop, and improve tools to automate the monitoring and resolution of production issues.
- Work with development teams to harden, enhance, and document our systems, establish processes, and generally improve their operability and supportability.
- Assist in the configuration/build-out of new deployments to facilitate our constant growth.
- Work with Security Managers to establish and document security controls and procedures.
Required experience and skills
- Strong system administration background for Linux-based systems
- Strong application-level troubleshooting and problem-solving skills
- Good communication skills and ability to handle high-pressure production incidents
- Operational expertise in deploying and managing components like MySQL, Nginx, RoR, Elasticsearch, Java applications, load balancers, Grafana, and RabbitMQ.
- Comfortable with networking fundamentals like firewalls, subnetting, route tables, etc.
- Ability to write simple Bash/Python scripts
- Bachelor's degree in CS/Eng required; Master's/Ph.D. a plus
- Knowledge of scripting languages like Python or Ruby
- Experience working with configuration and deployment management tools like Chef or Puppet