The Climate Corporation is looking for Cloud Ops Engineers to improve the security, reliability and availability of microservices running on Amazon Web Services (AWS). In this role you will get the chance to demonstrate how business and application metrics can drive improvements in infrastructure functionality, scalability, performance and reliability. Our team challenges the status quo so that our engineering systems run more efficiently and deliver higher value.
Our team's challenge is to accelerate innovation and research across The Climate Corporation's engineering organization. We build systems that efficiently and reliably make our technical community a better place. We are bringing state-of-the-art technologies such as Docker, microservices and serverless computing into production right now. Your challenge, should you decide to accept it, is to collaborate with us to build a better, more secure world for our scientists and engineers.
What you will do:
- Ensure the highest level of uptime and Quality of Service (QoS) for our farmers through operational excellence.
- Serve as an escalation point for all production issues and partner with application teams to drive the incident management process.
- Collaborate with application teams to build reliable, secure and fast microservices.
- Design and maintain production monitoring systems.
- Troubleshoot performance and stability issues using a wide variety of tools.
- Recommend and implement engineering best practices that ensure performance, reliability and measurability at massive scale.
What you will need:
- BS or higher in Computer Science
- Proficiency in a Unix/Linux environment
- 5+ years of experience monitoring, troubleshooting and diagnosing infrastructure systems
- Experience with configuration management tools such as Chef, Ansible or Puppet
- Experience with a public cloud provider such as Amazon Web Services, Microsoft Azure or Google Cloud Platform
- Expertise with open source monitoring systems (e.g., Prometheus, Sensu) and commercial tools such as New Relic and Splunk
- Experience with Service Oriented Architectures (SOA), Docker containers and scheduling frameworks (e.g., Kubernetes, Amazon ECS)
- Experience with Jenkins or other CI/build tools
- Familiarity with distributed data platforms (e.g., DynamoDB, Hadoop, EMR, Spark, PostGIS, Elasticsearch)
- Experience with programming languages such as Python or Java
- Maturity, judgment, negotiation/influence, analytical and leadership skills
- Strong written and verbal communication skills