OmniTI is looking for a Site Reliability Engineer to join our team!
The OmniTI Ops team is a flexible and progressive group. We work closely with developers, DBAs, and client teams to help them manage availability and performance in the midst of constant change. We are not risk averse; instead, we strive to understand why things fail and what the true impact of those failures is, so that we can empower others. Collaboration is a cornerstone, and we understand that being friendly and outgoing is key to making that work.
About The Job
The SRE role is highly technical and requires a thorough understanding of all components of a modern web application stack, including front-end, networking, and systems-level knowledge. In this role you will work hands-on with clients to design, build, and operate reliable and scalable services in the cloud, on our custom hosting platform, or in their datacenter. You'll also help support our internal infrastructure and teams, and provide systems consulting, open source product development, and datacenter infrastructure support for our customers.
No one knows it all, but these are the kinds of things we're looking for:
- Experience with cloud and virtualization technologies: AWS, VirtualBox, KVM, zones/containers, Vagrant, Docker
- Excellent troubleshooting skills with the ability to dive deep into all aspects of the stack to identify and fix problems
- Strong background in web server technologies such as Apache, HAProxy, nginx
- Familiarity with technologies such as Apache Traffic Server or Varnish, and a good working knowledge of the issues that arise when implementing web caching
- Strong knowledge of IP networking protocols
- Experience with configuration management tools such as Chef, Puppet, or Ansible
- Familiarity with version control systems such as Git or Subversion, from both an end-user and an administrator perspective
- Exposure to dynamic tracing such as DTrace or Brendan Gregg's blog
- In-depth understanding of Unix-like operating systems including illumos, Linux, Solaris 10+, or *BSD
You must be willing to share in an on-call rotation and work to eliminate sources of operational disruption. You won't just be working on our infrastructure; you'll also be expected to help our clients with broken, underperforming infrastructure, turning it into something that "just works". You'll get to work on hard problems and be proud of the solutions you build.
If you contribute to an open source project, have a blog, or are involved in technology in some other way, we would love to hear about it when you write to us!