Manager, Reliability Engineering
As a member of this versatile group of full-stack engineers, you will be on the front line of maintaining and expanding the capabilities of a variety of systems. The team exists in the space between traditional systems administration and development, and seeks to merge the capabilities of both disciplines.
Responsibilities:
-Act as a conduit between infrastructure and development teams, being sympathetic to the concerns and priorities of both.
-Provide primary operational support and engineering for multiple large distributed software applications.
-Provide primary operational support and engineering for our OpenStack private cloud.
-Improve all aspects of software reliability, including better monitoring, alerting and documentation.
-Engage with the software engineering teams on support issues and improvements to tools, processes, and software.
-Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
Requirements:
-A bachelor's degree in computer science or another highly technical, scientific discipline.
-In-depth knowledge of and experience in at least one of the following, along with a desire to learn more: host-based networking, Linux/UNIX administration, systems programming, distributed systems, databases, or cloud computing.
-The ability to use off-the-shelf and open-source systems and utilities to rapidly provision production systems in a variety of domains, especially for multi-tenant use.
-A proven track record of automation and an algorithmic approach to solving problems.
-A proactive approach to spotting problems, areas for improvement, performance bottlenecks, etc.
-An understanding of the operational concerns in a demanding environment, ideally, but not necessarily, finance.
-The ability to understand the inherent trade-offs between various software architectures as they relate to performance, resiliency/fault tolerance, load balancing, and data consistency.
-The ability to profile and debug applications in real time.
Additional Skills Preferred:
-Experience with authentication and encryption technologies like SSL, Kerberos and GSSAPI.
-Networking experience, such as analyzing packet dumps, multicast routing on hosts, and packet filtering.
-OS/kernel experience, such as familiarity with OS tunables and log analysis.
-Experience with automated configuration management tools like Ansible, Chef, and Puppet.
-Experience with distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
*Will Relocate The Right Candidate! Will Sponsor Visas. Will only consider candidates from top-tier computer science universities and/or individuals with a stellar GPA. Requires a bachelor's and/or master's degree from a top computer science program with a GPA of 3.5 or higher. PhD preferred. Top computer science programs preferred (Carnegie Mellon University, Massachusetts Institute of Technology/MIT, Stanford University, University of California-Berkeley, Cornell University, University of Illinois-Urbana-Champaign, Princeton University, University of Washington, University of Texas-Austin, Georgia Institute of Technology, California Institute of Technology, University of Wisconsin-Madison, University of Michigan-Ann Arbor, etc.).
If you are an employer and recruiting for similar IT professionals / positions, please contact our Technical Recruiters at Next Step Systems. We are a national IT Recruiting Firm / Agency specializing in full-time direct hire Information Technology employment opportunities.
"PLEASE DO NOT APPLY" If You Are A Consulting Firm, Third Party Recruiter Or Seeking Corp-To-Corp; W-2 Direct Hire Only.