Job Description Summary: DataStax powers the Right-Now Enterprise with the always-on, distributed cloud database built on Apache Cassandra™ and designed for hybrid cloud. We are seeking a talented Senior Site Reliability Engineer to join our team.
We are looking for a talented hybrid engineer with a blend of core distributed systems operations experience and systems-level Java expertise to join our managed cloud team. This team works with some of the largest, most complex distributed problems in the world. You will be working in a high-profile role, providing a white-glove experience to our managed cloud customers. You will provide operational expertise, workarounds, root-cause analysis, and patches on core database technologies in our flagship product inspired by Apache Cassandra. Past experience with running large-scale distributed systems is required.
We are extremely selective, but the chosen few are those who are energized by the exciting challenges associated with introducing a new, disruptive technology to customers seeking a managed cloud solution for their database technology needs. The ideal candidate is proactive, self-motivated, autonomous, and takes pride of ownership in their work product. If you are highly energetic, entrepreneurial, technical, and driven to constantly learn new products and technologies, this is the opportunity for you.
Essential Job Functions:
Interface directly with customers, serving as the primary operational point of contact for your managed cloud customers
Manage and troubleshoot live distributed systems, in both non-production and production environments
Deep-dive into a complex, distributed code-base to understand and document defects and UX shortcomings
Provide in-depth feedback, suggestions, and potentially work on patches with the Core Engineering team for defects and improvements that come up during cluster management and troubleshooting
Analyze, research, and develop new automations for self-healing operations
Serve as an advocate for users in discussions with engineering about new features and product direction
Some travel (on-site customer work) is required
Requirements:
Expert troubleshooting skills with large software deployments and distributed systems
6+ years of experience in a support-related, customer-facing role, handling critical, time-sensitive issues and engaging directly with customer technical resources
3+ years of operational experience on Apache Cassandra or DataStax Enterprise in a DevOps or support role
Deep understanding of the software development life cycle and zero-downtime release management
Experience with performance profiling and optimization, preferably in a distributed environment
Strong Linux environment/OS performance and troubleshooting skills
Strong understanding of Java, Ruby, and/or another programming language
Able to debug and identify network issues
Strong automation skills using tools such as Ansible, Chef, Terraform, Jenkins, etc.
Working knowledge of ELK and Graphite/Grafana
Comfortable reading, reviewing, and modifying others' code
Self-motivated, with the ability to multi-task and work under minimal supervision
Excellent written and verbal communication skills
BS, MS, or PhD degree in Computer Science or a related major