Senior Site Reliability Engineer

DataStax

Virtual / Travel

5 - 7 years

Posted 236 days ago

This job is no longer available.

  • Job Description Summary:

    DataStax powers the Right-Now Enterprise with the always-on, distributed cloud database built on Apache Cassandra™ and designed for hybrid cloud. We are seeking a talented Senior Site Reliability Engineer to join our team.

    Job Description:

    We are looking for a talented hybrid engineer with a blend of core distributed-systems operations experience and systems-level Java expertise to join our managed cloud team. This team works with some of the largest, most complex distributed problems in the world. You will work in a high-profile role, providing a white-glove experience to our managed cloud customers. You will provide operational expertise, workarounds, root-cause analysis, and patches on core database technologies in our flagship product inspired by Apache Cassandra. Past experience with running large-scale distributed systems is required.

    We are extremely selective, but the chosen few are those who are energized by the exciting challenges associated with introducing a new, disruptive technology to customers seeking a managed cloud solution for their database technology needs. The ideal candidate is proactive, self-motivated, autonomous, and takes pride of ownership in their work product. If you are highly energetic, entrepreneurial, technical, and driven to constantly learn new products and technologies, this is the opportunity for you.

    Essential Job Functions:

    • Interface directly with customers, serving as the primary operational point of contact for your managed cloud customers

    • Manage and troubleshoot live distributed systems, in both non-production and production environments

    • Deep-dive into a complex, distributed code-base to understand and document defects and UX shortcomings

    • Provide in-depth feedback, suggestions, and potentially work on patches with the Core Engineering team for defects and improvements that come up during cluster management and troubleshooting

    • Analyze, research, and develop new automations for self-healing operations

    • Serve as an advocate for users in discussions with engineering about new features and product direction

    • Some travel (on-site customer work) is required

    Job Requirements:

    • Expert troubleshooting skills with large software deployments and distributed systems

    • 6+ years of experience in a support-related, customer-facing role, handling critical, time-sensitive issues and engaging directly with customer technical resources

    • 3+ years of operational experience with Apache Cassandra or DataStax Enterprise in a DevOps or support role

    • Deep understanding of the software development life cycle and zero-downtime release management

    • Experience with performance profiling and optimization, preferably in a distributed environment

    • Strong Linux environment and OS performance troubleshooting skills

    • Strong understanding of Java, Ruby, and/or another programming language

    • Able to debug and identify network issues

    • Strong automation skills using tools such as Ansible, Chef, Terraform, Jenkins, etc.

    • Working knowledge of ELK and Graphite/Grafana

    • Comfortable reading, reviewing, and modifying others' code

    • Self-motivated, with the ability to multitask and work under minimal supervision

    • Excellent written and verbal communication skills

    • BS, MS, or PhD degree in Computer Science or a related major