Looking for a Site Reliability Engineer to augment the existing Site Reliability Engineering (SRE) team for a large analytic cloud repository. This position is primarily focused on building and implementing monitoring tools in a DevOps-type environment to increase the stability of the hardware instances associated with the customer's analytic hosting platforms. A successful candidate for this position has experience in Python and Java development. Additionally, experience with version control tools and Linux, familiarity with computer networking, and an understanding of the principles of site reliability engineering would benefit the candidate.
- Shall have eight (8) years of experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution.
- Shall have four (4) years of experience in system engineering/architecture.
- Shall have four (4) years of experience working with products that support highly distributed, massively parallel computation needs, such as HBase, Hadoop, CloudBase/Accumulo, Bigtable, Cassandra, Scality, etc.
- At least four (4) years of experience writing software scripts using scripting languages such as Perl, Python, or Ruby for software
- At least two (2) years of experience managing and monitoring large cloud systems (>200 nodes).
- Cloud Systems Administrator or Developer Certification.
- Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems, including monitoring the technical health of a system, improving organizational processes, and implementing postmortem (failure) analysis and incident management.
- Shall have four (4) years of experience in a cleared environment.
- Four (4) years of demonstrated experience developing software for one of the following: Windows, UNIX, or Linux OS.
- Knowledge and experience with developing distributed storage routing and querying algorithms.
- Experience in developing documentation required to support a program's technical issues and training situations.
- Four (4) years of experience developing software systems using object-oriented programming languages (e.g., Java, Python).
- Experience developing solutions integrating and extending COTS products.
- Experience "wrapping" legacy systems or components as Web Services within a SOA framework.
- Demonstrated knowledge of analytical needs and requirements, query syntax, data flows, and traffic manipulation.
- Four (4) years of experience in developing system performance, availability, scalability, manageability, and security requirements for mid-to-large-scale programs.
- Experience designing, developing, testing, evaluating, and integrating information systems into a service-oriented environment.
- Experience optimizing storage, retrieval, backup, and retention strategies across globally distributed, high-throughput text and multimedia storage within clustered or cloud environments.
- Experience operating in a multithreaded environment.
- Experience debugging & troubleshooting complex software in a cloud environment.
- Familiarity with Configuration Management and monitoring tools.
- Familiarity with Agile software methodologies and practices.
- Significant experience provisioning and sustaining network infrastructures, and experience developing, operating, and managing networks required to operate in a secure PKI-, IPsec-, or VPN-enabled environment.
A Bachelor's degree in Computer Science or a related technical field is highly desired and will be considered equivalent to two (2) years of experience. A Master's degree in a technical field will be considered equivalent to four (4) years of experience.