Reflexive Concepts is seeking a skilled DevOps Software Engineer III! This engineer will be responsible for all Operations and Maintenance (O&M) efforts, including installation, configuration, integration, monitoring, and sustainment of a large multi-tenant, containerized Kubernetes High Performance Computing as a Service (HPCaaS) platform for a large Linux computing environment.
The DevOps Software Engineer must be detail-oriented and have strong organizational skills and excellent troubleshooting skills, including the identification and resolution of issues, problems, and trouble tickets related to the platform.
Qualifications:
- Active TS/SCI + FS Polygraph
- Master's degree in Computer Science or related discipline from an accredited college or university, plus five (5) years of experience as a SWE, in programs and contracts of similar scope, type, and complexity OR
- Bachelor's degree in Computer Science or related discipline from an accredited college or university, plus seven (7) years of experience as a SWE, in programs and contracts of similar scope, type, and complexity OR
- Nine (9) years of experience as a SWE, in programs and contracts of similar scope, type, and complexity.
Required Skills + Experience:
- Experience using the Linux CLI
- Experience developing and maintaining scripts using Bash/Python
- Experience developing with Python and Java in a Linux environment.
- General HPC technical knowledge regarding compute, network, memory, and storage system components
- Experience installing, configuring, and supporting COTS/GOTS/FOSS software, libraries, and packages in a Linux environment.
- Experience with containerization technologies such as Docker and containerd
- Experience with container orchestration technologies such as Kubernetes.
- Experience administering Kubernetes clusters on bare metal in a Linux environment.
- Experience with IaC (Infrastructure as Code) concepts, principles, and automation tools such as Ansible and Terraform
- Experience with the Git version control system
Desired Skills + Experience:
- Familiar with Site Reliability Engineering (SRE) principles and applications
- Experience with the Atlassian Tool Suite (JIRA, Confluence)
- Experience with CI/CD principles, methodologies, and tools such as GitLab CI
- Experience using system monitoring tools such as Grafana/Prometheus