Candidates should have proven skills with a number of HPC technologies. The following list is a representative skill set with examples:
- Cluster management software: Scyld, xCAT, ROCKS
- Resource Managers/Schedulers: TORQUE, Moab, SGE, SLURM
- General Linux System Administration at the level of RHCSA
- General Network configuration: TCP/IP administration and troubleshooting
- Configuration Management: Ansible, Chef, Puppet
- High-performance network configuration and tuning: 10GigE, InfiniBand
- Parallel and scale-out file systems (Lustre, Panasas, Ceph, Gluster)
- GNU toolchain compilation/optimization (gfortran, gcc, and g++)
- MPI libraries (MPICH, MVAPICH, and OpenMPI)
- GPU technologies: CUDA, AMD Fusion
- Industry-standard parallel application tuning and submission (Abaqus, Ansys, WRF, MATLAB, BLAST, LS-DYNA, Gaussian, etc.)
The candidate will manage Linux HPC and HA clusters. The work will be roughly 80% remote and 20% on site (Salt Lake City area). On-site visits will be for break-fix work (hard drive, CPU, and motherboard swaps, etc.) and will vary depending on customer needs. The candidate must fully and clearly document any work performed. The candidate will also be required to attend ongoing status meetings with customers and internal Penguin teams.
- Bachelor’s Degree in Computer Science or Electrical Engineering (or equivalent experience)
- 5 years of hands-on experience applying parallel programming technologies, optimizing applications in a Linux cluster environment, installing software in a variety of cluster environments, and setting up and configuring clusters
- Strong knowledge of High Performance Computing (HPC) application development
- Security clearance
- Ability to help grow a vibrant, leading-edge professional services organization is a plus