Penguin Computing Managed Services provides remote Linux cluster system administration. You will be required to understand, troubleshoot, and document the entire cluster stack, from hardware RMAs up through application troubleshooting. Typical tasks include high-speed network integration, scheduler and resource manager maintenance, and application compilation and optimization. You will interact with and support some of the top engineering, research, academic, and IT professionals in the HPC community.
This is a customer-facing position. Candidates must have excellent communication skills, a friendly demeanor, and the ability to remain calm, focused, and organized.
Candidates should have proven skills with a number of HPC technologies. The following list is a representative skill set with examples:
Cluster management software: Scyld, xCAT, ROCKS
Resource Managers/Schedulers: TORQUE, Moab, SGE, SLURM
General Linux System Administration at the level of RHCSA
General Network configuration: TCP/IP administration and troubleshooting
Configuration Management: Ansible, Chef, Puppet
High-performance network configuration and tuning: 10GigE, InfiniBand
Parallel and scale-out file systems (Lustre, Panasas, Ceph, Gluster)
GNU toolchain compilation/optimization (gfortran, gcc, and g++)
MPI libraries (MPICH, MVAPICH, and OpenMPI)
GPU technologies: CUDA, AMD Fusion
Industry-standard parallel application tuning and submission (Abaqus, Ansys, WRF, MATLAB, BLAST, LS-DYNA, Gaussian, etc.)
The candidate will manage Linux HPC and HA clusters. The work will be roughly 80% remote and 20% on site (Salt Lake City area). On-site visits will be for break/fix work (hard drive, CPU, and motherboard swaps, etc.) and will vary depending on customer needs. The candidate must fully and clearly document any work performed. The candidate will also be required to attend ongoing status meetings with customers and internal Penguin teams.
Bachelor’s degree in Computer Science or Electrical Engineering (or equivalent experience)
7 years of hands-on experience applying parallel programming technologies, optimizing applications in a Linux cluster environment, installing software in a variety of cluster environments, and setting up and configuring clusters
Strong knowledge of High Performance Computing (HPC) application development