Managed Services Engineer

Penguin Computing

Fremont, CA

Industry: Professional, Scientific & Technical Services

5 - 7 years

Posted 36 days ago

Penguin Computing Managed Services provides remote Linux cluster system administration. You will be required to understand, troubleshoot, and document the entire cluster stack, from hardware RMAs up through application troubleshooting. Typical tasks include high-speed network integration, scheduler and resource manager maintenance, and application compilation and optimization. You will interact with and support some of the top engineering, research, academic, and IT professionals in the HPC community.

This is a customer-facing position. Candidates must have excellent communication skills, a friendly demeanor, and the ability to remain calm, focused, and organized.

Candidates should have proven skills with a number of HPC technologies. The following list is a representative skill set with examples:

Cluster management software: Scyld, xCAT, ROCKS

Resource Managers/Schedulers: TORQUE, Moab, SGE, SLURM

General Linux System Administration at the level of RHCSA

General Network configuration: TCP/IP administration and troubleshooting

Configuration Management: Ansible, Chef, Puppet

High-performance network configuration and tuning: 10GigE, InfiniBand

Parallel and scale-out file systems: Lustre, Panasas, Ceph, Gluster

GNU toolchain compilation/optimization: gfortran, gcc, g++

MPI libraries: MPICH, MVAPICH, OpenMPI

GPU technologies: CUDA, AMD Fusion

Industry-standard parallel application tuning and submission (Abacus, Ansys, WRF, MATLAB, BLAST, LS-DYNA, Gaussian, etc.)

Duties:

The candidate will manage Linux HPC and HA clusters. The work will be roughly 80% remote and 20% on-site (Salt Lake City area). On-site visits will be for break/fix work (hard drive, CPU, and motherboard swaps, etc.), and their frequency will vary with customer needs. The candidate must fully and clearly document any work performed. The candidate will also be required to attend ongoing status meetings with customers and internal Penguin teams.

Qualifications:

Basic Qualifications:

Bachelor’s degree in Computer Science or Electrical Engineering (or equivalent experience)

7 years of hands-on experience applying parallel programming technologies, optimizing applications in a Linux cluster environment, installing software in a variety of cluster environments, and setting up and configuring clusters

Preferred Qualifications:

Strong knowledge of High Performance Computing (HPC) application development