We are seeking an experienced HPC Administrator with a strong background in Linux systems administration, HPC environments, and workload scheduling platforms such as PBS Professional or Slurm. The ideal candidate is comfortable supporting engineering and research teams, managing GPU-enabled infrastructure, automating operational processes, and maintaining highly available, secure computing environments that support mission-critical workloads.

Responsibilities
- Administer and maintain Altair Access, PBS Professional, and Slurm workload scheduling environments
- Manage user onboarding, account provisioning, queue access, and resource allocation policies
- Install, configure, and support engineering applications including ANSYS, STAR-CCM+, SIMPACK, Simufact, Tacoma, and related software platforms
- Monitor cluster health, performance, job execution, storage utilization, and overall system capacity
- Maintain and support GPU infrastructure, including NVIDIA drivers, CUDA toolkits, and fabric management services
- Perform operating system patching, firmware upgrades, security remediation, and vulnerability management activities
- Troubleshoot hardware, software, application, networking, storage, and job scheduling issues
- Support containerized workloads and AI/ML environments utilizing GPU resources
- Coordinate support activities with technology vendors including Dell, NVIDIA, Altair, Ansys, and other strategic partners
- Develop automation solutions and infrastructure-as-code practices to improve operational efficiency and consistency
- Ensure compliance with cybersecurity and regulatory requirements, including CMMC, NIST 800-171, and export control standards
- Provide technical guidance to engineering and research teams regarding application performance, resource utilization, and HPC best practices
- Participate in capacity planning, architecture reviews, hardware refreshes, and future HPC expansion initiatives
- Collaborate with cross-functional teams to maintain a reliable, secure, and high-performing computing environment

Required Skills & Qualifications
- 7+ years of experience administering Linux systems in enterprise or HPC environments
- Experience supporting High Performance Computing (HPC) infrastructure and clustered computing environments
- Hands-on experience with PBS Professional, Slurm, or similar workload schedulers
- Strong knowledge of Red Hat Enterprise Linux (RHEL) or comparable Linux distributions
- Experience supporting engineering and simulation applications in a research or engineering environment
- Experience administering GPU-based computing platforms, including NVIDIA GPUs and CUDA
- Proficiency with scripting and automation using Python, Bash, or similar technologies
- Experience with container technologies such as Docker, Podman, or Kubernetes
- Strong understanding of storage, networking, system performance tuning, and troubleshooting
- Familiarity with infrastructure automation and configuration management practices
- Knowledge of cybersecurity principles, vulnerability remediation, and enterprise compliance requirements
- Strong analytical, troubleshooting, and problem-solving skills
- Ability to communicate effectively with technical and non-technical stakeholders
- Experience working collaboratively within engineering, research, or scientific computing environments

Other Information
Work hours are approximately 8:00 am to 5:00 pm EST, depending on workload, with an occasional late night when a tight deadline calls for it. We work for security-conscious clients, so background checks are required. Salary is dependent upon experience.