HPC Support Engineer

SAIC • $100K — $130K *

Charlottesville, VA 22903 • In-Person

Aerospace & Defense

8 - 10 years of experience

Reposted 2 weeks ago

Job Overview by Ladders

Qualifications

Bachelor's degree in a science or technology field; equivalent experience accepted
8+ years of professional experience required
Minimum 5 years in Linux support for distributed compute workloads or HPC clusters
Active Top Secret clearance required; an active TS/SCI clearance must be obtained before beginning work
Experience with HPC workload schedulers like Slurm or PBS
Proficient in command-line Linux environments
Familiar with scripting in languages like Bash or Python

Responsibilities

Support users executing workloads in Linux-based HPC cluster environments
Troubleshoot job execution issues in distributed workloads
Assist users with scheduler job submission scripts
Identify and resolve workload performance bottlenecks
Support GPU-enabled workloads and CUDA processing
Promote efficient cluster utilization and HPC best practices

Benefits

Ongoing opportunities for professional development
Work in a secure, cutting-edge research environment
Engage with experienced teams focused on advanced technology solutions
Participate in a mission-driven organization that serves national imperatives

Full Job Description

SAIC is looking for a highly qualified HPC Support Engineer to support the Army's Golden Dome initiative. The engineer will support users executing workloads within Linux-based High Performance Computing (HPC) cluster environments used for distributed compute workloads, simulation environments, and GPU-enabled processing.

The environment will include:

multi-node Linux compute clusters
workload scheduling platforms such as Slurm or PBS
distributed parallel compute workloads utilizing MPI or OpenMP
GPU-enabled compute resources supporting CUDA-based processing
high-performance networking technologies including RDMA / InfiniBand

The system will be used to support scientific computing, simulation workloads, and other distributed compute operations within a secure research environment.

Candidates should be comfortable working within cluster-scale computing environments where performance, scheduler configuration, and distributed workload execution are critical operational factors.

The HPC Support Engineer will assist users executing computational workloads within HPC cluster environments.

The role focuses on:

supporting distributed compute workloads
troubleshooting job execution issues
assisting users with scheduler job submission scripts
identifying workload performance bottlenecks
supporting GPU-enabled workloads
promoting efficient cluster utilization and HPC best practices

Candidates should have experience working with distributed compute workloads and Linux-based HPC environments.

Core Technical Capabilities

Candidates should demonstrate capability in most of the following areas.

HPC Workload Execution

Experience supporting execution of distributed workloads on HPC cluster platforms.

Candidates should understand how compute workloads interact with cluster schedulers, compute nodes, and distributed resources.

Workload Scheduling Platforms

Experience executing and troubleshooting workloads using schedulers such as:

Slurm
PBS / PBS Pro
Torque
Grid Engine

Candidates should understand job submission workflows and resource allocation concepts such as CPU, memory, and GPU scheduling.

Candidates should be comfortable reading and troubleshooting scheduler job submission scripts used to execute distributed workloads.
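As an illustration of the kind of script reading described above, the following minimal sketch pulls the resource requests out of a Slurm-style job script and works out how many tasks it asks for. The job script contents, the executable name, and the helper function names are hypothetical; only long-form `#SBATCH --key=value` directives are handled, and node ranges are not supported.

```python
import re

# Hypothetical job script; the directive values are illustrative only.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=sim_run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem=64G
#SBATCH --gres=gpu:2
#SBATCH --time=02:00:00

srun ./simulation
"""


def parse_sbatch_directives(script_text):
    """Return a dict of long-form '#SBATCH --key=value' directives."""
    directives = {}
    for line in script_text.splitlines():
        match = re.match(r"^#SBATCH\s+--([\w-]+)(?:[=\s]+(\S+))?", line)
        if match:
            directives[match.group(1)] = match.group(2)
    return directives


def total_tasks(directives):
    """Tasks requested, assuming plain integer node and per-node counts."""
    nodes = int(directives.get("nodes", 1))
    per_node = int(directives.get("ntasks-per-node", 1))
    return nodes * per_node


directives = parse_sbatch_directives(JOB_SCRIPT)
for key, value in directives.items():
    print(f"{key}: {value}")
print("total tasks requested:", total_tasks(directives))
```

A check like this is a quick way to confirm that the CPU, memory, and GPU requests in a script match what the user intended before looking further into a failed or stalled job.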

Linux Systems Usage

Strong Linux experience including:

command-line system usage
execution of compute workloads within Linux environments
troubleshooting application execution issues

Experience with RHEL-based environments is preferred.

Distributed Compute Workloads

Experience supporting distributed workloads utilizing parallel computing frameworks such as:

MPI
OpenMP

Experience supporting the compilation and execution of scientific or engineering applications within Linux HPC environments.

Familiarity with common HPC programming languages including:

C/C++
Fortran

Candidates should understand how compiled applications interact with scheduler configuration, compute resources, cluster networking, and distributed runtime environments.

Experience troubleshooting application build or runtime issues related to compiler configuration, library dependencies, or MPI environments is desirable.

Familiarity with common HPC compiler toolchains such as GCC, Intel, or LLVM-based compilers is desirable.

GPU Compute Workloads

Experience executing or supporting workloads utilizing GPU-enabled compute environments and CUDA frameworks is desirable.

Performance Troubleshooting

Ability to identify issues affecting workload execution including:

inefficient resource allocation
scheduler configuration issues
application execution failures
distributed compute performance bottlenecks
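One common sign of inefficient resource allocation is a job that reserves many cores but keeps few of them busy. The sketch below estimates CPU efficiency as consumed CPU time divided by elapsed time multiplied by allocated cores. The input names mirror the `TotalCPU`, `Elapsed`, and `AllocCPUS` accounting fields reported by Slurm's `sacct`, but the sample values and function names are assumptions made for illustration.

```python
def slurm_time_to_seconds(value):
    """Convert '[D-]HH:MM:SS' or 'MM:SS[.fff]' style durations to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad missing leading fields (for example 'MM:SS') with zeros.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu, elapsed, alloc_cpus):
    """Fraction of the allocated core time that was spent computing."""
    available = slurm_time_to_seconds(elapsed) * alloc_cpus
    if available == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / available


# Hypothetical job: 16 cores held for one hour, four CPU-hours consumed.
efficiency = cpu_efficiency("04:00:00", "01:00:00", 16)
print(f"CPU efficiency: {efficiency:.0%}")
```

A low figure such as this one suggests the job requested more cores than the application can use, which is often worth raising with the user before tuning anything else.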

Automation and Operational Tooling

Experience writing scripts or tooling using languages such as:

Bash
Python

Automation experience supporting workload execution or operational tasks is beneficial.

Qualifications

Candidates must meet the following requirements:

Bachelor's degree in a science or technology field; 4 additional years of experience can be substituted for the degree
8+ years of experience is required
Minimum 5 years of experience working in Linux environments supporting distributed compute workloads or HPC cluster platforms
An active Top Secret clearance is required; an active TS/SCI clearance must be obtained prior to beginning work
100% onsite support in Charlottesville, VA
Experience executing or troubleshooting workloads using HPC workload schedulers such as Slurm, PBS, Torque, or similar systems
Experience using command-line Linux environments
Experience with scripting or automation tools (Bash, Python, or similar)
Ability to obtain required DoD 8140 (8570) IAT Level II certification
Candidates must have direct experience working with HPC or distributed compute workloads.

Candidates with the following experience are strongly preferred:

Experience supporting HPC cluster environments used for distributed compute workloads
Experience executing or troubleshooting MPI or OpenMP workloads
Experience supporting GPU-enabled workloads and CUDA frameworks
Experience supporting scientific or engineering compute applications
Experience supporting research, laboratory, or mission computing environments
Experience supporting systems within DoD/DoW or IC environments

Overview

SAIC accepts applications on an ongoing basis and there is no deadline.

About SAIC

Science Applications International Corporation (SAIC) is a technology integrator in the technical, engineering, intelligence, and enterprise information technology markets. SAIC has approximately 26,000 employees and operates in more than 70 countries. The company was founded in 1969 and is headquartered in Reston, Virginia. SAIC provides services to the U.S. government, including the Department of Defense, the intelligence community, and civilian agencies. The company also serves commercial customers in the healthcare, energy, and financial services sectors.

Size: 26,000 employees
Market Cap: $6 billion
Industry: Information Technology
Net Income: $206 million
Founded: 1969
5 Year Trend: +10.7%
Revenue: $6.8 billion
NASDAQ: SAIC

* Ladders Estimates
