CAE HPC System Administrator

Toyota Tsusho Systems

$90K to $120K
Technical Services
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 3+ years of experience in Linux system administration, preferably RHEL.
  • Bachelor's degree in engineering or computer science or equivalent experience.
  • Hands-on management of HPC clusters and job schedulers (e.g., LSF, Slurm, PBS).
  • Proven experience with CAE application support.
  • Strong skills in scripting (Bash, Shell, Perl, etc.).
  • Understanding of enterprise server hardware, storage, and networking essentials.
  • Familiarity with high-performance networking technologies like InfiniBand is a plus.

Responsibilities

  • Administer and optimize HPC job scheduling environments.
  • Design and tune job queues and resource allocation policies.
  • Install and support CAE applications in production environments.
  • Manage CAE software licenses and troubleshoot related issues.
  • Maintain RHEL environments across HPC clusters.
  • Automate OS provisioning, patch management, and system tasks with scripts.
  • Conduct capacity planning and support for hardware lifecycle activities.

Benefits

  • Hybrid full-time contract work arrangement.
  • Flexibility to accommodate maintenance windows and critical production issues.
  • Potential for after-hours or weekend work when required.

Full Job Description

Position Summary:

We are seeking a highly motivated and experienced CAE HPC System Administrator with more than 3 years of experience to join our dynamic digital solution manufacturing team. This position is ideal for a candidate with strong Linux system administration experience and hands-on expertise managing HPC environments for CAE workloads, including job schedulers, system automation, and engineering application support. The successful candidate will be responsible for administering and optimizing HPC clusters, managing job scheduling systems, supporting CAE applications and licensing, automating Linux operations, maintaining infrastructure performance, and ensuring system stability, scalability, and efficient workload execution.

Requirements

Essential Functions:
  1. HPC Job Queuing & Workload Management
    • Administer, configure, and optimize HPC job scheduling environments, including IBM Spectrum LSF, Open PBS, or equivalent schedulers.
    • Design and tune job queues, resource allocation policies, and scheduling strategies to support diverse CAE workloads.
    • Monitor system performance and utilization trends and implement improvements to maximize efficiency and throughput.
  2. CAE Application and Licensing Support
    • Install, upgrade, test, and support CAE applications and simulation tools in production environments.
    • Provide integration support between CAE applications and HPC scheduling systems.
    • Manage CAE software licensing systems (e.g., FlexLM, RLM) and ensure availability.
    • Troubleshoot application-related issues and ensure minimal disruption to engineering activities.
  3. Linux Systems Administration & Automation
    • Administer and maintain Red Hat Enterprise Linux (RHEL) environments across HPC clusters.
    • Perform OS provisioning, deployment, and patch management using automated tools (e.g., PXE or configuration management solutions).
    • Develop and maintain scripts (Bash, Korn shell, C Shell, Perl, Awk, or equivalent) to automate system monitoring, health checks, and routine administrative tasks.
    • Maintain system logs, monitoring processes, and standard operating procedures.
  4. Hardware & Infrastructure Management
    • Troubleshoot and resolve issues related to servers, storage systems, and high-performance networking (e.g., InfiniBand, high-speed Ethernet).
    • Support hardware lifecycle activities including installation, maintenance, and upgrades.
    • Conduct capacity planning based on system utilization trends and future demand.
  5. Operations, Monitoring & Continuous Improvement
    • Perform system health checks, monitoring, and incident tracking for HPC and CAE environments.
    • Document system configurations, procedures, incidents, and best practices.
    • Track outages, analyze root causes, and implement preventive measures.
    • Follow change management processes for system updates and deployments.
    • Provide accurate reporting (e.g., utilization, incidents, system performance) and support project initiatives.


Minimum qualifications:

Required Education & Experience:
• 3+ years of Linux system administration experience (preferably RHEL environments).
• Bachelor's degree in mechanical engineering, electrical engineering, computer engineering, computer science, or a related field, and/or commensurate work experience.
• Hands-on experience managing HPC clusters and job schedulers (LSF, Slurm, PBS, or similar).
• Proven experience in CAE application support and integration.
• Strong scripting skills (Bash, Shell, Perl, or equivalent).
• Experience with OS deployment, patching, and system automation.
• Solid understanding of enterprise server hardware, storage, and networking fundamentals.
• Experience with CAE tools such as Ansys, LS-DYNA, Nastran, or similar.
• Familiarity with high-performance networking technologies (e.g., InfiniBand) is a plus.
• Experience developing internal tools or dashboards (e.g., PHP or web-based tooling) is a plus.

Position Type/Expected Hours of Work:
• Hybrid Full-time contract: Standard business hours with flexibility required to support maintenance windows and critical production issues.
• Occasional after-hours or weekend work may be required based on business needs.
