KLA Tencor

High Performance Compute (HPC) Software Engineer - HPC SW Systems

KLA Tencor | $105K - $180K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's or Master's in Computer Science, Computer Engineering, Electrical Engineering, or similar
  • Strong development experience in HPC or systems software on Linux
  • Proficiency in Java, C++, or other performance-oriented languages
  • Hands-on experience with parallel computing techniques such as MPI or OpenMP
  • Solid understanding of HPC hardware: CPUs, memory, storage, networking
  • Experience with clusters and rack-scale systems in lab or production
  • Skilled in debugging software, OS, and hardware issues

Responsibilities

  • Design and optimize HPC software for large-scale Linux clusters
  • Enhance application performance and power efficiency across multiple components
  • Develop tools for cluster diagnostics, monitoring, and health checks
  • Collaborate with teams to translate workloads into efficient software solutions
  • Define HPC node and interconnect requirements in collaboration with hardware teams
  • Participate in debugging hardware/software performance and stability issues
  • Contribute to infrastructure best practices and design reviews

Benefits

  • Medical, dental, vision, and life insurance
  • 401(K) with company matching
  • Employee stock purchase program (ESPP)
  • Tuition reimbursement and student debt assistance
  • Opportunities for professional development and career growth
  • Wellness benefits and an employee assistance program
  • Paid time off and holidays, family care and bonding leave

Full Job Description

Job Description/Preferred Qualifications

Key Responsibilities

HPC Software Engineering
• Design, develop, and optimize HPC software running on large-scale Linux clusters, including distributed and parallel workloads (MPI, multithreading, GPU-accelerated pipelines, containerized workloads).
• Optimize application performance and power utilization across CPU, memory, storage, and network subsystems, with attention to throughput, latency, and scaling behavior.
• Develop and maintain system-level tooling for cluster bring-up, diagnostics, monitoring (including component power usage), and health checks.
• Work closely with algorithms, systems, and application teams to understand workload characteristics and translate them into power-efficient HPC software solutions.

HPC Systems & Hardware Awareness
• Collaborate with hardware and systems teams to define HPC node, storage, and interconnect requirements based on software and algorithm needs.
• Understand and influence CPU/GPU selection, memory sizing, PCIe layout, NUMA behavior, and network topology to ensure optimal software performance.
• Participate in HW/SW co-debug activities, including performance bottlenecks, stability issues, and failure analysis.

Rack & Infrastructure Engineering
• Understand rack-level integration of HPC systems, focusing on power, cooling, cabling, networking, and physical layout considerations.
• Understand data-center and lab constraints such as power budgets, thermal limits, network drops, and serviceability.
• Contribute to best practices and design reviews for new platforms and refresh cycles.

Cross-Functional Collaboration
• Act as a technical bridge between software, hardware, and systems teams.
• Provide clear technical documentation covering software and system architecture, deployment flows, and performance assumptions.

Required Qualifications
• Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
• Strong experience developing HPC or systems software on Linux.
• Proficiency in Java and/or C++ and/or other system-level or performance-oriented languages.
• Hands-on experience with parallel computing (MPI, OpenMP, multithreading). Candidates with GPU computing experience (CUDA, ROCm, or equivalent) are preferred.
• Solid understanding of HPC hardware fundamentals: CPUs, memory hierarchies, storage, networking (Ethernet / InfiniBand).
• Practical experience working with clusters, servers, or rack-scale systems in lab or production environments.
• Strong debugging skills across software, OS, and hardware boundaries.

Preferred Qualifications
• Experience with containerized HPC environments (Docker, Singularity/Apptainer, Kubernetes in HPC contexts).
• Familiarity with high-speed interconnects, storage architectures, and performance benchmarking.
• Exposure to rack integration, including cabling, power distribution, cooling, and system bring-up.
• Experience in semiconductor, manufacturing, or high-reliability systems environments.
• Ability to reason about system reliability, MTBF/MTBA, and failure modes in large compute installations.

What Makes This Role Unique at KLA
• Work on mission-critical HPC platforms that directly impact semiconductor manufacturing capability.
• Influence both software architecture and physical system design, not just code in isolation.
• Collaborate with world-class experts across algorithms, hardware, systems, and operations.
• See your work deployed at scale in real production tools, not just in the data center.

Minimum Qualifications

Doctorate (Academic) Degree and 0 years of related work experience; Master's Level Degree and 3 years of related work experience; Bachelor's Level Degree and 5 years of related work experience

Base Pay Range: $105,900.00 - $180,000.00 Annually

Primary Location: USA-MI-Ann Arbor-KLA

KLA's total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life, and other voluntary benefits, 401(K) including company matching, employee stock purchase program (ESPP), student debt assistance, tuition reimbursement program, development and career growth opportunities and programs, financial planning benefits, wellness benefits including an employee assistance program (EAP), paid time off and paid company holidays, and family care and bonding leave.

Interns are eligible for some of the benefits listed. Our pay ranges are determined by role, level, and location. The range displayed reflects the pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including state minimum wage rates, location, job-related skills, experience, and relevant education level or training. We are committed to complying with all applicable federal and state minimum wage requirements. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.

About KLA Tencor

KLA Corporation is a global capital equipment company that provides process control solutions for semiconductor and related industries. The Company's products are also used in a number of other high technology industries, including the packaging, light emitting diode (LED), power device and compound semiconductor markets. Its products and services are used by bare wafer, integrated circuit (IC), lithography reticle (reticle or mask) and disk manufacturers around the world. The Company's inspection and metrology products and related offerings are categorized in various groups, including Chip Manufacturing, Wafer Manufacturing, Reticle Manufacturing, LED, Power Device and Compound Semiconductor Manufacturing, Data Storage Media/Head Manufacturing, Microelectromechanical Systems (MEMS) Manufacturing, and General Purpose/Lab Applications.
Size: 11,300 employees
Market Cap: $52 billion
Net Income: $1.3 billion
Founded: 1997
5 Year Trend: +21.5%
Revenue: $6 billion
Exchange: NASDAQ
