KLA Tencor

HPC / AI Software Infrastructure Lead (E)

KLA Tencor | $151K - $256K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years in software engineering with team leadership experience
  • Expertise in building distributed systems in HPC, AI/ML, or cloud environments
  • Demonstrated success in delivering scalable, performance-critical infrastructure
  • Proven ability to mentor and develop engineers at various career stages
  • Strong programming skills in C++, Python, or similar languages

Responsibilities

  • Lead the architecture of large-scale HPC and AI infrastructure
  • Design high-performance distributed systems for image processing and AI
  • Drive optimization of GPU-accelerated computing strategies
  • Collaborate across teams to deliver production-ready platforms
  • Establish engineering best practices for mission-critical systems
  • Mentor engineers and provide technical guidance
  • Influence architectural design and long-term strategy

Benefits

  • Medical, dental, and vision insurance
  • 401(K) with company matching
  • Employee stock purchase program (ESPP)
  • Tuition reimbursement and student debt assistance
  • Paid time off and company holidays
  • Wellness benefits including employee assistance program (EAP)
  • Opportunities for career development and growth programs
Full Job Description
Job Description/Preferred Qualifications

HPC/AI Software Infrastructure Leads are core to KLA's technology. While we do not currently have an opening, we are always building our HPC/AI Software Infrastructure Lead engineering talent community, and we are interested in learning about your background.

Apply to this posting for Future Opportunities with KLA.

At KLA, we're pushing the boundaries of semiconductor inspection through advanced AI and high-performance computing. We are looking for a hands-on technical leader to architect and scale the next generation of AI/HPC infrastructure powering our most critical imaging and data platforms. This role is ideal for someone who thrives at the intersection of distributed systems, GPU computing, and real-world AI workloads, and who enjoys building and mentoring high-performing engineering teams while driving technical excellence.

What You'll Do
  • Lead the architecture and development of large-scale HPC and AI infrastructure supporting cutting-edge image processing and machine learning workloads
  • Design scalable, high-performance distributed systems that unify traditional image processing with modern AI/Deep Learning pipelines
  • Drive GPU-accelerated computing strategies, optimizing performance across compute, storage, and networking layers
  • Partner cross-functionally with hardware, algorithms, and product teams to deliver robust, production-ready platforms
  • Establish engineering best practices (code quality, CI/CD, observability, performance tuning) for mission-critical systems
  • Mentor and develop engineers, providing technical guidance, coaching, and growth opportunities for junior team members
  • Serve as a technical leader and decision-maker, influencing architecture and long-term platform strategy


What You Bring

Experience
  • 10+ years in software engineering, including leading and scaling technical teams
  • Proven success building distributed systems in HPC, AI/ML, or cloud-native environments
  • Track record of delivering performance-critical infrastructure at scale
  • Experience mentoring and growing early- and mid-career engineers


Technical Expertise
  • Deep understanding of distributed systems, parallel computing, and Linux systems programming
  • Strong programming skills in C++, Python, or similar systems-level languages
  • Experience with GPU computing (CUDA, ROCm) and modern AI frameworks (PyTorch, TensorFlow, etc.)
  • Familiarity with high-performance storage systems, networking, and data pipelines
  • Strong foundation in CI/CD, DevOps, and production system reliability


Bonus Experience
  • Background in image processing, computer vision, or scientific computing
  • Experience supporting hybrid HPC + AI workloads in production environments


Leadership & Impact
  • Passion for developing talent and building inclusive, high-performing teams
  • Ability to operate as both a hands-on engineer and strategic technical leader
  • Strong communication skills with the ability to influence across engineering and product stakeholders


Why KLA / Why Ann Arbor
  • Work on real-world AI systems at scale, not just experiments
  • Collaborate across hardware, software, and algorithm teams in a deeply technical environment
  • Join a growing engineering presence in Ann Arbor, with access to top talent and a strong technical community
  • Opportunity to shape the direction of AI infrastructure in a core product domain


Minimum Qualifications

Doctorate (Academic) Degree and related work experience of 5 years; Master's Level Degree and related work experience of 8 years; Bachelor's Level Degree and related work experience of 12 years

Base Pay Range: $151,100.00 - $256,900.00

Primary Location: USA-MI-Ann Arbor-KLA

KLA's total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life, and other voluntary benefits, 401(K) including company matching, employee stock purchase program (ESPP), student debt assistance, tuition reimbursement program, development and career growth opportunities and programs, financial planning benefits, wellness benefits including an employee assistance program (EAP), paid time off and paid company holidays, and family care and bonding leave.

Interns are eligible for some of the benefits listed. Our pay ranges are determined by role, level, and location. The range displayed reflects the pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including state minimum wage rates, location, job-related skills, experience, and relevant education level or training. We are committed to complying with all applicable federal and state minimum wage requirements. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.

About KLA Tencor

KLA Corporation is a global capital equipment company that provides process control solutions for semiconductor and related industries. The Company's products are also used in a number of other high technology industries, including the packaging, light emitting diode (LED), power device and compound semiconductor markets. Its products and services are used by bare wafer, integrated circuit (IC), lithography reticle (reticle or mask) and disk manufacturers around the world. The Company's inspection and metrology products and related offerings are categorized in various groups, including Chip Manufacturing, Wafer Manufacturing, Reticle Manufacturing, LED, Power Device and Compound Semiconductor Manufacturing, Data Storage Media/Head Manufacturing, Microelectromechanical Systems (MEMS) Manufacturing, and General Purpose/Lab Applications.
Size: 11,300 employees
Market Cap: $52 billion
Net Income: $1.3 billion
Founded: 1997
5 Year Trend: +21.5%
Revenue: $6 billion
Listed on: NASDAQ
