Hardware Qualification Engineer, Senior Staff
Location: Santa Clara, CA (Onsite/Hybrid)
About the Role
As a Hardware Qualification Engineer, Senior Staff, you will be the final gatekeeper of quality and reliability for our next-generation AI compute platforms. This is a high-impact technical leadership role responsible for defining the qualification strategies that ensure our hardware, from ASIC substrates to rack-scale systems, can withstand the rigorous demands of 24/7 data center environments.
You will bridge the gap between design and mass production, ensuring that "cutting edge" doesn't mean "fragile."
Key Responsibilities
- Qualification Strategy: Define and execute comprehensive HW qualification plans (EVT/DVT/PVT) for complex AI accelerator systems, including PCIe cards, OAM modules, and chassis systems.
- Stress & Reliability Testing: Lead the implementation of HALT/HASS, thermal cycling, humidity, and vibration testing to identify marginalities in high-power ASIC designs.
- Signal & Power Integrity Validation: Provide senior-level oversight for the validation of high-speed SerDes (112G/224G/PCIe), LPDDR, CoWoS designs, and complex PDN (Power Delivery Network) transients.
- Failure Analysis (FA): Drive root-cause analysis for complex system-level failures, utilizing tools like X-ray, CT scan, and SEM/EDX to distinguish between design flaws, manufacturing defects, and material fatigue.
- Interconnect Reliability: Oversee the qualification of high-bandwidth interconnects (NVLink/UALink equivalents) and PCIe Gen5/6 link stability.
- Cross-Functional Influence: Work directly with Design, Thermal, and Firmware teams to implement hardware fixes based on qualification data.
- Vendor Management: Partner with CMs (Contract Manufacturers) and JDMs to align on test methodologies and production outgoing quality limits (OQL).
Required Qualifications
- Education: BS/MS in Electrical Engineering, Mechanical Engineering, or Reliability Engineering.
- Experience: 12+ years in hardware qualification or reliability, specifically with high-performance computing (HPC), servers, or networking hardware.
- Technical Depth: Mastery of JEDEC, IPC, and Telcordia standards for hardware reliability.
- Diagnostic Skills: Expert-level experience with high-speed oscilloscopes, logic analyzers, and environmental chambers.
- Statistical Proficiency: Strong command of statistical methods (Weibull analysis, DOE) and tools (JMP/Minitab) to predict product life cycles and failure rates.
Preferred Skills
- Experience qualifying air and liquid cooling components (CDUs, cold plates, leak detection) for high-TDP AI systems.
- Background in ASIC/package-level reliability, including electromigration and thermal stress analysis.
- Familiarity with Python or LabVIEW for automating complex test sequences.
The "d-Matrix" Edge
We don't just test; we innovate. You will be working on a unique chiplet-based architecture where traditional qualification boundaries are blurred. You'll have the opportunity to build a world-class reliability lab from the ground up and set the standard for how Digital In-Memory Computing (DIMC) is validated.