Hardware Systems Engineer, NPI AI

Meta

$130K — $180K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent experience
  • 6+ years of experience in hardware systems engineering, silicon or firmware validation, or system bring-up for AI servers and GPUs
  • Proficiency in ASIC bring-up, board-level debug, and large-scale system validation within data center environments
  • Experience developing test specifications and validation procedures for complex hardware systems
  • Demonstrated ability to lead root-cause analysis across hardware, firmware, and software stacks
  • Familiarity with high-speed interconnects such as PCIe and NVLink in AI or HPC contexts

Responsibilities

  • Lead the development of validation strategies for AI and HPC hardware platforms
  • Drive hands-on bring-up, characterization, and validation of AI server systems
  • Develop and maintain test specifications and validation procedures for NPI programs
  • Investigate and resolve complex system failures across engineering disciplines
  • Manage hardware and firmware defect tracking and resolution during NPI milestones
  • Identify and close gaps in test methodologies and automation frameworks
  • Collaborate with engineering teams to define acceptance criteria for new AI hardware systems

Benefits

  • Collaborative work environment across cutting-edge technology teams
  • Opportunity to impact the introduction of next-generation AI infrastructure
  • Exposure to high-performance computing and large-scale data center operations
  • Engagement with cross-functional teams to enhance problem-solving skills
  • Opportunities for continuous learning and career growth in AI hardware engineering

Full Job Description

Meta is seeking a Hardware Systems Engineer to support the new product introduction (NPI) of next-generation AI and high-performance computing infrastructure for large-scale data center deployments. In this role, you will work at the intersection of AI silicon, server systems, and data center operations, partnering with hardware design, firmware, software, networking, and capacity engineering teams to validate and scale cutting-edge AI hardware systems from early bring-up through production readiness.

Responsibilities

• Lead end-to-end system validation strategies for AI and HPC hardware platforms, including AI accelerators, GPU clusters, and high-bandwidth memory subsystems in data center environments
• Drive hands-on bring-up, characterization, and validation of AI server systems and associated components such as PCIe, NVLink, DRAM, and high-speed networking fabrics
• Develop and maintain test specifications, validation procedures, and debug guides tailored to AI infrastructure NPI programs
• Investigate and root-cause complex system failures spanning silicon, firmware, software, and hardware layers in collaboration with cross-functional engineering teams
• Triage and track hardware and firmware defects through resolution while maintaining forward progress on NPI program milestones
• Identify gaps in test coverage and drive improvements to test methodologies, tooling, and automation frameworks across the NPI lifecycle
• Partner with AI platform and capacity engineering teams to define acceptance criteria and deployment readiness standards for new AI hardware systems
• Guide data collection, analysis, and reporting efforts to surface systemic hardware quality trends and inform go/no-go decisions for production deployment
• Communicate validation status, risk assessments, and technical findings to internal engineering teams and external hardware vendors
• Collaborate with firmware and software teams to define hardware-software interface requirements for telemetry, diagnostics, and remote management of AI infrastructure

Minimum Qualifications
• Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience
• 6+ years of experience in hardware systems engineering, silicon validation, firmware validation, or system-level bring-up for AI servers, GPUs, TPUs, or AI accelerator platforms
• Experience in one or more of the following domains: ASIC bring-up and characterization, board-level debug, firmware validation, or large-scale system validation in data center environments
• Experience developing test specifications, validation procedures, and debug methodologies for complex hardware systems
• Experience leading root-cause analysis and troubleshooting of system-level failures across hardware, firmware, and software stacks
• Experience with high-speed interconnects or memory subsystems such as PCIe, NVLink, DDR5, or HBM in the context of AI or HPC system validation

Preferred Qualifications
• 3+ years of experience with debugging tools for SoCs including JTAG, GDB, or Trace32, and familiarity with common bus protocols such as I2C, SPI, USB, and PCIe
• 3+ years of experience defining hardware-software interface requirements for telemetry, diagnostics, and out-of-band management in AI infrastructure deployments
• Experience integrating lab instrumentation and automation frameworks to support large-scale NPI validation workflows
• Proficiency in Linux environments and server system management tools used in data center operations
