Advanced Micro Devices, Inc

Technical Program Manager - AI Cluster Validation

Advanced Micro Devices, Inc
$120K - $150K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in hardware or AI infrastructure program management.
  • Strong understanding of GPU-based AI platforms and related hardware components.
  • Proficiency in leading cross-functional teams effectively without direct authority.
  • Excellent communication skills for clear status updates to all audience levels.
  • Experienced with project management tools such as Jira and Confluence.

Responsibilities

  • Define and manage program plans for AI infrastructure systems including cluster-scale deployment.
  • Maintain key program management documentation like schedules and resource forecasts.
  • Identify risks and create mitigation strategies across engineering teams.
  • Lead execution reviews with engineering teams, providing data-driven updates to executives.
  • Oversee GPU program execution, including system readiness and validation at multiple levels.
  • Manage planning and execution of multi-node scale testing for AI solutions.
  • Lead platform debug efforts to ensure timely resolution of engineering issues.

Benefits

  • Comprehensive medical, dental, and vision plans.
  • 401(k) with company matching contributions.
  • Employee stock purchase plan (ESPP).
  • Generous paid time off and holidays.
  • Ongoing professional development opportunities.

Full Job Description

Technical Program Manager - AI Cluster Validation

THE ROLE

We are seeking a Technical Program Manager to lead execution of AI cluster engineering programs with a deep focus on GPU platforms, rack-level solutions, and AI cluster validation. This role is responsible for driving end-to-end delivery from GPU and server integration through rack bring-up, scale testing, failure analysis, and system debug closure, ensuring platform readiness for hyperscale and enterprise AI deployments.

This role operates at the intersection of hardware, firmware, networking, and scale-test execution, and requires strong technical depth combined with disciplined program execution.

THE PERSON

You are a hands-on TPM who thrives in complex, fast-moving ecosystems and can connect deep technical details to crisp program plans, executive reporting, and customer outcomes. You are comfortable driving execution in bring-up and EVT/DVT/PVT, working closely with engineers to root-cause issues, unblock debug, and make data-driven tradeoffs to keep programs moving. You bring urgency, ownership, and clarity to ambiguous problem spaces and can communicate effectively from lab floor to executive review.

KEY RESPONSIBILITIES

Program Leadership & Execution
  • Define, plan, and drive program plans for AI infrastructure systems validation and readiness, including server integration, rack bring-up, and cluster-scale deployment readiness.
  • Create and maintain core PM artifacts: schedules, dependency maps, resource forecasts, risk/issue logs, and program dashboards/status reports.
  • Identify and drive mitigation plans for issues/risks, including cross-team escalations and corrective actions across multiple engineering areas.
  • Drive regular execution reviews with engineering teams and provide concise, data-driven updates to senior leadership.


GPU & Platform Execution
  • Own program execution for GPU-based AI platforms, spanning system bring-up, qualification, scale readiness, and deployment validation across server, rack, and cluster levels.
  • Drive alignment across GPU, CPU, firmware, BIOS/BMC, and system teams to ensure readiness for scale testing and customer workloads.
  • Track platform issues and debug dependencies; ensure risks are clearly documented, owned, and mitigated.


AI Rack / Cluster Validation
  • Own program planning and execution for multi-node and multi-rack scale testing, including test strategy, scheduling, coverage tracking, and readiness gates.
  • Lead end-to-end delivery of rack-level AI solutions, including compute trays, switch trays, cabling, power, cooling, and management infrastructure.
  • Ensure rack bring-up plans are executable, resourced, and gated with clear entry/exit criteria across EVT, DVT, and scale phases.
  • Drive coordination across lab operations, infrastructure, and engineering teams to unblock rack access, power, networking, and test readiness.
  • Partner with scale, performance, and automation teams to ensure workloads, stress tests, and regression plans are ready before hardware arrives.

Debug, Failure Analysis & Risk Management
  • Act as the execution lead for platform debug, coordinating across engineering teams to ensure fast triage, root-cause analysis, and resolution of system-level issues.
  • Track high-impact failures (GPU, HSIO, FW, rack, network) through debug forums, ensuring clear ownership and closure plans.
  • Balance debug depth vs. program timelines, escalating tradeoffs when needed and ensuring leadership has a clear view of risk and impact.

REQUIRED QUALIFICATIONS
  • Experience leading complex hardware or AI infrastructure programs with ownership across bring-up, validation, and deployment phases.
  • Strong technical understanding of GPU-based AI systems, rack architectures, and datacenter infrastructure.
  • Proven ability to manage ambiguity, drive debug execution, and lead cross-functional teams without direct authority.
  • Strong written and verbal communication skills, including executive-level status reporting.
  • Proficiency with program management and execution tools (Jira, Confluence, dashboards, Excel/PowerPoint).

PREFERRED QUALIFICATIONS
  • Hands-on experience with GPU cluster scale testing, system stress, or performance validation.
  • Familiarity with rack-level bring-up, power/cooling constraints, networking, and failure modes at scale.
  • Experience working through hardware/firmware debug cycles in pre-production or customer-facing environments.

ACADEMIC CREDENTIALS
  • Bachelor's or master's degree in systems, EE, CS, or a related engineering discipline.
  • PMP, Scrum Master, or equivalent program management training.

LOCATION

Austin, TX

This role is not eligible for visa sponsorship.

Benefits offered are described: AMD benefits at a glance.

About Advanced Micro Devices, Inc

Advanced Micro Devices, Inc. Careers

Join the innovative forefront of technology with a career at Advanced Micro Devices, Inc. (AMD), a leader in semiconductor development. As part of our global team, you will contribute to an organization renowned for its dedication to innovation, leadership, and diversity in the tech industry.

Work You’ll Do

At AMD, we offer job opportunities that push the boundaries of what is possible. Our team is composed of professionals who lead the way in microprocessor and graphics technology, driving industry standards and innovation. With AMD, you will be part of a culture that values growth and professional development, ensuring that every team member has the opportunity to excel.

Transform Your Career

AMD is not just about advancing technology, but also about advancing careers. Whether you are looking for an internship, a full-time position, or leadership roles, AMD provides the platform to propel your career to new heights. Our commitment to professional growth is matched by our dedication to diversity and inclusion, making AMD a place where everyone can thrive.

Innovative Work Environment

Join a team of over 12,000 dedicated professionals at the intersection of technology, industry expertise, and digital innovation. At AMD, you will work on groundbreaking projects that shape the future of computing and graphics. Our collaborative environment encourages networking and the sharing of ideas across teams and disciplines.

Career Development and Benefits

AMD is committed to the development of its employees. We offer robust training programs, including leadership development and diversity training, to ensure our team is equipped for both current challenges and future opportunities. Our benefits package is designed to support the well-being and financial security of our employees and their families.

Explore Job Opportunities

From engineering to marketing, AMD offers a range of career paths that cater to diverse skills and interests. Our hiring process is designed to be transparent and engaging, helping you to understand where you fit within our team and how you can contribute to our collective goals.

Stay Connected

Join Our Team: Search open positions that match your skills and interests. We look for passionate, curious, creative, and solution-driven team players. Explore the opportunities to join a company that’s committed to your career growth and to innovation in the technology sector.

Interview and Resume Tips

Prepare for your future with AMD by accessing resources that help you craft your resume and excel in interviews. Our goal is to help you showcase your best professional self and align your skills with the needs of our dynamic team. At Advanced Micro Devices, Inc., we empower our employees to innovate, lead, and grow. Join us in driving the future of technology while building a rewarding and sustainable career.

Learn more about Advanced Micro Devices, Inc

  • Size: 15,500 employees
  • Market Cap: $100.9 billion
  • Industry:
  • Net Income: $2.4 billion
  • Founded: 1969
  • 5 Year Trend: +30.9%
  • Revenue: $9.7 billion
  • Exchange: NASDAQ