Advanced Micro Devices, Inc

Rack Scale Serviceability & Telemetry Architect

Advanced Micro Devices, Inc
$130K - $180K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years in platform architecture or system management roles, particularly in datacenter or HPC environments.
  • Expertise in BMC/embedded firmware and server manageability practices.
  • Hands-on experience with DMTF Redfish and OpenBMC technologies.
  • Proficiency in programming/scripting languages such as C/C++ and Python.
  • Strong knowledge of server RAS features and secure manageability architectures.
  • Bachelor's or Master's degree in Computer Science, Engineering, or related field; advanced degree preferred.

Responsibilities

  • Define and manage the architecture for rack-scale serviceability and telemetry across AMD's product lines.
  • Establish the standards strategy and interface architecture using industry specifications such as DMTF Redfish.
  • Direct the implementation of OpenBMC-based solutions for server management.
  • Architect telemetry frameworks for comprehensive health and performance monitoring.
  • Develop serviceability workflows for diagnostics, recovery, and firmware management.
  • Collaborate with cross-functional teams to ensure alignment on technical requirements and architecture development.
  • Mentor and guide engineers in best practices and architectural standards.

Benefits

  • Comprehensive health and wellness programs.
  • A 401(k) retirement plan with company match.
  • Employee stock purchase plan.
  • Flexible work arrangements, including hybrid working options.
  • Generous paid time off and holiday policies.

Full Job Description
Rack Scale Serviceability & Telemetry Architect

THE TEAM

AMD's Data Center GPU Systems Architecture team defines next-generation AMD Instinct platforms and complete rack-scale solutions for hyperscale AI and HPC deployments. We work across silicon, GPU system firmware, server and board architecture, BMC/platform firmware, management software, security, validation, manufacturing, and ecosystem partners to turn product strategy into deployable, serviceable, production-ready platforms.

THE ROLE

AMD is seeking a Principal Member of Technical Staff (PMTS) to own the architecture for rack-scale serviceability and telemetry across AMD Instinct product lines and complete rack-scale solutions. This is a highly visible technical leadership role responsible for defining the end-to-end manageability, observability, and serviceability architecture spanning node, chassis/tray, rack, and fleet domains. You will drive the strategy, architecture, execution, and delivery of standards-based solutions for inventory, discovery, health monitoring, telemetry, eventing, diagnostics, firmware lifecycle management, and field service workflows across the full AMD rack-scale stack.

In this role, you will independently own a critical cross-product architecture area and drive alignment across GPU/SoC architecture, server/platform architecture, BIOS/UEFI, BMC and embedded software, security, RAS, validation, ODM/OEM partners, and customer-facing teams. The role spans early concept definition through bring-up, validation, deployment, and post-launch improvement.

THE PERSON

The ideal candidate is a deeply technical system architect with strong first-principles thinking and a track record of delivering manageability, telemetry, and serviceability solutions for servers, accelerators, storage, networking, or rack-scale AI/HPC platforms. You are equally comfortable setting long-range technical direction and diving hands-on into protocol definitions, interface design, telemetry models, bring-up, debug, and root-cause analysis. You thrive in ambiguity, influence without authority, raise execution quality across teams, and exemplify AMD's values through direct, humble, collaborative, and inclusive leadership.

KEY RESPONSIBILITIES

  • Define and own the end-to-end rack-scale serviceability and telemetry architecture for AMD Instinct-based solutions, spanning node BMC, chassis/rack management, service processors/controllers, management network, and fleet-level observability integration.
  • Define the standards strategy and interface architecture using DMTF Redfish, PLDM, MCTP, and related specifications, maximizing standards compliance while establishing AMD/OEM extensions only where required.
  • Drive OpenBMC-based architecture and implementation direction for BMC and rack management controllers, including D-Bus object models, bmcweb/Redfish requirements, sensor and FRU inventory models, logging, eventing, firmware update, and debug workflows.
  • Architect telemetry frameworks for health, power, thermal, inventory, error, utilization, and service data. Define schemas, metric taxonomies, triggers, event models, aggregation, retention, and reporting strategies required for at-scale observability and automated service operations.
  • Define platform serviceability flows covering discovery, inventory correlation, fault isolation, diagnostics, crashdump and error capture, remote recovery, FRU replacement, firmware/driver update orchestration, and return-to-service procedures.
  • Partner with GPU/SoC architects, board and system architects, firmware and software teams, security/RAS, validation, manufacturing, and customer engineering to translate requirements into production-ready architecture and deliverables.
  • Work closely with ODM/OEMs and ecosystem partners to review designs, close gaps, guide implementation trade-offs, and deliver robust reference solutions and customer platforms on schedule.
  • Drive validation and conformance strategy for manageability and telemetry, including interoperability, Redfish/PLDM compliance, fault injection, service workflow validation, scale testing, and field debug methodology.
  • Influence future AMD Instinct platform roadmaps using insights from bring-up, partner integrations, deployment learnings, and telemetry-driven data.
  • Represent AMD in relevant standards and open-source communities, including DMTF and OpenBMC forums, and guide upstream/downstream strategy where appropriate.
  • Mentor engineers and architects across the organization and serve as the senior technical point of contact for rack-scale serviceability and telemetry.

PREFERRED EXPERIENCE

  • Expert-level experience in platform architecture, system management, BMC/embedded firmware, server manageability, or adjacent domains, including significant time in architect or technical leadership roles.
  • Proven experience defining serviceability/manageability architecture for servers, accelerators, storage, networking, or rack-scale infrastructure in datacenter, cloud, AI, or HPC environments.
  • Deep knowledge of DMTF Redfish, including schema design, OEM extension strategy, eventing, update service, and telemetry concepts such as MetricReportDefinition and MetricReport resources; strong understanding of PLDM/MCTP for platform inventory, monitoring, control, and update workflows.
  • Strong hands-on experience with OpenBMC, including Yocto/OpenEmbedded, D-Bus, systemd, bmcweb/Redfish, phosphor services, firmware update flows, sensor frameworks, and log/event handling.
  • Experience with embedded Linux, ARM-based BMC SoCs, U-Boot, Linux kernel/device driver concepts, device tree, and low-level interfaces such as I2C/I3C, SPI, UART, GPIO, SMBus/PMBus, and related platform-management buses.
  • Strong understanding of server/platform RAS and serviceability features such as health monitoring, error logging, crashdump, diagnostics, inventory/FRU management, and remote recovery.
  • Experience with secure manageability architectures, including secure boot, root of trust, attestation, firmware signing, SPDM, and protection of out-of-band management paths.
  • Experience creating architecture specifications, product requirements, conformance plans, validation strategies, and design reviews that drive execution across multiple internal teams and external partners.
  • Strong programming and scripting background in C/C++, Python, and shell, with the ability to debug across firmware, hardware, and system software boundaries.
  • Experience with large-scale telemetry or observability pipelines, metrics consumers, or fleet operations tooling is strongly preferred.
  • Experience with AMD server or GPU platforms, AI/HPC system design, liquid cooling/power/thermal infrastructure, or OCP-aligned rack architectures is a plus.
  • Strong written and verbal communication skills with proven ability to influence senior engineering leadership, customers, and strategic partners.

ACADEMIC CREDENTIALS

Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field. Advanced degree preferred.

LOCATION

Austin, Texas preferred. Other AMD datacenter engineering locations may be considered based on team alignment and business needs.

This role is not eligible for visa sponsorship.


Benefits offered are described in AMD benefits at a glance.

About Advanced Micro Devices, Inc

Advanced Micro Devices, Inc. Careers

Join the innovative forefront of technology with a career at Advanced Micro Devices, Inc. (AMD), a leader in semiconductor development. As part of our global team, you will contribute to an organization renowned for its dedication to innovation, leadership, and diversity in the tech industry.

Work You’ll Do

At AMD, we offer job opportunities that push the boundaries of what is possible. Our team is composed of professionals who lead the way in microprocessor and graphics technology, driving industry standards and innovation. With AMD, you will be part of a culture that values growth and professional development, ensuring that every team member has the opportunity to excel.

Transform Your Career

AMD is not just about advancing technology but also about advancing careers. Whether you are looking for an internship, a full-time position, or a leadership role, AMD provides the platform to propel your career to new heights. Our commitment to professional growth is matched by our dedication to diversity and inclusion, making AMD a place where everyone can thrive.

Innovative Work Environment

Join a team of over 12,000 dedicated professionals at the intersection of technology, industry expertise, and digital innovation. At AMD, you will work on groundbreaking projects that shape the future of computing and graphics. Our collaborative environment encourages networking and the sharing of ideas across teams and disciplines.

Career Development and Benefits

AMD is committed to the development of its employees. We offer robust training programs, including leadership development and diversity training, to ensure our team is equipped for both current challenges and future opportunities. Our benefits package is designed to support the well-being and financial security of our employees and their families.

Explore Job Opportunities

From engineering to marketing, AMD offers a range of career paths that cater to diverse skills and interests. Our hiring process is designed to be transparent and engaging, helping you to understand where you fit within our team and how you can contribute to our collective goals.

Stay Connected

Join Our Team

Search open positions that match your skills and interests. We look for passionate, curious, creative, and solution-driven team players. Explore the opportunities to join a company that’s committed to your career growth and to innovation in the technology sector.

Keep Up to Date

Stay ahead with career tips, insider perspectives, and industry-leading insights you can put to use today—all from the people who work here.

Job Alert Emails

Personalize your subscription to receive job alerts, latest news, and insider tips tailored to your preferences. Discover the exciting and rewarding career opportunities that await at Advanced Micro Devices, Inc.

Interview and Resume Tips

Prepare for your future with AMD by accessing resources that help you craft your resume and excel in interviews. Our goal is to help you showcase your best professional self and align your skills with the needs of our dynamic team. At Advanced Micro Devices, Inc., we empower our employees to innovate, lead, and grow. Join us in driving the future of technology while building a rewarding and sustainable career.

Learn more about Advanced Micro Devices, Inc

Size: 15,500 employees
Market Cap: $100.9 billion
Industry:
Net Income: $2.4 billion
Founded: 1969
5 Year Trend: +30.9%
Revenue: $9.7 billion
Exchange: NASDAQ