Advanced Micro Devices, Inc

Systems Design Engineer - AI Cluster Software

Salary: $120K - $150K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's or Master's degree in electrical or computer engineering.
  • 5-7 years of hands-on experience in engineering roles focusing on AI and HPC.
  • Strong background in end-to-end systems thinking and troubleshooting.
  • Familiarity with multiple schedulers and orchestration systems, such as Kubernetes and Slurm.
  • Proficiency in Linux fundamentals, including networking and performance analysis tools.
  • Experience creating clear and structured documentation for complex systems.

Responsibilities

  • Shape AI infrastructure by developing reference architectures, configuration guides, and deployment blueprints.
  • Conduct thorough evaluations of AI stacks across compute, storage, networking, and observability layers.
  • Design and execute reproducible experiments to benchmark various technologies like schedulers and training libraries.
  • Create small reference implementations and tools for validating performance hypotheses.
  • Build a library of technical artifacts to support pre-sales engineers and facilitate learning.
  • Present findings via demos and documentation, while creating templates for repeatable evaluations.

Benefits

  • AMD benefits at a glance.
  • Flexible working arrangements with a hybrid work model.
  • Opportunities for continuous learning and professional development.
  • Health and wellness programs including fitness initiatives.
  • Access to a network of industry experts and cutting-edge technologies.
Full Job Description
THE ROLE:

This is a hands-on role for engineers who thrive on exploration, love solving complex systems problems, and are passionate about AI, HPC, and large-scale infrastructure. You'll bring your expertise to a software-focused team that investigates AI infrastructure across compute, storage, networking, and orchestration layers. Your work and knowledge will help shape reference architectures, configuration guides, and reproducible experiments that support internal teams, pre-sales engineers, and customers in making informed hardware and software decisions.

Our team operates across industry verticals as subject matter experts in the AI stack and across the cluster. We're building a library of technical artifacts such as design docs, run books, and "how it works" guides to help others inside and outside AMD deploy, manage, and scale AMD-based AI infrastructure. This is a high-autonomy role focused on creation, not operations. If you enjoy building, learning, debugging tough issues, and writing about what you discover, we want to hear from you!

THE PERSON:

You're an engineer, a systems thinker, and a professional troubleshooter who sees the big picture and thrives on research and experimentation. You have hands-on experience with high-performance rack- and row-scale infrastructure and are eager to explore how AI workloads like inference and training fit into large-scale AI infrastructure. You're not looking for a runbook; you're looking to build the blueprint.

You're self-directed, proactive, and comfortable navigating ambiguity to solve complex problems. You communicate clearly, enjoy writing technical artifacts that help others understand intricate systems, and collaborate naturally with internal teams and customers. You get excited to teach others what you know. Whether you're diving into a new stack or refining a reference architecture, you bring curiosity, initiative, and a drive to create.

KEY RESPONSIBILITIES:
  • Apply your expertise to shape AI infrastructure by creating reference architectures, configuration guides, and deployment blueprints that help internal teams and customers make informed hardware and software decisions.
  • Perform deep technical evaluations of AI stacks across compute, storage, networking, and observability layers, documenting how they work, where they fit, and the tradeoffs involved.
  • Design and execute reproducible experiments and benchmarking harnesses to compare technologies such as schedulers, distributed training libraries, and observability stacks.
  • Develop small reference implementations and tools to validate performance hypotheses, analyze system behavior, and more.
  • Build a library of technical artifacts, including presentations, design documents, and "how it works" guides, to support pre-sales engineers and enable others to skill up from an HPC perspective.
  • Present findings through demos, documentation, and internal talks, and create templates and checklists to support repeatable evaluations and cluster designs.


PREFERRED EXPERIENCE:
  • Engineering mindset: evidence of end-to-end systems thinking, debugging, and tradeoff decisions.
  • AI/HPC cluster background: hands-on familiarity with at least two schedulers and/or orchestration systems (e.g., Slurm, Kubernetes), MPI/OpenMP, distributed storage patterns, or performance analysis.
  • Comparative analysis: experience writing evaluation docs/RFCs with clear criteria, benchmarks, risks, and recommendations.
  • Strong Linux fundamentals: Linux operating systems, networking, filesystems, containers, performance tooling (perf, flamegraphs, nvprof/rocprof, basic eBPF).
  • Clear communication: ability to turn complex systems into accessible, structured documentation with diagrams and reproducible steps.
  • AMD ecosystem experience: ROCm, RCCL, Instinct GPUs, EPYC platforms, compiler/toolchain impacts, and performance tuning.
  • Distributed training internals: DDP, collective comms, sharded/stateful optimizers; NCCL/RCCL behavior and transport considerations (PCIe, NVLink, Infinity Fabric).
  • Orchestration models: Slurm configuration patterns, Kubernetes for HPC/AI (GPU operators, device plugins), Apptainer/Singularity.
  • Storage/data: parallel filesystems (Lustre, BeeGFS), object stores, RDMA, data pipeline throughput and caching strategies.
  • IaC literacy: Terraform/Ansible for reproducible blueprints, focused on design and sample configs rather than running production clusters.
  • Documentation tooling: reproducible docs/workbooks, literate programming notebooks, CI for benchmarks.

ACADEMIC CREDENTIALS:

Bachelor's or Master's degree in electrical or computer engineering

LOCATIONS:

Austin, Texas
Seattle, Washington
Santa Clara, California
Secaucus, New Jersey
Markham, Canada

This role is not eligible for visa sponsorship.

Benefits offered are described in "AMD benefits at a glance."

About Advanced Micro Devices, Inc

Advanced Micro Devices, Inc. Careers

Join the innovative forefront of technology with a career at Advanced Micro Devices, Inc. (AMD), a leader in semiconductor development. As part of our global team, you will contribute to an organization renowned for its dedication to innovation, leadership, and diversity in the tech industry.

Work You’ll Do

At AMD, we offer job opportunities that push the boundaries of what is possible. Our team is composed of professionals who lead the way in microprocessor and graphics technology, driving industry standards and innovation. With AMD, you will be part of a culture that values growth and professional development, ensuring that every team member has the opportunity to excel.

Transform Your Career

AMD is not just about advancing technology, but also about advancing careers. Whether you are looking for an internship, a full-time position, or a leadership role, AMD provides the platform to propel your career to new heights. Our commitment to professional growth is matched by our dedication to diversity and inclusion, making AMD a place where everyone can thrive.

Innovative Work Environment

Join a team of thousands of dedicated professionals at the intersection of technology, industry expertise, and digital innovation. At AMD, you will work on groundbreaking projects that shape the future of computing and graphics. Our collaborative environment encourages networking and the sharing of ideas across teams and disciplines.

Career Development and Benefits

AMD is committed to the development of its employees. We offer robust training programs, including leadership development and diversity training, to ensure our team is equipped for both current challenges and future opportunities. Our benefits package is designed to support the well-being and financial security of our employees and their families.

Explore Job Opportunities

From engineering to marketing, AMD offers a range of career paths that cater to diverse skills and interests. Our hiring process is designed to be transparent and engaging, helping you to understand where you fit within our team and how you can contribute to our collective goals.

Company Facts

Size: 15,500 employees
Market cap: $100.9 billion
Net income: $2.4 billion
Founded: 1969
5-year trend: +30.9%
Revenue: $9.7 billion
Stock exchange: NASDAQ
