Advanced Micro Devices, Inc

Senior AI Cluster Hardware Engineer

Advanced Micro Devices, Inc
$120K - $160K *
Telecommunications & Hardware
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in GPU cluster optimization and performance tuning
  • Strong background in GPU architectures and parallel computing
  • Hands-on experience with debugging hardware, firmware, and drivers
  • Proficiency in scripting languages like Python or Bash for performance analysis
  • Understanding of RDMA network drivers and performance tuning
  • Experience with system-level performance analysis tools
  • Excellent problem-solving skills with an analytical mindset

Responsibilities

  • Evaluate scalability of GPU clusters through comprehensive testing
  • Utilize profiling tools to identify performance bottlenecks
  • Implement optimization strategies for improved GPU cluster performance
  • Create detailed documentation of performance analyses and tuning outcomes
  • Collaborate with cross-functional teams to integrate performance improvements
  • Enhance performance focusing on RDMA throughput and latency
  • Develop benchmarking strategies to assess performance baselines

Benefits

  • Comprehensive benefits package
  • Opportunities for continuous career development
  • Support for technical innovation and hands-on problem solving
  • Access to resources for staying ahead of industry trends
  • Engagement in cross-functional teamwork for impactful results
Full Job Description
THE ROLE:

We are seeking a highly motivated and skilled GPU Cluster Network Performance Attainment Engineer to join our dynamic team. In this role, you will be at the forefront of optimizing and achieving peak performance for GPU clusters. The role focuses on the RDMA networks used in AI clusters, including understanding the data flows between the GPU, NIC, and cluster network. The ideal candidate will have a strong background in GPU architectures and parallel computing, along with hands-on experience in system-level performance tuning and debug methodologies.

THE PERSON:

The team fosters and encourages continuous technical innovation to showcase successes as well as facilitate continuous career development. The ideal candidate is a seasoned professional who enjoys hands-on problem-solving. In this role, you'll shape long-term strategy and jump in to tackle challenges head-on. You'll have a direct impact on performance, automation, and development, while staying ahead of industry trends to provide strategic insights to senior management. The candidate should be experienced in debugging complex HW/FW and drivers.

KEY RESPONSIBILITIES:
  • Scalability Testing: Evaluate the scalability of GPU clusters by conducting thorough testing under various workloads, ensuring optimal performance across different cluster sizes, configurations, and networking technologies (RoCE and InfiniBand).
  • Performance Profiling: Utilize profiling tools and methodologies to analyze and identify performance bottlenecks, providing actionable insights for improvement.
  • Performance Tuning: Implement optimization strategies, including but not limited to protocol enhancements, load balancing techniques, and parallel processing optimizations.
  • Documentation: Create detailed documentation of performance analysis, tuning efforts, and outcomes, providing clear and concise reports for internal teams and stakeholders.
  • Collaboration: Work closely with cross-functional teams, including hardware engineers, software developers, and system architects, to integrate performance improvements into the GPU cluster architecture.
  • NIC & Performance Optimization: Collaborate with hardware and software teams to enhance the overall performance of GPU clusters, focusing on aspects such as RDMA throughput, latency, and collective communications.
  • Benchmarking and Analysis: Develop and execute comprehensive benchmarking strategies to assess baseline performance, analyze bottlenecks, and identify areas for improvement within GPU cluster environments.
  • Continuous Learning: Stay current with the latest developments in GPU architectures, parallel processing, and emerging technologies to drive continuous improvement in GPU cluster performance.

PREFERRED EXPERIENCE:
  • Proven experience in optimizing the performance of GPU clusters.
  • Understanding of RDMA network drivers.
  • Strong understanding of GPU architectures, parallel computing concepts, and network protocols.
  • Proficiency in scripting languages (e.g., Python, Bash) for automation and performance analysis.
  • Experience with system level performance analysis tools and methodologies for GPU clusters.
  • Analytical mindset with excellent problem-solving and debug skills.
  • Familiarity with cluster management tools and systems.
  • Excellent communication and collaboration skills for effective teamwork.
  • RDMA network configuration, troubleshooting and performance tuning.
  • Linux kernel networking expertise.
  • Machine learning and/or HPC system design.

ACADEMIC CREDENTIALS:
  • Bachelor's or Master's degree in electrical or computer engineering preferred.


LOCATION: Austin, TX

This role is not eligible for visa sponsorship.

#LI-HYBRID

Benefits offered are described in AMD benefits at a glance.

About Advanced Micro Devices, Inc

Advanced Micro Devices, Inc. Careers

Join the innovative forefront of technology with a career at Advanced Micro Devices, Inc. (AMD), a leader in semiconductor development. As part of our global team, you will contribute to an organization renowned for its dedication to innovation, leadership, and diversity in the tech industry.

Work You’ll Do

At AMD, we offer job opportunities that push the boundaries of what is possible. Our team is composed of professionals who lead the way in microprocessor and graphics technology, driving industry standards and innovation. With AMD, you will be part of a culture that values growth and professional development, ensuring that every team member has the opportunity to excel.

Transform Your Career

AMD is not just about advancing technology, but also about advancing careers. Whether you are looking for an internship, a full-time position, or leadership roles, AMD provides the platform to propel your career to new heights. Our commitment to professional growth is matched by our dedication to diversity and inclusion, making AMD a place where everyone can thrive.

Innovative Work Environment

Join a team of over 12,000 dedicated professionals at the intersection of technology, industry expertise, and digital innovation. At AMD, you will work on groundbreaking projects that shape the future of computing and graphics. Our collaborative environment encourages networking and the sharing of ideas across teams and disciplines.

Career Development and Benefits

AMD is committed to the development of its employees. We offer robust training programs, including leadership development and diversity training, to ensure our team is equipped for both current challenges and future opportunities. Our benefits package is designed to support the well-being and financial security of our employees and their families.

Explore Job Opportunities

From engineering to marketing, AMD offers a range of career paths that cater to diverse skills and interests. Our hiring process is designed to be transparent and engaging, helping you to understand where you fit within our team and how you can contribute to our collective goals.

Stay Connected

Join our team: search open positions that match your skills and interests. We look for passionate, curious, creative, and solution-driven team players. Explore the opportunities to join a company that’s committed to your career growth and to innovation in the technology sector.

Keep Up to Date

Stay ahead with career tips, insider perspectives, and industry-leading insights you can put to use today—all from the people who work here.

Job Alert Emails

Personalize your subscription to receive job alerts, latest news, and insider tips tailored to your preferences. Discover the exciting and rewarding career opportunities that await at Advanced Micro Devices, Inc.

Interview and Resume Tips

Prepare for your future with AMD by accessing resources that help you craft your resume and excel in interviews. Our goal is to help you showcase your best professional self and align your skills with the needs of our dynamic team. At Advanced Micro Devices, Inc., we empower our employees to innovate, lead, and grow. Join us in driving the future of technology while building a rewarding and sustainable career.
Learn more about Advanced Micro Devices, Inc
Size: 15,500 employees
Market Cap: $100.9 billion
Industry
Net Income: $2.4 billion
Founded: 1969
5 Year Trend: +30.9%
Revenue: $9.7 billion
NASDAQ
