THE ROLE: This software engineer role will help drive AMD's strategy, architecture, optimization, and tooling to achieve industry-leading AI pre-training and distributed inference performance on AMD GPUs. You will partner across hardware architecture, AI frameworks, compilers, runtime, ROCm, developer tools, and models to scale performance analysis and optimization.
As an engineer focused on collectives and network performance, you will drive end-to-end performance attainment across the entire software stack, getting the best performance from multiple generations of AMD GPUs across a wide range of models, including the latest state-of-the-art AI models. You will help set the strategy and roadmap for general optimization, faster support for new models, and out-of-the-box performance.
If you are passionate about performance optimization, getting the best out of the hardware, and shaping the future of AI acceleration, then this role is for you.
THE PERSON: The ideal candidate will have deep knowledge of network, NIC, and GPU hardware architecture, software optimization, performance modeling, AI frameworks, and the latest trends in inference and training optimization. The candidate should have hands-on experience in mapping model architectures to low-level software and hardware, and in understanding how each layer of the stack affects model performance. Strong knowledge of the latest generative model architectures, especially SoTA models, and of distributed inference and deployment at scale is crucial.
KEY RESPONSIBILITIES:
- Help define the strategy and roadmap for AMD collectives and network optimizations.
- Provide guidelines to customers on efficient network load-balancing, workload scheduling and model sharding strategies.
- Tune, profile, and analyze the performance of large-scale models (LLM, diffusion, multimodal, RecSys, and other generative AI) in single-node and distributed settings, and explore the associated tradeoffs and design decisions.
- Participate in hardware-software co-design for future hardware optimizations, especially for scale-up networks, NICs, and scale-out networks.
- Develop and improve frameworks, tools, and infrastructure for performance estimation, modeling, and reporting.
- Communicate and present the results of performance analysis and modeling to stakeholders and senior leadership, and provide concrete recommendations.
- Collaborate across teams and the wider organization to identify opportunities and develop strategies.
PREFERRED EXPERIENCE:
- Multiple years of technical experience in performance optimization.
- Strong technical expertise and experience in performance analysis, projection, and network hardware architecture.
- Deep knowledge of and hands-on experience with AI frameworks such as PyTorch, JAX, vLLM, and SGLang.
- Strong technical leadership skills and the ability to work collaboratively with cross-functional teams.
- Experience mentoring, coaching, and inspiring a diverse and talented team of researchers and engineers.
- Excellent written, verbal, and presentation skills, and the ability to coordinate internally and externally.
ACADEMIC CREDENTIALS:
- A PhD or master's degree in computer science, electrical engineering, or a related field.
LOCATION: San Jose, CA (hybrid)
Benefits offered are described in AMD benefits at a glance.