Responsibilities
• Explore, co-design and optimize parallelisms, compute efficiency, distributed training/inference paradigms and algorithms to improve the scalability, efficiency, and reliability of GenAI systems
• Innovate and co-design novel model deployment techniques for sustained scaling and hardware efficiency during GenAI serving
• Benchmark, analyze, model, and project the performance of AI workloads against a wide range of what-if scenarios and provide early input to the design of future hardware, models, and runtime, giving crucial feedback to the architecture, compiler, kernel, modeling, and runtime teams
• Explore, prototype and productionize highly optimized ML kernels to unlock the full potential of current and future accelerators for Meta's AI workloads
• Influence the hardware roadmap of Meta's custom AI accelerators
• Lead cross-functional initiatives spanning multiple engineering organizations to drive high-impact technical milestones
• Guide Meta's AI HW requirements and design focusing on performance at System and Silicon levels. Co-design and optimize our AI HW and related software stack for Meta's future workloads, with technology pathfinding and evaluation of cutting-edge AI systems
Minimum Qualifications
• Currently has, or is in the process of obtaining a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
• PhD in Computer Science, Electrical Engineering, Applied Mathematics, or a related technical field, OR a Master's degree with 3+ years of relevant industry experience
• Proven research experience in one or more of the following areas: hardware-aware model enablement, performance modeling of AI systems or prevailing accelerators/silicon architectures
• Hands-on proficiency with end-to-end AI hardware architecture or on-device mapping algorithm development, encompassing logic, architecture, and optimizations for power, performance, and area (PPA)
• Theoretical background and practical experience with AI models (e.g., CNNs, Transformers, LLMs, Diffusion models)
• Experience in system-level performance analysis, profiling, and benchmarking of AI workloads
• In-depth experience with Python and experience with at least one major AI framework
• Track record of publishing research papers at peer-reviewed conferences or journals, and experience communicating technical results to cross-functional stakeholders
Preferred Qualifications
• Experience with deploying AI agents/prevailing techniques for increased efficiency
• Experience or knowledge of training/inference of large-scale deep learning models
• Familiarity with low-level programming for specialized hardware (e.g., CUDA, HIP, Triton) or hardware description languages (HDL)
• Experience or knowledge of distributed ML systems and algorithm development
• Experience or knowledge of either Generative AI models such as LLMs/LDMs or Ranking & Recommendation models such as DLRM or equivalent