In accordance with Washington state law, we are highlighting our comprehensive benefits package, which is available to all eligible US-based employees. Benefits for this role include:
- Health, dental, vision, life, disability insurance
- Retirement Benefits: 401(k) with company match
- Paid Time Off: 20 days of vacation per year, accruing at a rate of 6.15 hours per pay period for the first five years of employment
- Sick Time: 40 hours/year (increased to 69 hours/year for Seattle) including 5 discretionary sick days per instance
- Maternity Leave (Short-Term Disability Baby Bonding): 28-30 weeks
- Baby Bonding Leave: 18 weeks
- Holidays: 13 paid days per year
Note: By applying to this position you will have an opportunity to share your preferred working location from the following:
Sunnyvale, CA, USA; Kirkland, WA, USA; New York, NY, USA.
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 8 years of experience in software development.
- 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
- Experience with modern GPU architectures (NVIDIA, AMD, or other AI accelerators), memory hierarchies, and performance bottlenecks.
- Experience with modern LLMs and their deployment on AI accelerators.
- Experience with low-level GPU programming (CUDA, Triton, CUTLASS, etc.) and performance engineering techniques.
Preferred qualifications:
- Master's degree or PhD in Engineering, Computer Science, or a related technical field.
- 8 years of experience with data structures and algorithms.
- 3 years of experience in a technical leadership role leading project teams and setting technical direction.
- 3 years of experience working in a structured organization involving cross-functional or cross-business projects.
- Experience with compiler optimization, code generation, and runtime systems for GPU architectures (OpenXLA, MLIR, Triton, etc.).
Responsibilities:
- Identify and maintain LLM training and serving benchmarks, and use them to identify performance opportunities and drive XLA:GPU/Triton performance toward XLA releases.
- Engage with various teams, like DeepMind, to solve challenging ML model performance problems.
- Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
- Analyze performance and efficiency metrics to identify bottlenecks and then design and implement solutions at Google fleet-wide scale.
- Run performance benchmarks on GPU hardware using internal and external tools such as TRT-LLM, vLLM, and SGLang.