Proficiency in PyTorch and Triton with practical experience in developing stress workloads.
Strong understanding of compute, memory management, DMA, and execution patterns.
Experience with performance analysis and optimization on both simulators and real hardware.
Ability to design scalable test harnesses for various workloads and configurations.
Familiarity with cross-functional collaboration in a technical environment.
Responsibilities
Design and implement high-intensity stress workloads using PyTorch and Triton.
Identify and troubleshoot performance issues and system bottlenecks on simulators and real hardware.
Develop complex PyTorch workloads that push model-level execution limits.
Create custom Triton kernels to assess hardware performance under stress.
Document and streamline processes for integrating workloads into CI and monitoring tools.
Maintain and update a library of reusable PyTorch stress workloads.
Collaborate with firmware and SDK teams to address risk areas and refine stress tests.
Benefits
Opportunity to work on cutting-edge machine learning and hardware integration projects.
Collaborative work environment with cross-functional teams.
Access to advanced tools for performance testing and optimization.
Opportunity to innovate in stress-testing methodologies and help shape the tech landscape.
Full Job Description
Role description
Design and implement high-intensity stress workloads using PyTorch and Triton.
Exercise core MAIA execution paths, including compute, memory, DMA, and collectives.
Enable early detection of performance cliffs, stability issues, and system bottlenecks across simulators and real hardware.
Improve platform maturity, reduce late-stage escapes, and increase confidence for broader internal and external adoption.
Develop PyTorch workloads that stress model-level execution, such as large GEMMs, attention patterns, MoE-like behavior, mixed precision, and long-running loops.
Author custom Triton kernels to stress hardware execution units, memory hierarchies, and synchronization paths.
Build parameterized stress harnesses that scale by problem size, number of devices, and runtime duration.
Integrate workloads with existing profiling, monitoring, and failure triage tooling.
Collaborate with platform, firmware, and SDK teams to target known risk areas and emerging issues.
Document usage patterns and provide reproducible scripts for lab and continuous integration (CI) use.
Develop and maintain a library of reusable PyTorch stress workloads.
Create Triton-based micro- and macro-kernels designed specifically for stress and saturation testing.
Build and support test harnesses and scripts for single-device and multi-device execution.
Ensure workload designs align with platform risk areas and emerging hardware/software issues.
Collaborate cross-functionally with platform, firmware, and SDK teams to refine stress tests.
Provide comprehensive documentation describing workload intent, configuration options, and expected stress characteristics.
Support profiling, monitoring, and failure triage by integrating stress workloads with existing tools.
Deliver reproducible and scalable testing solutions for lab and CI environments.