Amazon

Principal Product Manager - AI/ML Training, Annapurna Labs

Amazon
$208K - $281K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years as a Technical Product Manager
  • Bachelor's degree in computer science, engineering, or related field
  • Experience with large-scale model training workflows
  • Familiarity with major AI/ML training frameworks like JAX or PyTorch
  • Proven ability to drive product strategy and cross-organizational alignment
  • Strong written and verbal communication skills, including for executive audiences

Responsibilities

  • Define and execute training product strategy and roadmap based on customer needs
  • Drive strategy for post-training workflows including RLHF and fine-tuning
  • Engage with customers to understand their training challenges and requirements
  • Identify and implement improvements for Neuron's training AI/ML ecosystem tools
  • Lead product launches for training capabilities, ensuring all documentation is ready
  • Coordinate with cross-functional teams to align on product execution
  • Define success metrics and track adoption of training software

Benefits

  • Health insurance including medical, dental, and vision
  • 401(k) matching
  • Paid time off and parental leave
  • Mental health support and employee assistance programs
  • Flexible spending accounts and adoption reimbursement support

Full Job Description

AWS Trainium is deployed at scale, with millions of chips in production, used for training and inference of frontier models. AWS Neuron is the software stack for Trainium, enabling customers to run deep learning and generative AI workloads with optimal performance and cost efficiency.

AWS Neuron is hiring a Principal Technical Product Manager to define and drive product strategy for training software on Trainium. This includes distributed training libraries, post-training workflows (RLHF, DPO, fine-tuning), reinforcement learning frameworks, and training performance optimization. Your mission is to enable researchers and operators to train frontier models at scale on Trainium, from single-node experimentation to distributed training across thousands of nodes.

You will be the champion inside AWS for frontier model builders pushing the bounds of scale and resilience for current and emerging training paradigms. You will work with customers inside and outside the company to identify key improvements and stay ahead of the training landscape. You will define how Neuron supports the training AI/ML ecosystem and what tools customers will use for their training workflows on Trainium.

To be successful, you will partner with engineering teams building training libraries and distributed training infrastructure, applied scientists developing optimization techniques, and PMs responsible for compiler, runtime, NKI, and infrastructure. You will develop deep knowledge of AI/ML training architectures, distributed training systems, model parallelism strategies, and training performance optimization to effectively define product strategy and make informed technical decisions.

The Ideal Candidate

The ideal candidate will have a solid understanding of large-scale model training, distributed training architectures, post-training workflows, and reinforcement learning. They should be able to assess the technical implications of training software stack decisions, understand customer needs, and drive developer experience improvements. The ideal candidate can navigate ambiguity in a fast-moving, early-stage initiative, balance competing priorities across multiple workstreams, and drive alignment across engineering and science stakeholders with excellent written and verbal communication abilities.

Key job responsibilities

Training Product Strategy & Roadmap

Define and execute training product strategy and roadmap working backwards from customer requirements in collaboration with engineering leadership. Define the vision for how customers train frontier models at scale on Trainium, balancing performance, developer experience, and AI/ML ecosystem compatibility. Produce PRFAQs and PRDs for training capabilities. Drive technical alignment across Neuron training libraries, distributed training infrastructure, and dependencies. Partner with PMs responsible for compiler, NKI, runtime, and infrastructure. Drive trade-offs between training performance, scalability, developer experience, and AI/ML ecosystem compatibility. Define requirements for reusable training building blocks that compose into end-to-end workflows.

Post-Training, RL & Emerging Workflows

Drive strategy for post-training workflows including RLHF, DPO, reward modeling, and fine-tuning at scale. Define requirements for how Neuron supports emerging training paradigms, model architectures, and RL-based optimization loops. Lead the product experience for RL research-to-production workflows on Trainium. Create and optimize RL libraries and frameworks to help researchers and production model builders.

Customer Engagement & Enablement

Work with BD, Solutions Architecture, and GTM teams to engage customers training frontier models on Trainium. Understand their distributed training challenges, RL needs, performance optimization requirements, and framework preferences. Translate customer pain points into product requirements. Define success metrics for training adoption and performance. Support customer enablement for training migration and optimization.

Training AI/ML Ecosystem & Delivery

Define how Neuron supports the training AI/ML ecosystem and what tools customers will use for their training workflows on Trainium. Own the technical depth on training-specific AI/ML ecosystem tools and define how Neuron's training libraries integrate with them. Track training-specific AI/ML ecosystem trends and feed them into product planning. Drive open source community engagement and upstream contributions for training-related tools. Coordinate with BD on partnership discussions where training-specific technical input is needed.

Launch & Go-to-Market

Lead end-to-end launches for training capabilities, coordinating documentation, field enablement, and customer communications. Partner with Marketing and Solutions Architecture to drive awareness and adoption. Define launch success criteria and track adoption metrics.

About the team

Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge sharing and mentorship. We operate with startup-like velocity, prioritizing talent acquisition, hands-on leadership, and flexible organization. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.

Diverse Experiences

AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.

Inclusive Team Culture

Here at AWS, it's in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.

Work/Life Balance

We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there's nothing we can't achieve in the cloud.

Mentorship & Career Growth

We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.

About Amazon Annapurna Labs

The Amazon Annapurna Labs team (our organization within AWS UC) is responsible for building innovation in silicon and software for our AWS customers. We are at the forefront of innovation by combining cloud scale with the world's most talented engineers. Our team covers multiple disciplines including silicon engineering, hardware design, software, and operations. Because of our team's breadth of talent, we have been able to improve AWS cloud infrastructure in high-performance machine learning with AWS Neuron, Inferentia, and Trainium ML chips, in networking and security with products such as AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric Adapter (EFA), and in computing with AWS Graviton and F1 EC2 instances.

About AWS Utility Computing (UC)

AWS Utility Computing (UC) provides product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cloud computing offerings across the AWS portfolio.

About AWS

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

BASIC QUALIFICATIONS

- 7+ years of experience working as a Technical Product Manager

- Bachelor's degree in computer science, engineering, analytics, mathematics, statistics, IT or equivalent

- Experience with large-scale model training workflows, including solid knowledge of distributed training concepts

- Familiarity with major AI/ML training frameworks (JAX or PyTorch) and how training libraries interact with them

- Experience driving product strategy, long-term roadmap development, and cross-organizational alignment

- Excellent written and verbal communication abilities, including executive-level communication

PREFERRED QUALIFICATIONS

- Experience with PyTorch or JAX distributed training

- Track record of driving developer training libraries and tools

- Experience with design and scaling of training optimization software (e.g., NeMo, TorchTitan, TRL, VeRL, MaxText, AXLearn, or similar)

- Experience leading RL for research-to-production at scale

- Experience with post-training workflows including RLHF, DPO, reward modeling, and fine-tuning

- Experience with AI/ML training accelerators and hardware, including training performance optimization, profiling, and tooling

- Experience with distributed training of large-scale models including model parallel training techniques (tensor, pipeline, sequence, and expert parallelism)

- Experience working on open source and GitHub-first developer products with deep customer interactions

- Track record of driving open standards and AI/ML ecosystem integration for training workflows

- Experience operating in early-stage, ambiguous environments with startup-like velocity

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 208,300.00 - 281,800.00 USD annually

USA, WA, Seattle - 181,100.00 - 245,000.00 USD annually

About Amazon

Audible is a provider of spoken audio information and entertainment on the Internet. They provide premium spoken audio content, such as audio versions of books, newspapers, and radio programs, that is delivered over the Internet and played back on personal computers and hand-held electronic devices. The Audible service allows consumers to purchase and download content from their website, store it in digital files, and play it back on personal computers and electronic devices. More than 15,000 hours of audio content are available on their website, including audio versions of books, periodicals, and radio programs. Several manufacturers have agreed to support and promote the playback of their content on their hand-held audio-enabled electronic devices.

Amazon Careers

Joining Amazon presents an unparalleled opportunity to become part of a vibrant team pushing the boundaries of innovation and growth in the global marketplace. As a leader in e-commerce, technology, and logistics, Amazon offers a variety of job opportunities that cater to a range of skills and professional interests.

Work You’ll Do

At Amazon, every day is an opportunity to collaborate with the brightest minds in technology and business to redefine what’s possible. Whether you’re interested in software development, marketing, human resources, or customer service, Amazon has a position waiting for you. Transform the way the world shops and innovates with our diverse and inclusive team. Amazon is not just a company; it’s a community where you can drive real change and contribute to projects impacting millions globally.

Lead with Innovation and Leadership

Amazon is the perfect place to enhance your leadership and innovation skills. Our culture encourages pushing the envelope and imagining the unimaginable. Here, you will lead projects that challenge the status quo and define new industry standards. Work with a team that values diversity and is committed to creating an inclusive environment. Our leadership is focused on harnessing the collective power of unique perspectives to foster growth and innovation.

Explore Amazon’s Employment Benefits

Amazon’s commitment to its employees extends beyond just career growth. We offer competitive benefits, including health care, parental leave, and diversity training, ensuring that our team not only excels professionally but also enjoys well-being and security.

Internship and Networking Opportunities

Start your career with an Amazon internship and gain hands-on experience that matters. Our internships provide a gateway to full-time employment and an opportunity to network with professionals across various sectors of the company.

Future-Proof Your Career

With Amazon, your career path is filled with numerous opportunities for advancement. Our learning and development programs are designed to nurture your professional growth and keep you at the forefront of industry trends.

Stay Connected

Join Our Team

Discover the job opportunities at Amazon that match your skills and interests. We are constantly on the lookout for passionate, curious, and innovative team players ready to make a difference.

Keep Up to Date

Stay ahead with career tips, insider perspectives, and industry-leading insights you can put to use today, all from the people who work here.

Job Alert Emails

Customize your subscription to receive job alerts, the latest news, and insider tips tailored to your preferences.

Explore the exciting and rewarding career opportunities that await at Amazon. Amazon is more than just a company; it’s a platform for building a promising future. Whether you’re starting or looking to advance your career, Amazon offers the resources, support, and network you need to succeed. Join us, and be a part of our continuing mission to be Earth's most customer-centric company.

Learn more about Amazon
Size
1,608 employees
Market Cap
$832.6 billion
Industry
Net Income
$21.3 billion
Founded
1994
5 Year Trend
+28.1%
Revenue
$386 billion
NASDAQ