Amazon

Software Development Manager, AWS Neuron SDK - Distributed Training

Amazon, $212K to $287K *
Enterprise Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Knowledge of object-oriented design, data structures, and algorithms
  • 7+ years in software development, non-internship
  • 3+ years in engineering team management
  • 8+ years in defining multi-tier web services
  • Deep expertise in Distributed Training on thousands of nodes

Responsibilities

  • Design and optimize the ML stack for distributed training on Neuron devices
  • Scale and optimize applications for LLMs across multi-modal input/output
  • Support development of multi-modal transformer models such as MM-Llama3.2 and CLIP
  • Ensure high performance and efficiency of models on AWS Trainium TRN2+ servers
  • Lead efforts for stability support in frameworks like PyTorch and JAX using Neuron tools

Benefits

  • Opportunity to work hands-on on cutting-edge machine learning technologies
  • Collaborative environment with a strong team of engineers
  • Engagement with multi-modal AI systems and large-scale deployments
  • Involvement in the full development life cycle
  • Direct impact on delivering solutions to customers
Full Job Description

AWS Neuron is a software stack for the Annapurna Inferentia and Trainium machine learning accelerators hosted inside AWS EC2 Trn1/2 and Inf1 servers.

As the Principal Engineer for the Neuron Distributed Training team, you will work hands-on with a strong team of engineers to help design and optimize ML on Neuron devices. Specifically, you will focus on bringing up a coherent solution across the stack to increase training resiliency for ultra clusters with thousands of nodes. You will scale and optimize the application stack for LLMs that use multi-modal input and output generation such as text, vision, video, and audio. You will be responsible for the full development life cycle of providing distributed training support for multi-modal transformer models such as MM-Llama3.2, DiT/Pixart, and CLIP. You will develop scalability features and performance optimizations in the Neuron ML framework components to make Trainium devices first-class citizens for ML acceleration. You will lead the way in ensuring support for key ML functionality in a combined chip/software platform, and ensure the right thing is being built and delivered to customers.

A successful candidate will have an established background in scaling and stabilizing machine learning distributed training components, along with the strong technical ability to work and deliver on a vertically integrated system stack that consists of a combinatorial matrix of hardware, frameworks, and workflows. Deep expertise in scaling model training across thousands of nodes is a must, along with direct customer-facing experience and a strong motivation to achieve results.

Key job responsibilities

This role will help lead the effort to build large-cluster stability support for distributed training into PyTorch and JAX, using XLA and the Neuron compiler and runtime stacks. This role will also help tune these models to ensure the highest performance and maximize their efficiency on customer AWS Trainium TRN2+ servers. Strong software development and ML knowledge are both critical to this role.

Additional details for internal candidates

This role needs ML/DL work experience, with focus on GenAI and LLMs.

Basic qualifications

- Knowledge of object-oriented design, data structures, and algorithms

- Experience (non-internship) in professional software development

Preferred qualifications

- Experience designing and building large-scale systems in a multi-tiered, distributed environment (Service Oriented Architecture)

- Experience in Distributed Training on thousands of nodes.

BASIC QUALIFICATIONS

- 3+ years of engineering team management experience

- 7+ years of experience working directly within engineering teams

- 3+ years of experience designing or architecting (design patterns, reliability, and scaling) new and existing systems

- 8+ years of experience leading the definition and development of multi-tier web services

- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations

- Experience partnering with product or program management teams

PREFERRED QUALIFICATIONS

- Experience in communicating with users, other technical teams, and senior leadership to collect requirements, describe software product features, technical designs, and product strategy

- Experience in recruiting, hiring, mentoring/coaching, and managing teams of Software Engineers to improve their skills and make them more effective product software engineers

USA, CA, Cupertino - 212,700.00 - 287,700.00 USD annually

About Amazon

Audible is a provider of spoken audio information and entertainment on the Internet. They provide premium spoken audio content, such as audio versions of books, newspapers, and radio programs, that is delivered over the Internet and played back on personal computers and hand-held electronic devices. The Audible service allows consumers to purchase and download content from its website, store it in digital files, and play it back on personal computers and electronic devices. More than 15,000 hours of audio content are available on the website, including audio versions of books, periodicals, and radio programs. Several manufacturers have agreed to support and promote the playback of this content on their hand-held audio-enabled electronic devices.

Amazon Careers

Joining Amazon presents an unparalleled opportunity to become part of a vibrant team pushing the boundaries of innovation and growth in the global marketplace. As a leader in e-commerce, technology, and logistics, Amazon offers a variety of job opportunities that cater to a range of skills and professional interests.

Work You’ll Do

At Amazon, every day is an opportunity to collaborate with the brightest minds in technology and business to redefine what’s possible. Whether you’re interested in software development, marketing, human resources, or customer service, Amazon has a position waiting for you. Transform the way the world shops and innovates with our diverse and inclusive team. Amazon is not just a company; it’s a community where you can drive real change and contribute to projects impacting millions globally.

Lead with Innovation and Leadership

Amazon is the perfect place to enhance your leadership and innovation skills. Our culture encourages pushing the envelope and imagining the unimaginable. Here, you will lead projects that challenge the status quo and define new industry standards. Work with a team that values diversity and is committed to creating an inclusive environment. Our leadership is focused on harnessing the collective power of unique perspectives to foster growth and innovation.

Explore Amazon’s Employment Benefits

Amazon’s commitment to its employees extends beyond just career growth. We offer competitive benefits, including health care, parental leave, and diversity training, ensuring that our team not only excels professionally but also enjoys well-being and security.

Internship and Networking Opportunities

Start your career with an Amazon internship and gain hands-on experience that matters. Our internships provide a gateway to full-time employment and an opportunity to network with professionals across various sectors of the company.

Future-Proof Your Career

With Amazon, your career path is filled with numerous opportunities for advancement. Our learning and development programs are designed to nurture your professional growth and keep you at the forefront of industry trends.

Stay Connected

Join Our Team

Discover the job opportunities at Amazon that match your skills and interests. We are constantly on the lookout for passionate, curious, and innovative team players ready to make a difference.

Keep Up to Date

Stay ahead with career tips, insider perspectives, and industry-leading insights you can put to use today, all from the people who work here.

Job Alert Emails

Customize your subscription to receive job alerts, the latest news, and insider tips tailored to your preferences.

Explore the exciting and rewarding career opportunities that await at Amazon. Amazon is more than just a company; it’s a platform for building a promising future. Whether you’re starting or looking to advance your career, Amazon offers the resources, support, and network you need to succeed. Join us, and be a part of our continuing mission to be Earth's most customer-centric company.

Learn more about Amazon
Size: 1,608 employees
Market Cap: $832.6 billion
Industry
Net Income: $21.3 billion
Founded: 1994
5 Year Trend: +28.1%
Revenue: $386 billion
NASDAQ
