Amazon

Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training

$151K - $261K
Information Technology
5 - 7 years of experience

Qualifications

  • Bachelor's degree in computer science or equivalent.
  • 5+ years of professional software development experience.
  • 5+ years of programming experience in at least one language.
  • 5+ years of leading design or architecture of systems.
  • 5+ years of experience in full software development life cycle.
  • Experience as a mentor or tech lead.
  • Experience in machine learning and large-scale training with LLMs.

Responsibilities

  • Lead efforts to build distributed training support into PyTorch.
  • Enable distributed training strategies for optimization of models.
  • Maximize efficiency on AWS custom silicon, including Trainium servers.
  • Collaborate with chip architects, compiler engineers, and runtime engineers.
  • Develop and tune performance solutions for various ML model families.

Benefits

  • Flexibility in the working culture to support work-life balance.
  • Mentorship opportunities from senior members.
  • Access to career-advancing resources and knowledge-sharing.
  • Inclusive culture promoting diversity and belonging.
  • Employee-led affinity groups and ongoing events for learning and growth.
Full Job Description
AWS Neuron is the complete software stack for AWS Trainium (Trn1/Trn2) and Inferentia (Inf1/Inf2), our cloud-scale machine learning accelerators. This role is for a Senior Machine Learning Engineer on the Distributed Training team for AWS Neuron, responsible for the development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLMs) such as GPT-OSS, Qwen, and Llama, as well as Stable Diffusion, Vision Transformers (ViT), and many more.

The ML Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions on Trainium instances. Experience training these large models with PyTorch is a must, as is familiarity with distributed training strategies such as FSDP (Fully Sharded Data Parallel), pipeline parallelism (PP), and context parallelism. Distributed training libraries such as torchtitan, torchtune, HF RL, and DeepSeek are central to this work, and extending them to Neuron-based systems, with a focus on enabling large-scale training, is key. Experience with post-training strategies such as DPO and PPO (for example, using HF or torchtune) is an additional strength and aligns with team success.

Key job responsibilities

You will lead efforts to build distributed training support into PyTorch, the Neuron compiler, and runtime stacks. You will enable distributed training strategies and use them to optimize models to achieve peak performance and maximize efficiency on AWS custom silicon, including Trainium servers. Strong software development skills, the ability to deep dive, the ability to work effectively within cross-functional teams, and a solid foundation in machine learning are critical for success in this role.

BASIC QUALIFICATIONS

- Bachelor's degree in computer science or equivalent

- 5+ years of non-internship professional software development experience

- 5+ years of programming experience with at least one software programming language

- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems

- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

- Experience as a mentor, tech lead or leading an engineering team

- Experience in machine learning and large-scale training with LLMs, and expertise in PyTorch

PREFERRED QUALIFICATIONS

- Master's degree in computer science or equivalent

- Experience in computer architecture

- Previous software engineering expertise with PyTorch/JAX/TensorFlow, distributed libraries and frameworks, and end-to-end model training

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $151,300/year in our lowest geographic market up to $261,500/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.
