Amazon

Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

Amazon
$157K - $212K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in electrical engineering, computer engineering, or equivalent
  • 5+ years of professional experience in hardware design and validation
  • Experience with server technologies (thermal, mechanical, power) and signal integrity
  • Strong English communication skills, both written and verbal
  • Knowledge of operating systems, storage, network, security, and cloud infrastructure
  • Experience developing functional specifications and test procedures

Responsibilities

  • Own NPI for storage/accelerator server platforms from design to launch
  • Lead technical solutions for complex architectural challenges
  • Collaborate with ODMs to validate and manufacture servers at scale
  • Drive qualification milestones ensuring performance and reliability
  • Implement predictive failure detection systems and drive zero-touch operations
  • Perform root cause analysis of complex system failures
  • Break down server system problems into manageable tasks

Benefits

  • Comprehensive health insurance (medical, dental, vision)
  • 401(k) matching and retirement savings plans
  • Paid time off and parental leave
  • Flexible Spending Accounts and adoption reimbursement
  • Mental health support and medical advice line available
Full Job Description
As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms - from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet.

You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale. To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODMs (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs - whether during NPI qualification or in a production fleet of hundreds of thousands of servers - you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.

You will own end-to-end system reliability - proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.

This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards for yourself and everyone you work with, and constantly look for ways to improve your products' performance, quality, and cost. We're changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.

Key job responsibilities

NPI - New Product Introduction

- Own the end-to-end NPI lifecycle for storage and/or accelerator (AI/ML/GPU) server platforms - from architecture definition through design, qualification, manufacturing ramp, and launch

- Lead technical solutions for complex server and rack system architectural challenges

- Work with ODM/manufacturing partners to develop, validate, and manufacture server products at scale

- Develop functional specifications, design verification plans, and test procedures

- Drive qualification and readiness milestones, ensuring new platforms meet performance, reliability, and cost targets before fleet deployment

- Identify and resolve technical risks early in the development cycle - don't let problems reach production

Fleet Health, Diagnostics & Automation

- Own fleet health for the server platforms you launch - reliability doesn't end at ship

- Design and implement predictive failure detection systems using telemetry, sensor data, error trending, and log correlation to identify hardware issues before they cause customer impact

- Drive toward zero-touch operations - help build systems that detect, diagnose, and remediate faults without human intervention

- Debug complex system failures in time-sensitive settings - personally diving deep when the problem demands it

- Perform root cause analysis correlating across firmware, kernel, driver, thermal, power, and physical layers

Systems Design & Technical Depth

- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations (compute, storage, network, GPU)

- Design and implement solutions to address system-level issues at large scale

- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features

- Collaborate with hardware, software, manufacturing, supply chain, and product management teams

Cross-Team Collaboration

- Work closely with internal customers to ensure new server hardware meets data path and control path requirements

- Identify potential problems with onboarding new servers into customer ecosystems early

- Collaborate across Hardware Engineering, component, firmware, test, qualification, and integration teams

- Partner with datacenter operations to close the loop between field failures and design improvements

A day in the life

Your day-to-day responsibilities include interfacing with internal and external customers to understand product requirements and facilitate system development on top of your server designs. You will learn the operational challenges facing our existing fleet, with the goal of improving the current customer experience and developing better systems for future designs. You will work directly with vendors and ODMs (manufacturing partners) to scale your product. Some days you're reviewing a new platform design with your ODM; other days you're deep in logs and telemetry data chasing a failure mode across the fleet. You thrive on that range.

BASIC QUALIFICATIONS

- Experience in developing functional specifications, design verification plans and functional test procedures

- Bachelor's degree or above in electrical engineering, computer engineering, or equivalent

- Strong English-language communication skills, both written and verbal

- Experience with design & innovation and research & development

- Knowledge of operating systems, hardware, storage, network, security, database administration and cloud infrastructure

- Experience in server technologies such as thermal, mechanical, power, and signal integrity

- 5+ years of professional work (non-internship) experience

PREFERRED QUALIFICATIONS

- 5+ years of experience in hardware design and validation of components, subsystems, and systems

- Experience in server technologies: board design, high-speed bus design and signal integrity, failure analysis, server components (CPU, GPU, SSDs, memory), BIOS, BMC, and networking

- Experience developing and executing test procedures for mechanical or electrical systems/components

- Experience working with ODMs/manufacturers through the product development and manufacturing lifecycle

- Experience building predictive failure detection or proactive remediation systems at fleet scale

- Experience with storage/compute/GPU/accelerator platforms including integration, diagnostics, or performance validation

- Familiarity with PCIe topology, NVLink, NVMe, and accelerator interconnects

- Experience with large-scale datacenter or cloud environments

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and the option of supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 157,300.00 - 212,800.00 USD annually

USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually
