Do you like building software systems that power the world's largest cloud network? Would you like to play a key role in developing the tools, automation, and data infrastructure that keep AWS network interconnects running at peak performance?
The Core Networking team is looking for a Software Development Engineer II to join our Network Product Development (NPD) Interconnects Tools and Metrics (TMX) organization. In this role, you will design, develop, and operate the software systems that enable the NPD Interconnects team to monitor, qualify, and manage interconnect products across the AWS fleet. You will build and maintain tooling, automation, and data infrastructure across test infrastructure, observability and analytics, distributed systems for link operations, and ML model delivery. This role requires strong software engineering skills, the ability to navigate ambiguity across multiple technical domains, and a passion for building scalable, reliable systems that directly impact network availability for AWS customers.
Key job responsibilities
- Design, develop, deploy, and operate software systems that enable the NPD Interconnects team to monitor, qualify, and manage interconnect products across the AWS fleet
- Develop and maintain automated test frameworks and tooling that enable product engineers to validate optical transceivers and fiber connectivity products, scaling test infrastructure to support increasing qualification demands
- Build and maintain data ingestion, processing, and storage systems for optics and fiber telemetry data, enabling product owners to conduct fleet-wide analysis through self-service tooling and dashboards
- Design and deliver distributed systems that orchestrate link-level testing, validation, and troubleshooting workflows across AWS regions, ensuring resilience and scalability
- Collaborate with Applied Scientists to build ML/science model serving infrastructure, operationalizing models that optimize fleet performance and predict failures
- Independently clarify requirements and deliver system-level solutions for technically complex or operationally ambiguous problems, with guidance from senior engineers on architectural direction
- Participate in on-call rotation, lead troubleshooting of production issues, and drive resolution for both individual and large-scale fleet events
- Automate and simplify team operations processes, improving service resilience and performance
- Produce high-quality, well-tested code, actively participate in code reviews, and mentor newer team members to raise the engineering bar
- Communicate effectively about technical work, document system architecture and operations, and collaborate across team boundaries to deliver features in services owned by other organizations
A day in the life
Every day as part of our team, you have the unique opportunity to understand the growing AWS network and our internal customers' requirements for interconnect solutions. You'll work backwards to devise hardware solutions by influencing the broad industry and/or to develop software tools with sister teams to maintain a highly available network that delights AWS customers. You design and implement processes and mechanisms that help the team deliver business impact to the organization in a systemic way while also raising the bar on our operational excellence.
At the scale we operate, there is no blueprint for what we do, which encourages our engineers to identify and develop simple solutions to complex problems. We encourage durable solutions that look around corners while taking into consideration our customers' needs from a cost, performance, and reliability perspective. We work closely with the internal partners that design, build, and operate the network to ensure that our solutions meet their needs and exceed their expectations.
About the team
Within AWS Networking, the NPD (Network Product Development) organization is responsible for designing the hardware, building the software, and owning the interconnects for the routers that power the global AWS network. Beyond product delivery, we actively manage the fleet of routers in a network that grows by 70% annually. This means tracking key business and operational metrics to ensure that we operate smoothly and minimize or eliminate customer impact due to device-related issues for a transparent AWS customer experience.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 1+ years of software development engineer or related occupational experience
- 1+ years of experience designing and developing large-scale, multi-tiered, multi-threaded, embedded or distributed software applications, tools, systems, and services using C#, C++, Java, or Perl
- 1+ years of Object Oriented Design experience
- Bachelor's degree or foreign equivalent in Computer Science, Engineering, Mathematics, or a related field
- Experience programming with at least one software programming language
PREFERRED QUALIFICATIONS
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience building data pipelines or automated ETL processes
- Experience designing, building, operating, and managing large-scale distributed systems or web services
- Experience with AWS services including S3, Redshift, Sagemaker, EMR, Kinesis, Lambda, and EC2
- Experience managing and deploying ML products
- Experience managing and troubleshooting networks, or experience with automation and version control tools and with network administration and troubleshooting
- Experience working cross-functionally with tech and non-tech teams, including operations
- Knowledge of network cabling, optic types, and test equipment
- Master's degree in Computer Science, Computer Engineering, or related fields
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, CA, Cupertino - 165,200.00 - 223,600.00 USD annually