The application window is expected to close on: 08/28/2026
Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.
Meet the Team
The Hyperscaler Routing team within Cisco Networking focuses on delivering advanced routing solutions tailored for hyperscale environments. The team leverages Cisco Silicon One technology, which converges routing and switching silicon to provide high capacity, power efficiency, and programmability. This enables hyperscalers to architect distributed AI environments with seamless scalability and security. Their work supports cutting-edge AI infrastructure and cloud-scale networking, addressing the demands of AI traffic and next-generation hyperscale routing systems.
Your Impact
As a key contributor to Cisco's AI/ML infrastructure initiatives, you will plan, execute, and analyze comprehensive benchmarks on Cisco switches, focusing on throughput, latency, congestion, incast, failover, path diversity, and workload performance to ensure optimal AI/ML network operations.
You will guide AI/ML workload deployments from initial scoping and test planning through execution and benchmark analysis, ensuring success criteria are met. Your role includes developing AI-driven automation workflows to streamline network development, operations, and implementation.
You will define rigorous benchmark methodologies, test plans, KPIs, pass/fail criteria, and reporting structures for AI RoCE Ethernet fabrics, benchmarking fabric performance across critical metrics including latency, throughput, path diversity, ECMP and link utilization, congestion behavior, packet drops, retransmissions, queue occupancy, and recovery behavior. You will run and analyze performance tests using industry-standard tools such as NCCL, RCCL, ib_write_bw, ib_read_bw, ib_send_bw, ib_write_lat, netperf, iperf, MPI, OSU benchmarks, and microburst test methods.
You will validate switch ASIC features including buffers, schedulers, QoS/queuing, ECMP behavior, telemetry, hashing, traffic distribution, and congestion visibility.
Owning switch OS configuration and automation, you will utilize SONiC, NX-OS, Ansible, Python, Bash, Git, and related tooling to implement and validate advanced features such as SRv6, segment routing, uSID, Adj-SID, and policy-based pathing as required. You will document PoC architecture, benchmark methodologies, topology diagrams, configurations, results, findings, and recommendations.
This role empowers you to shape the future of AI infrastructure networking by delivering scalable, high-performance, and resilient network fabrics that meet the stringent demands of AI/ML workloads, driving innovation and customer success at Cisco.
Minimum Qualifications
- Bachelor's + 7 years of related experience, or Master's + 4 years of related experience.
- Python for automation experience.
- Experience with L2/L3 network protocols such as BGP, OSPF, EVPN, VXLAN, IPv6, or similar.
- Experience with traffic tools such as Spirent, IXIA, or similar.
- Docker or Kubernetes experience.
- Experience with network testing and validation.
Preferred Qualifications
- Clear written and verbal communication skills as well as documentation skills.
- SONiC, NX-OS, Linux, or other open-source network operating systems experience.
- Deep understanding of leaf-spine fabrics and how to troubleshoot them.
- Experience with Cisco Nexus Dashboard and related automation tools for provisioning, managing and troubleshooting the fabric.
- Experience handling complex network segmentation, security policies, and multi-site fabric designs.
- Experience with RDMA, RoCEv2, PFC, ECN, congestion control, QoS, buffer behavior, and lossless Ethernet concepts.