About the Role
Gruve is seeking a highly skilled Senior Data Center Network Architect with 5+ years of experience designing, implementing, and optimizing modern data center networks supporting AI workloads. This is a deeply technical, customer-facing role aligned with Gruve's AI-driven infrastructure solutions.
The ideal candidate will have strong expertise in Cisco data center ecosystems, advanced AI network topology design, and high-performance fabrics that support large-scale training and inference environments.
Key Responsibilities
- Lead architecture, design, and deployment of AI-optimized data center networks, including:
  - Converged-rail, rail-optimized, and Dragonfly topologies
  - High-performance fabrics supporting RDMA, RoCEv2, and GPU-dense cluster interconnects
  - Networks supporting AI training and inference clusters
- Architect and implement Cisco-centric data center solutions (Nexus, ACI, UCS, NX-OS, etc.)
- Design scalable routing architectures using BGP, OSPF, EVPN, and related protocols
- Provide expert-level troubleshooting and network performance optimization
- Serve as a customer-facing technical leader, translating requirements into architectures
- Collaborate with AI Solutions, Platform Engineering, and customer technical leadership
- Create high-quality design documents, diagrams, and technical recommendations
Basic Qualifications
- 5+ years in enterprise or hyperscale data center networking
- Hands-on experience designing and implementing:
  - AI network topologies: Dragonfly, converged-rail, and rail-optimized fabrics
  - RDMA, RoCEv2, congestion control, lossless Ethernet
  - Spine-leaf architectures, 100/200/400/800G fabric design
  - Cisco data center solutions (Nexus series, ACI, NDFC/NDO, UCS networking)
- Deep understanding of:
  - BGP, OSPF, multihoming, overlay networks, VXLAN/EVPN
  - Network QoS, buffering, scheduling, ECMP, telemetry
- Hands-on experience with network troubleshooting tools (packet capture, latency analysis)
- CCNP Data Center or equivalent certification
Preferred Qualifications
- Experience with AI/ML workload patterns, GPU cluster behavior, and model training pipelines
- Automation experience: Python, Ansible, Terraform, Cisco DCNM/NDFC automation
- Familiarity with monitoring/observability tools: Prometheus, Grafana, ThousandEyes, Cisco Nexus Dashboard Insights
- Experience with multisite architectures, inter-DC fabrics, and L3 stretch patterns
- Exposure to cloud networking (AWS, Azure, GCP) aligned to hybrid AI environments
- Linux systems experience (kernel networking, tuning, NIC offloads)
- Experience with HPC, InfiniBand, or large-scale distributed training platforms
- Familiarity with AI accelerators (NVIDIA H100, B-series GPUs) and cluster topology tuning
- Experience writing SOWs, proposals, or participating in pre-sales architectures
- Excellent communication skills, with the ability to explain complex concepts to executives and engineers
- Strong customer-facing presence: workshops, architecture reviews, executive briefings
- Ability to lead architectural decisions and drive technical consensus
- Certifications: CCIE Data Center or equivalent highly preferred
Salary Range: $190,000 - $225,000 + Benefits
This is a full-time, on-site opportunity with Gruve. You may work from any of Gruve's office locations: Redwood City, California; Plano, Texas; or Edison, New Jersey.