Position Summary...The Enterprise Storage Platform team, part of Walmart Global Tech's Enterprise and Cloud organization, designs, engineers, and operates the next-generation storage infrastructure that powers Walmart's mission-critical applications, data platforms, cloud services, and Kubernetes workloads at multi-petabyte scale.
We are hiring a Staff System & Infrastructure Engineer to help define and evolve Walmart's enterprise storage strategy across block, file, object, and cloud-native storage platforms. In this role, you will serve as a senior technical leader responsible for designing resilient storage architectures, driving platform modernization, improving reliability and performance, and enabling self-service storage capabilities for engineering teams across Walmart.
This role is ideal for an engineer who combines deep enterprise storage expertise with modern platform engineering practices. You will work across on-premises, cloud, and Kubernetes environments, using automation, Infrastructure as Code, observability, and AI/AIOps-driven operational capabilities to simplify storage consumption, reduce operational toil, and improve infrastructure efficiency at massive scale.
You will partner closely with cloud, compute, networking, database, security, and application engineering teams to support Walmart's largest business-critical workloads while continuously advancing scalability, resiliency, utilization, performance, and cost optimization across the enterprise storage ecosystem.
What you'll do...
Enterprise Storage Architecture and Engineering
- Lead the design, architecture, and evolution of enterprise-scale storage platforms across block, file, object, and cloud-native storage services.
- Define storage platform strategy, reference architectures, technology roadmaps, and multi-year modernization initiatives aligned to business growth and platform needs.
- Design highly available, resilient, scalable, and secure storage solutions supporting mission-critical workloads across on-premises, cloud, and Kubernetes environments.
- Establish data-driven models to balance performance, resiliency, scalability, utilization, and cost across enterprise storage platforms.
- Develop AI-powered remediation workflows and operational copilots that improve incident diagnosis, root cause analysis, and infrastructure recovery.
- Lead root cause analysis for critical production incidents and implement long-term architectural improvements to prevent recurrence.
- Evaluate emerging storage technologies and provide technical leadership for adoption, migration, and platform transformation initiatives.
- Partner with application, database, cloud, infrastructure, security, and platform engineering teams to design optimized storage solutions for modern workloads.
- Define disaster recovery, business continuity, backup, replication, cyber-resiliency, and data protection architectures for enterprise storage platforms.
- Drive storage efficiency through automation, intelligent tiering, deduplication, compression, lifecycle policies, and capacity optimization.
Platform Automation and Engineering
- Architect and develop self-service storage platforms that allow application and infrastructure teams to provision, manage, and consume storage through APIs and automated workflows.
- Build scalable automation frameworks for provisioning, lifecycle management, replication, snapshots, backup orchestration, compliance validation, and infrastructure remediation.
- Design and implement Infrastructure as Code and GitOps solutions using tools such as Terraform, Ansible, Argo CD, Helm, and CI/CD pipelines.
- Develop Python-based automation services, SDK integrations, REST API solutions, and workflow orchestration capabilities that reduce manual work and improve platform reliability.
- Champion engineering practices that reduce operational toil and advance autonomous platform capabilities.
- Drive platform observability by designing telemetry, monitoring, alerting, logging, dashboarding, and operational analytics solutions.
Technical Leadership and Strategy
- Provide technical leadership and mentorship to engineers across storage, cloud, infrastructure, and platform engineering organizations.
- Lead architecture reviews, design governance, technology evaluations, and engineering best-practice adoption.
- Influence strategic infrastructure investments through data-driven recommendations, technical assessments, and business case development.
- Lead cross-functional initiatives across storage, compute, networking, cloud, security, database, and application engineering teams.
- Contribute to engineering standards, operational maturity models, platform reliability practices, and long-term infrastructure strategy.
Security, Compliance, and Governance
- Ensure enterprise storage platforms meet Walmart's security, compliance, governance, auditability, and data protection requirements.
- Integrate compliance controls, policy enforcement, and automated validation into platform workflows.
- Partner with security and compliance teams to strengthen platform posture, improve resilience, and reduce operational risk.
What You'll Bring
- 10+ years of experience designing, engineering, and operating enterprise storage platforms in large-scale enterprise environments.
- Deep hands-on experience with enterprise storage platforms such as NetApp, Pure Storage, VAST Data, Cohesity, Dell PowerScale, or similar technologies.
- Strong architectural understanding of SAN, NAS, object storage, and distributed storage systems, including FC, iSCSI, NVMe-oF, NFS, SMB, and S3-based platforms.
- Proven experience supporting multi-petabyte storage environments with demanding availability, scalability, resiliency, and performance requirements.
- Strong expertise in storage architecture, performance engineering, workload characterization, capacity planning, and troubleshooting complex production infrastructure issues.
- Deep understanding of data protection technologies, including snapshots, replication, backup, disaster recovery, business continuity, and cyber-resiliency architectures.
- Hands-on experience deploying, operating, and optimizing Portworx or similar Kubernetes-native storage platforms in large-scale production environments.
- Strong understanding of Kubernetes storage architecture, including CSI, PV/PVC lifecycle management, Storage Classes, Volume Snapshots, StatefulSets, and automated storage provisioning workflows.
- Experience with container platforms and cloud-native application architectures running on Kubernetes.
- Expert-level Python programming skills, including experience developing automation frameworks, platform services, REST API integrations, SDK-based tooling, and workflow orchestration solutions.
- Strong experience implementing Infrastructure as Code and GitOps practices using Terraform, Ansible, Argo CD, Helm, CI/CD pipelines, and version-controlled deployments.
- Experience designing and operating observability solutions using tools such as Prometheus, Grafana, OpenTelemetry, logging platforms, and operational analytics frameworks.
- Knowledge of AI/ML, AIOps, or advanced analytics for anomaly detection, predictive operations, intelligent alerting, automated remediation, and operational optimization.
- Understanding of infrastructure cost optimization, storage consumption models, utilization analysis, capacity management, and cost-performance tradeoff optimization.
- Strong communication, stakeholder management, and technical leadership skills, with the ability to influence architecture decisions and drive alignment across engineering organizations.
- Demonstrated ability to lead large-scale platform modernization, technology transformation, and operational excellence initiatives.
Minimum Qualifications...Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, information systems, information technology, or related area and 4 years' experience in technology infrastructure engineering across areas such as compute, storage, network, mobility, or virtualization-related technologies.
Option 2: 6 years' experience in technology infrastructure engineering across areas such as compute, storage, network, mobility, or virtualization-related technologies.
Preferred Qualifications...Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Master's degree in computer science, computer engineering, information systems, information technology, or related area and 2 years of relevant experience in configuration management, developing automation for provisioning and orchestration, and developing and managing service-level objectives (SLOs) and service-level indicators (SLIs) for infrastructure service availability and performance.
We value candidates with a background in creating inclusive digital experiences who demonstrate knowledge of implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, working with assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture.
Primary Location...1345 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America