About this role:
The Lead Infrastructure Engineer is responsible for designing, building, and operating highly scalable, resilient production infrastructure platforms that support enterprise Generative AI and Predictive AI workloads. This role provides technical leadership across GPU-accelerated environments, OpenShift/Kubernetes platforms, and advanced AI infrastructure patterns, including large, AI-factory-scale GPU compute architectures. The engineer partners closely with platform, application, and vendor teams to ensure secure, performant, and production-grade AI solutions.
In this role, you will:
- Lead complex initiatives to develop infrastructure solutions for business applications
- Participate in various projects intended to continually improve or upgrade the infrastructure
- Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
- Review and analyze high-impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
- Design, build, deploy and maintain infrastructure solutions through collaborative efforts with the team and third-party vendors
- Design, code, test, debug and document programs using Agile development practices
- Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
- Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success
- Recommend courses of action to maintain cost effectiveness and achieve results
- Collaborate and consult with peers, colleagues and managers to resolve issues and achieve goals
- Interact with customers and vendors
Required Qualifications:
- 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 5+ years troubleshooting complex end-to-end architectures (including CI/CD pipelines)
- 5+ years Linux systems experience
- 4+ years supporting AI/ML platforms
- 4+ years of Kubernetes / container platform experience including production support
Desired Qualifications:
- Experience with Generative AI and Predictive AI platforms.
- Hands-on GPU platform operations including scheduling, quota, and performance tuning.
- Experience with OpenShift in GPU-enabled, multi-tenant environments.
- Experience designing or operating GPU SuperPods.
- Deep experience with observability using Grafana, Splunk, and custom telemetry pipelines.
- Experience building AI- or agent-driven automation tooling (AIOps).
- Hands-on experience supporting AI/ML workloads on GCP and Azure, including GPU-backed services and managed AI infrastructure.
- Experience operating hybrid or multi-cloud AI platforms, with an understanding of cloud-native services, networking, identity, and cost optimization for Generative and Predictive AI.
- Strong experience monitoring AI signals such as inference latency and GPU utilization.
- Experience with BCP/DR, resiliency, and highly available architectures.
Job Expectations:
- This position offers a hybrid work schedule
- This position is not eligible for visa sponsorship
- Participation in a 24x7 on-call rotation
Job Location:
300 S. Brevard - Charlotte, North Carolina 28202
401 Las Colinas Blvd W Building A - Irving, Texas 75039
2600 S Price Rd - Chandler, Arizona 85286
194 Wood Avenue South - Iselin, New Jersey
333 Market St - San Francisco, California 94105
Pay Range:
Reflected is the base pay range offered for this position. Pay may vary depending on factors including but not limited to demonstrated examples of prior performance, skills, experience, or work location. Employees may also be eligible for incentive opportunities.
$119,000.00 - $224,000.00
Benefits:
Wells Fargo provides eligible employees with a comprehensive set of benefits, many of which are listed below. Visit for an overview of the following benefit plans and programs offered to employees.
- Health benefits
- 401(k) Plan
- Paid time off
- Disability benefits
- Life insurance, critical illness insurance, and accident insurance
- Parental leave
- Critical caregiving leave
- Discounts and savings
- Commuter benefits
- Tuition reimbursement
- Scholarships for dependent children
- Adoption reimbursement
Posting End Date:
25 Jun 2026
*Job posting may come down early due to volume of applicants.