About this role:
Wells Fargo is seeking a highly skilled and forward-thinking Lead Software Engineer to join our API SRE & Operations team within the CTO Platform Services team. This role is ideal for someone passionate about building scalable, resilient, and intelligent infrastructure solutions. You will play a key role in driving automation, reducing operational toil, and enabling self-service capabilities through cutting-edge technologies, including Generative AI and Agent development.
In this role, you will:
• Lead complex, large‑scale Systems Operations initiatives and provide high‑level technical consultation
• Drive daily production support for Apigee OPDK and Apigee Hybrid, ensuring platform uptime, stability, and performance
• Manage and maintain core Apigee components (Routers, Message Processors (MPs), MART, Zookeeper, Cassandra, Postgres, runtime infrastructure)
• Lead operational support for IBM DataPower Gateways, including firmware upgrades, domain and service configurations
• Own and resolve P1/P2/P3 high‑severity incidents, including deep technical troubleshooting and rapid mitigation
• Perform detailed Root Cause Analysis (RCA) and drive permanent corrective actions
• Lead communication and coordination during major incidents across cross‑functional teams
• Act as the primary technical liaison between Support, Engineering, Cloud, Network, Security, and Architecture teams
• Support API proxy deployments, shared flows, developer portals, and runtime troubleshooting
• Plan and execute platform upgrades, patching, migrations, and configuration refactoring
• Implement automation and reliability improvements using Ansible, Python, Shell scripting, and IaC
• Establish and maintain observability using tools such as Splunk, Grafana, Prometheus, and Dynatrace/AppDynamics
• Improve proactive monitoring and alerting to reduce MTTD and MTTR
• Participate in architectural reviews, platform modernization, and API governance initiatives
Required Qualifications:
- 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 3+ years supporting Apigee or similar API Management platforms in production environments
- 3+ years supporting Red Hat Enterprise Linux and Kubernetes, with strong experience in OpenShift (OCP)
- 3+ years of automation and scripting experience, including expertise in Ansible Tower and in developing and maintaining playbooks
Desired Qualifications:
- 1+ years of experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, plus familiarity with Kubernetes
- Experience leveraging observability platforms such as Splunk, ELK, Grafana, Prometheus, AppDynamics
- Experience with Site Reliability Engineering (SRE) practices and production‑grade systems
- Experience with cloud‑native architectures and container platforms (GCP or Azure)
- Strong experience with automation and scripting (Ansible Tower, Python, Unix/Shell)
- Experience implementing Infrastructure‑as‑Code (Terraform, GitOps)
- Experience working in Agile / Scrum environments
- Proven ability to lead cross‑functional initiatives and influence stakeholders
- Strong problem‑solving, communication, and collaboration skills
Job Expectations:
- This position offers a hybrid work schedule
- Must be available for on-call support
- Must have flexibility to work ad-hoc shifts when required
Posting End Date:
9 Jun 2026
*Job posting may come down early due to volume of applicants.