JP Morgan Chase & Co.

Senior Lead Site Reliability Engineer

JP Morgan Chase & Co.
$130K - $160K *
Plano, TX 75025
In-Person
Information Technology
15+ years of experience
Job Overview by Ladders

Qualifications

  • 16+ years of software engineering experience, 5+ years in Site Reliability Engineering (SRE)
  • Advanced knowledge of SRE culture and principles, with hands-on application in platforms
  • 2+ years of experience with AI/ML platforms and tech stacks like Databricks and GPU clusters
  • Skilled in defining SLOs/SLIs specifically for AI/ML workloads
  • Expertise in using Agentic AI frameworks for automating core SRE functions
  • Proficient in observability tools (e.g., Grafana, Datadog) for monitoring and telemetry
  • Cloud architecture knowledge with AWS, Snowflake, and Kubernetes preferred

Responsibilities

  • Create high-quality designs and roadmaps for engineering solutions
  • Mentor engineers, acting as a key resource for technical advice
  • Champion site reliability principles within the team
  • Collaborate on observability designs for complex systems
  • Identify analytics for better service level objectives
  • Design resilient self-healing patterns
  • Implement automated solutions for upgrades and change management
  • Evolve and debug critical application components
  • Provide tools and solutions to support company growth
  • Contribute actively to JPMorgan Chase’s site reliability community

Benefits

  • Supportive mentorship environment for continuous learning and growth
  • Opportunity to lead projects in a dynamic team
  • Engagement with a diverse set of perspectives and problem solvers
  • Participation in internal forums and conferences to contribute to the community
  • Access to advanced technologies in AI and ML for cutting-edge solutions
Full Job Description

As a Sr Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking Data and Analytics team, you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you'll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow.

Job responsibilities

  • Creates high quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
  • Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues
  • Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team
  • Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
  • Identifies application patterns and analytics in support of better service level objectives
  • Designs self-healing and resiliency patterns
  • Designs automated software and product upgrades, change management, and release management solutions
  • Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations
  • Evolves and debugs critical components of applications and platforms
  • Uses enterprise-authorized AI capabilities within the work environment to accelerate reliability design and operational decisioning (e.g., incident/post-incident analysis and requirements traceability), validating outputs and handling operational data according to sensitivity and security requirements.

  • Leads reuse-first adoption of AI-assisted reliability workflows across SDLC/toolchain practices (e.g., testing/validation automation and production readiness), ensuring traceability/auditability, resiliency, and security controls.

Required qualifications, capabilities, and skills

  • Formal training or certification on site reliability engineering concepts and 5+ years applied experience
  • Advanced knowledge in site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform.
  • At least 2 years of hands-on experience in architecting, scaling, and providing SRE support for AI/ML platforms and products, including infrastructure tech stacks such as Databricks, GPU clusters, Model Serving frameworks, Feature Stores, Vector Databases, and LLM inference pipelines.
  • Demonstrated ability to apply core SRE fundamentals (including reliability patterns, capacity planning, incident management, performance tuning, and toil reduction) specifically to AI/ML and data-intensive, compute-heavy workloads.
  • Experience in defining and enforcing SLOs/SLIs tailored to AI/ML workloads (e.g., model latency, throughput, data freshness, inference availability) to drive reliability at scale.
  • Demonstrated experience using enterprise-authorized AI capabilities within the work environment to improve reliability engineering workflows with strong validation habits and awareness of data sensitivity.

  • Ability to set team practices for safe AI usage in operations (e.g., review/approval expectations and escalation paths) while maintaining resiliency, security, and auditability outcomes.

  • Proven hands-on experience in designing and implementing Agentic AI-based solutions to deliver SRE capabilities at scale, including practical expertise with AI Agents, Skills, Context Management, Retrieval-Augmented Generation (RAG), and tool-use patterns.

  • Ability to apply Agentic AI frameworks to automate and augment core SRE functions such as intelligent incident detection and remediation, automated root cause analysis, predictive alerting, self-healing infrastructure, runbook automation, and observability enrichment to reduce toil and accelerate MTTR.
  • Contributes to governance and controls of AI usage with a site reliability mindset and the principles of CCB systems and platforms.
  • Advanced knowledge and experience in observability such as white and black box monitoring, service level objectives, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.

Preferred qualifications, capabilities, and skills

  • Experience with cloud-based data and analytics architecture, including AWS storage, Snowflake, Kubernetes (EKS), event-driven architectures, streaming services, batch jobs, and ETL pipelines.
  • Proficiency with modern data processing frameworks such as Apache Kafka, Apache Spark, and similar tools, with a focus on ensuring scalability, reliability, and performance of data and analytics platforms.
  • Strong communication skills with ability to mentor and educate others on site reliability principles and practices.
  • Recognized as an active contributor to the engineering community.

About JP Morgan Chase & Co.

JP Morgan Chase & Co. stands at the forefront of the global financial services industry. They offer an expansive array of products and services to a diverse clientele, including individuals, corporations, governments, and institutions. Ever since the merger of J.P. Morgan & Co. and Chase Manhattan Corporation in 2000, this industry-leading entity has become renowned for its comprehensive portfolio encompassing consumer and community banking, corporate and investment banking, commercial banking, as well as asset and wealth management. Headquartered in the vibrant city of New York, JP Morgan Chase & Co. boasts a formidable presence across over 100 countries worldwide.

Unveiling Employment Opportunities at JP Morgan Chase & Co.

Vacancies and Hiring Initiatives

JP Morgan Chase & Co. is continuously on the lookout for talented individuals eager to contribute to its legacy of excellence. The company's recruitment efforts are geared towards identifying candidates with the right blend of skills and qualifications to drive forward its various business segments. Whether you are a seasoned professional or a recent graduate, JP Morgan Chase offers a plethora of job openings across multiple disciplines.

High-Demand Positions

Among the myriad of roles, certain positions stand out for their attractive compensation packages and career advancement prospects. Notably, high-paying jobs at JP Morgan Chase & Co. include Relationship Manager, Branch Manager, and Software Engineer. These roles are critical to the firm's operations and offer lucrative opportunities for those with the requisite expertise.

Navigating the Job Market at JP Morgan Chase & Co.

Leveraging Job Portals and Job Alerts

For job seekers aiming to tap into the opportunities at JP Morgan Chase, staying updated through job portals and subscribing to job alerts is crucial. These tools can provide timely information about job openings, job fairs, and recruitment events, enabling candidates to apply promptly and prepare adequately for interviews.

Preparing Your Job Application

Your job application, comprising your resume and cover letter, is your ticket to securing an interview at JP Morgan Chase. Highlight your qualifications, skills, and experiences that align with the job listing, ensuring you stand out in the competitive job market.

Acing the Interview

Preparation is key to succeeding in your interview with JP Morgan Chase. Familiarize yourself with the company's business segments, values, and recent achievements. Demonstrating how your background and aspirations match the company's goals can significantly increase your chances of employment.

A World of Job Opportunities in the Financial Services Industry

JP Morgan Chase & Co. offers a world of job opportunities for those seeking to make their mark in the financial services industry. With competitive salaries, comprehensive benefits, and endless possibilities for growth, positions at JP Morgan Chase are highly coveted. By staying informed through job sites, tailoring your applications, and preparing thoroughly for interviews, you can enhance your prospects of joining the esteemed ranks of JP Morgan Chase employees. Explore the job board, seize the job opportunities, and embark on a rewarding career journey with one of the world's leading financial institutions.
Learn more about JP Morgan Chase & Co.
Size: 661 employees
Market Cap: $384.5 billion
Industry:
Net Income: $29.1 billion
Founded: 1823
5 Year Trend: +0.7%
Revenue: $261.5 million
NASDAQ
