The Site Reliability Engineering portfolio consists of several mission-critical applications for americanexpress.com. The Mobile and Web Engineering (MWE) enterprise applications maintain high (~100%) availability in an extremely high-throughput transactional system with strict performance requirements. The Site Reliability Engineering team of the MWE portfolio works with Product teams, Staff Architects, Engineering Leaders and Engineering Teams across the Mobile and Web Engineering platform. The team's primary focus is to conceptualize, design, develop and implement frameworks and common components, and to instrument observability tools for the enterprise, in order to ensure high reliability, scalability, availability and performance of the Mobile and Web applications. The team is embarking on a transformation journey to implement a "Robotics First" approach in the Service Delivery and Site Reliability Engineering space.
The Sr. Engineer I (Site Reliability Engineer) role is a hands-on, Senior Architect-level position supporting American Express' MWE Site Reliability Engineering team. The ideal candidate has experience in full-stack engineering.
What you will be doing:
- Conceptualize and implement Machine Learning-driven Site Reliability Engineering frameworks and components to improve predictive monitoring and drive the SRE team's journey towards a "Robotics First" approach.
- Research the latest technologies and concepts, conceptualize solutions and develop proofs of concept that improve the resiliency and performance of the production infrastructure. Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability.
- Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability and performance engineering.
- Work with the operations team to resolve major incidents.
- Continuously improve automated remediation tasks to ensure the highest levels of availability.
Qualifications:
- A BS degree in Computer Science, Computer Engineering or another technical discipline, or equivalent work experience.
- 10+ years of hands-on technical experience with systems analysis, including design methodology, production support and engineering, and enterprise-level technologies including, but not limited to, OpenShift, WebSphere administration, JEE (JSP, Servlets, XML, Java) and internet-related technologies, to deliver complex Internet-facing solutions.
- Hands-on experience with frameworks such as Spring Boot, Vert.x and Node.js.
- Experience in designing mission-critical, highly available enterprise applications.
- Hands-on experience with performance testing framework design and with tuning Java applications.
- Experience managing NoSQL databases and relational databases such as DB2 and Postgres.
- Strong knowledge of Linux internals and experience managing Linux systems in high-traffic environments.
- Strong interpersonal communication skills and the ability to work well in a diverse, team-focused environment.
- Experience with Splunk and/or ELK.
- Familiarity with financial services and authorization systems.
- Experience with machine learning implementation would be an advantage.