As a Site Reliability Engineer (SRE), you will work closely with application development teams to build standards that drive the highest levels of availability across the Digital Acquisitions channel. You will join a team that provides 24/7 support, and you will be expected to develop solutions that improve production support and monitoring services while responding to incidents to keep applications highly available. You can expect to spend about 50% of your time on engineering work, such as automating infrastructure, designing and building tools, and writing code to support our application teams.
If you were to join our team, you would be expected to:
- Work closely with our application engineering teams to launch and maintain applications both on-premises and in hybrid-cloud environments.
- Act as the primary escalation point for the L1 support team, helping to make decisions that restore service and minimize impact on availability.
- Provide production support and respond to production incidents as the first line of defense for the organization
- Facilitate the resolution of non-application issues (third-party upstream issues, infrastructure, storage, database, network, file transfer, etc.)
- Drive monitoring requirements to ensure business-service level visibility for all support teams
- Debug network and performance issues in large-scale distributed systems.
- Provide consultation and strategic recommendations by quickly assessing and remediating complex availability issues.
- Introduce new and impactful technologies to the production support toolchain that minimize friction for production releases and enable quick diagnosis of, and recovery from, production incidents.
You should have:
- 6 to 8 years of work experience in a DevOps environment with Java/J2EE/React JS applications
- Experience managing a team of engineers, including conducting 1-on-1s, developing career paths, and mentoring engineers to improve overall technical capabilities.
- A BS degree in Computer Science, Computer Engineering, or another technical discipline, or equivalent work experience.
- Experience in Java, Python, Go, React, or Ruby
- Experience supporting 3-tier architectures, including exposure to at least two of: IBM DB2, Couchbase, MongoDB, Redis
- Hands-on experience with enterprise tools such as Grafana, Dynatrace, AppDynamics, Jenkins, and Splunk
- Experience working in a distributed team model with daily handoff of issues during shift changes
- Broad technical exposure, with a preference for the following skills: cloud infrastructure, VMs, load balancing, containers, OpenShift, Kubernetes, JVMs, web servers, application debugging, queuing technologies, caching technologies, databases, routing and switching, and monitoring tools such as Prometheus.
Bonus Points if you:
- Are familiar with financial services and authorization systems
- Understand how to apply Agile practices in operations teams