Texas Capital Bank is seeking a highly motivated Site Reliability Engineer to join the bank’s Enterprise Monitoring team. This role will help lead the site reliability engineering and monitoring of Texas Capital Bank’s cloud infrastructure and several of our mission-critical banking services. A strong sense of ownership, self-drive, creativity, innovation, technical skills, and cloud technology experience will ensure success in this role. Successful candidates must be comfortable working in an environment where processes and procedures are still being developed and documented. Your input and contributions to these processes will directly drive the success of this initiative.
Responsibilities
- Engage in and improve the full lifecycle of services, from inception and design through deployment, operation, and refinement.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Observe, diagnose, and develop fixes for production issues quickly and efficiently.
- Develop and drive real-time monitoring solutions that provide visibility into site health and key performance indicators.
- Report and communicate high-value metrics to leadership with confidence.
- Develop and automate standard operating procedures around common failure scenarios.
- Design, manage, and maintain tools to automate operational processes.
- Write code to automate processes and continually reduce human intervention in operational tasks.
- Redefine governance models around automation tools so that the tools can be used throughout the enterprise.
- Build automation to deliver metrics reports.
- Help define SLIs and SLOs.
- Provide subject matter expertise to application teams in the areas of monitoring and site reliability.
- Collaborate and provide input in the creation of standards, policies, and processes.
- Create and maintain team knowledge documentation.
The duties listed above are the essential functions or fundamental duties within the job classification. The essential functions of individual positions within the classification may differ. Texas Capital Bank may assign reasonably related additional duties to individual employees consistent with standard departmental policy.
Qualifications
- BS in computer science or a related field.
- 5-10 years of relevant technical and/or business work experience.
- Experience with high availability and scalability in the cloud.
- Ability to debug and optimize code and to automate routine tasks.
- Proficiency in at least one of the following scripting/programming languages: Python, Ruby, PowerShell, or Bash.
- Experience with configuration management tools (Puppet, Chef, SaltStack, Ansible, etc.).
- Experience with application monitoring tools (New Relic, Dynatrace, AppDynamics, etc.).
- Experience working with revision control systems and code repositories.
- Experience in developing Continuous Integration (CI) pipelines in Azure DevOps or Jenkins.
- ServiceNow experience is preferred.
- ITIL certification is preferred.
- Cloud certification is preferred.