Site reliability engineers at USAA hold the job title of Software Developer and Integrator (SDI). SDIs are engaged in all phases of the software development lifecycle, which include gathering and analyzing user/business system requirements, responding to outages, and creating application system models. An SDI's primary functions are to design, develop, document, test, and debug new and existing software systems and/or applications for internal use, and to perform defect corrections (analysis, design, code).
In addition, SDIs participate in design meetings and consult with business clients to refine, test, and debug programs to meet business needs, and interact with and sometimes direct third-party partners in the achievement of business and technology initiatives.
This role is a solid, career-level role where functional and technical proficiency has been obtained, and incumbents display a depth of technical understanding within their respective areas of specialization allowing them to operate independently. Incumbents also display a proficiency that allows them to begin to mentor others (third party and internal resources) on procedural matters.
- Work with application and system SMEs to design highly scalable and resilient distributed systems
- Create Service Level Objectives to measure and manage core infrastructure and critical services
- Analyze, troubleshoot and fix core infrastructure or critical systems when they fail or degrade
- Write custom code or scripts to automate repetitive or manual system support tasks
- Lead technical post mortems to identify lessons learned and implement improvements
- Partner with technical teams and product owners to ensure resiliency work is developer-ready
- Design and execute failure injection tests to verify adequate system capacity and resiliency
- Champion Site Reliability Engineering practices across the IT organization
- Independently installs, customizes and integrates commercial software packages.
- Facilitates root cause analysis of system issues.
- Works with experienced team members to conduct root cause analysis of issues, review new and existing code and/or perform unit testing.
- Learns to create system documentation/play books and attends requirements, design and code reviews.
- Receives work packages from manager and/or delegates.
- Identifies ideas to improve system performance and impact availability.
- Resolves complex technical design issues.
- Creates system documentation/play book(s) and participates as a reviewer and contributor in requirements, design and code reviews.
- May serve as the subject matter expert on development techniques.
- Partners with experienced team members to develop accurate work estimates on work packages.
- May serve as a mentor on procedural matters to less experienced internal and third-party team members.
- May assist experienced team members with the delegation of work packages.
- Bachelor's degree; 4 additional years of related experience beyond the minimum required may be substituted in lieu of a degree
- 4+ years of software development experience demonstrating depth of technical understanding within a specific I/T discipline(s)/technology(s) to include relevant business support and/or general information technology support experience
- Working knowledge of systems administration and/or systems programming skills
- Strong interest in monitoring, optimizing, scaling and troubleshooting large distributed systems
*Qualifications may warrant placement in a different job level*
When you apply for this position, you will be required to answer some initial questions. This will take approximately 5 minutes. Once you begin the questions, you will not be able to finish them at a later time, and you will not be able to change your responses.
- 4+ years of experience managing large scale production environments (1000+ servers) and experience with production support of applications in large scale environments
- Demonstrated experience influencing and selling new ideas to peers, leadership, and senior management
- Experience in one or more of the following: C, C++, Java or Python
- Strong experience or working knowledge of end-to-end IT systems (compute, storage, network, security, application runtime, relational databases, REST services, asynchronous messaging, etc.)
- Strong troubleshooting skills and experience developing enterprise applications
- Demonstrated experience building SLO-based monitoring solutions
- Strong teaming and collaboration skills