As the Cloud Platform Site Reliability Engineer on our Defense Innovation Unit (DIU) Pilot Training Transformation (PTT) team, you will help assure the program's mission as one of the engineers responsible for the infrastructure our customers use, internally and externally, to run the business, with observability, scalability, and capacity planning as your main functions. The position requires solid experience building partnerships with business stakeholders, development teams, analysts, and the architecture, automation, enterprise security, and internal incident response teams to identify, design, and implement automated, efficient solutions. On a day-to-day basis, you will provide technical solutions and environmental operational support, and triage and remediate issues to maximize the efficiency and availability of the platform, while embracing Agile/Scrum methodologies and building DevOps culture and practices.
This program provides the support required to maintain, modify, and enhance all DIU PTT cloud applications, the data warehouse, and analytics.
Location: Telework Available
The Cloud Platform Site Reliability Engineer will be part of a team responsible for application onboarding, infrastructure provisioning, deployment automation, and ITSM onboarding, and for maintaining the environment's compliance, availability, and cloud cost optimization, while providing full application development lifecycle support across all environments, including production support, in line with industry best practices and internal standards and controls.
Qualifications include, but are not limited to:
- Strong, proven experience as a DevOps engineer in a scalable production environment built on Google Cloud Platform.
- 3-5 years of experience supporting complex business applications in a large enterprise environment, following DevOps principles in an Agile environment.
- Proven experience with Google Cloud environments and their offerings (PaaS, SaaS, or IaaS) to design, document, and implement highly scalable and reliable infrastructure solutions.
- Proven experience with automation, APIs, and microservices on Kubernetes, and with maintaining infrastructure on Windows servers.
- Proven experience supporting the Python development lifecycle for enterprise-level applications in an Agile environment.
- Proven experience multitasking: balancing the delivery of multiple simultaneous projects and working with application teams to deliver hosting platforms on aggressive timelines.
- Proven experience in troubleshooting issues across multiple applications and platforms.
- Proven experience with Agile practices and methodologies (Sprint planning, Daily Scrums, Backlog grooming, and updating artifacts to ensure work is tracked appropriately) and Atlassian products (Jira, Confluence, Service Desk).
- Strong conceptualization abilities, detail-oriented, critical/analytical thinking and troubleshooting skills, with the ability to work independently and deliver consistent results on difficult problems.
- Strong team player with a desire to learn by collaborating with peers and other DevOps teams.
- Effective organizational and communication skills, both written and verbal, including the ability to explain technical issues clearly to non-technical stakeholders.
- Bachelor's degree in computer science, networking, cybersecurity, or a related field, with 3-5 years of experience
- Proven experience working with a hybrid team
- Proven experience supporting cloud solutions across PaaS and IaaS platforms for cloud services, SQL, Python, App Insights, Log Analytics, and other third-party software
- Proven experience in coaching and mentoring junior and offshore team members.
- Strong written and verbal communication skills.
- U.S. citizen
- Ability to obtain a security clearance
- Google Cloud Professional Cloud DevOps Engineer certification
- Prolonged periods of sitting at a desk and working on a computer
- Must be able to lift 10-15 pounds at times.