Senior Site Reliability Engineer
Job number: 884
This is a remote position.
The Veterans Affairs business unit helps transform the VA into a modern digital services organization where Veteran outcomes are at the center of every effort. We partner with the VA to design and deliver seamless user experiences for Veterans, their families and caregivers, and VA employees. By applying better practices in service design, product management, and technology, we enable the VA to increase the use, quality, and reliability of services and decrease the time Veterans spend waiting for outcomes.
Primary Responsibilities:
As a Senior Site Reliability Engineer, you will serve as an experienced individual contributor responsible for the availability, performance, and reliability of a large federal enterprise cloud platform that operates around the clock. With minimal oversight, you will help meet scope, schedule, and delivery requirements while shaping the platform's reliability strategy. Primary expectations of a Senior Site Reliability Engineer include:
- Defining and maintaining service level objectives (SLOs), service level indicators, and error budgets, and driving the platform toward them
- Designing and operating observability across metrics, logging, tracing, and alerting
- Leading incident response and on-call practices, including escalation, mitigation, and time-to-recovery improvements
- Driving blameless postmortems and systemic reliability improvements
- Engineering automation to eliminate toil and improve operational efficiency
- Independently designing reliable cloud infrastructure (AWS) and Kubernetes (Amazon EKS) environments, including weighing tradeoffs between cost, reliability, and efficiency
- Building reusable modules and mentoring engineers on reliability practices
- Presenting design documents and system diagrams to stakeholders
- Participating in technical depth interviews with new candidates
Basic Qualifications:
- Bachelor's degree and 7+ years of experience; relevant experience may be substituted for education
- Demonstrated experience owning reliability (SLOs, observability, incident response) for production systems
- Expert-level knowledge of at least one infrastructure-as-code tool (Terraform preferred)
- Deep command of cloud infrastructure, containerization, and networking
- Must be able to obtain and maintain a U.S. Public Trust / suitability determination
Preferred Qualifications:
- Prior experience with the Department of Veterans Affairs
- Kubernetes (Amazon EKS) and AWS at scale
- Familiarity with FedRAMP, NIST 800-53, and zero-trust architecture
- Relevant certifications (e.g., AWS, CKA/CKS)
To learn more about working at Ad Hoc, please visit: https://adhocteam.us/join
Benefits:
- Company-subsidized health, dental, and vision insurance
- Flexible PTO
- 401(k) with employer match
- Paid parental leave after one year of service
- Employee Assistance Program
In support of various state and city equal pay transparency laws, Ad Hoc job descriptions feature the starting range we reasonably expect to pay to candidates who would join our team with little to no need for training on the responsibilities we've outlined above. Actual compensation is influenced by a wide range of factors including but not limited to skill set, level of experience, and responsibility. The range of starting pay for this role is $135,000-$150,000. Our recruiters will be happy to answer any questions you may have, and we look forward to learning more about your salary requirements.
job reference:
https://adhoc.team/