Ad Hoc

Site Reliability Engineer

Salary: $125K - $135K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree with 5+ years of relevant experience (relevant experience may substitute for the degree)
  • Experience with monitoring, observability, and on-call operations
  • Proficient in at least one infrastructure-as-code tool (Terraform preferred)
  • Familiarity with containerization, networking, and cloud infrastructure in a DevOps context
  • Ability to obtain and maintain a U.S. Public Trust / suitability determination

Responsibilities

  • Monitor platform health and support SLOs, SLIs, and error budgets
  • Build and maintain observability tooling: metrics, logging, and dashboards
  • Participate in on-call rotations and incident response processes
  • Contribute to blameless postmortems and implement follow-up actions
  • Automate repetitive operational tasks to minimize toil
  • Support capacity planning and performance tuning for AWS and Kubernetes
  • Implement reliability improvements using infrastructure as code (Terraform)

Benefits

  • Company-subsidized health, dental, and vision insurance
  • Flexible PTO policy
  • 401K plan with employer match
  • Paid parental leave after one year of service
  • Employee Assistance Program

Full Job Description
Site Reliability Engineer
Job number: 880

This is a remote position.

The Veterans Affairs business unit helps transform the VA into a modern digital services organization where Veteran outcomes are at the center of every effort. We partner with the VA to design and deliver seamless user experiences for Veterans, their families and caregivers, and VA employees. By applying better practices in service design, product management, and technology, we enable the VA to increase the use, quality, and reliability of services and decrease the time Veterans spend waiting for outcomes.

Primary Responsibilities:

As a Site Reliability Engineer, you will help ensure the availability, performance, and reliability of a large federal enterprise cloud platform that operates around the clock. With the support and guidance of senior engineers, you will help meet scope, schedule, and delivery requirements while improving the platform's reliability practices. Primary expectations of a Site Reliability Engineer include:
  • Monitoring platform health and supporting service level objectives (SLOs), service level indicators (SLIs), and error budgets
  • Building and maintaining observability tooling, including metrics, logging, alerting, and dashboards
  • Participating in on-call rotations and incident response, helping restore service and reduce time to recovery
  • Contributing to blameless postmortems and driving follow-up actions
  • Automating repetitive operational tasks to reduce toil
  • Supporting capacity planning and performance tuning across cloud infrastructure (AWS) and Kubernetes (Amazon EKS)
  • Implementing reliability improvements as infrastructure as code (Terraform)
  • Working with government partners and application teams to meet security, SLA, and performance requirements
  • Supporting recruiting efforts by evaluating candidate exercises and assisting with interviews


Basic Qualifications:
  • Bachelor's degree and 5+ years of experience; relevant experience may be substituted for education
  • Experience with monitoring and observability tooling and on-call operations
  • Proficient with at least one infrastructure-as-code tool (Terraform preferred)
  • Background in key DevOps concepts: containerization, networking, and cloud infrastructure
  • Must be able to obtain and maintain a U.S. Public Trust / suitability determination


Preferred Qualifications:
  • Prior experience with the Department of Veterans Affairs
  • Experience with Kubernetes (Amazon EKS) and AWS in production
  • Familiarity with SLO-based reliability practices and error budgets
  • Relevant certifications (e.g., AWS certifications, Certified Kubernetes Administrator)

To learn more about working at Ad Hoc, please visit: https://adhocteam.us/join

Benefits:
  • Company-subsidized health, dental, and vision insurance
  • Flexible PTO
  • 401K with employer match
  • Paid parental leave after one year of service
  • Employee Assistance Program

In support of various state and city equal pay transparency laws, Ad Hoc job descriptions feature the starting range we reasonably expect to pay to candidates who would join our team with little to no need for training on the responsibilities we've outlined above. Actual compensation is influenced by a wide range of factors including but not limited to skill set, level of experience, and responsibility. The range of starting pay for this role is $125,000-$135,000. Our recruiters will be happy to answer any questions you may have, and we look forward to learning more about your salary requirements.

Job reference:

https://adhoc.team/

About Ad Hoc

Ad Hoc is a digital services company that helps government agencies improve the user experience of their digital services. It works with clients in areas including healthcare, finance, and transportation, and provides services such as user research, design, and development. The company is known for its user-centered approach and for delivering high-quality digital services that meet the needs of its clients and their users. Ad Hoc was founded in 2014 and is headquartered in Washington, DC.
Size: 200 employees
Founded: 2014
