Site Reliability Engineer

Skyward IT Solutions, LLC

$112K — $150K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in computer science or related field or equivalent experience.
  • 3-5 years in site reliability or cloud engineering with significant AWS experience.
  • Solid knowledge of AWS services and best practices.
  • Hands-on experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Understanding of CI/CD pipelines and automation tools.
  • Scripting and automation skills in Python.
  • Familiarity with monitoring tools like CloudWatch and New Relic.
  • Strong problem-solving skills, able to work calmly under pressure.

Responsibilities

  • Support CMS in modernizing enterprise knowledge and data systems into an AI-driven platform.
  • Operate and optimize AWS environments to meet availability SLAs during transitions.
  • Implement monitoring and alerting to enhance system observability and performance tracking.
  • Automate deployments using infrastructure-as-code and CI/CD tools.
  • Define and monitor SLIs and SLOs to inform data-driven decisions.
  • Optimize systems for performance, security, and cost-effectiveness.
  • Collaborate on security modernization efforts regarding vulnerability reviews.
  • Design and maintain disaster recovery and business continuity plans.

Benefits

  • Fully paid medical, dental, and vision insurance for employees.
  • 15 days of paid leave plus 7 days of sick leave.
  • Up to 4 weeks paid parental leave.
  • 401K plan with 4% employer contribution and no vesting period.
  • Annual budget for professional development and technical supplies.
  • Life and disability insurance provided.
  • Flexible working hours and remote work opportunities.
  • Collaborative environment focused on innovative solutions in government services.

Full Job Description

We need an SRE.
Do you have a real feel for how distributed systems behave, and a knack for tracking down the network, infrastructure, or pipeline issue everyone else gave up on? Are you comfortable in the cloud, fluent in CI/CD, and the type who believes an alert should mean something and a dashboard should tell a story? If you love keeping complex systems healthy, fast, and quietly reliable, then apply. Like, now.

Come join us if you're motivated to learn from others, learn from mistakes, and be part of a future-looking, growth-oriented team.

Let's go Skyward together.

What you'll do:

  • Join the team supporting the Centers for Medicare & Medicaid Services (CMS) as it merges and modernizes its enterprise knowledge and data systems into a single, AI-driven platform, reducing manual effort, improving data accuracy, and enhancing transparency for stakeholders.
  • Keep the systems up and the users happy. Operate and tune AWS environments to meet infrastructure and application availability SLAs, even during transition and change.
  • Build observability that actually informs. Implement continuous monitoring, alerting, and dashboards using tools like AWS CloudWatch, New Relic, and Splunk, and establish performance baselines so you can spot degradation before users do.
  • Automate the toil. Write infrastructure-as-code (Terraform, Ansible) and support CI/CD pipelines (Jenkins) and containerized workloads (Docker) for repeatable, reliable deployments.
  • Define and track the numbers that matter. Set and monitor SLIs and SLOs, and produce performance, load/stress, and bottleneck reports that drive smarter decisions.
  • Optimize for performance, security, and cost. Use tools like AWS Trusted Advisor to find and act on improvement opportunities.
  • Support security and compliance modernization. Partner with the Security & Compliance SME to review vulnerability and security scans, feed continuous monitoring, and help advance the move toward a Continuous ATO (cATO) within a FISMA Moderate boundary (RMF, ARS, IS2P2).
  • Strengthen resilience. Help design and maintain disaster recovery and continuity of operations (COOP) plans so the systems hold up against outages, incidents, and the unexpected.
  • Own incidents end to end. Drive response, run blameless post-mortems, and implement the preventative fixes that keep the same thing from happening twice.


What we'd like you to have:

  • A bachelor's degree in computer science, engineering, or a related field (or equivalent hands-on experience).
  • 3-5 years of experience in site reliability, systems, or cloud engineering, with meaningful time spent in AWS environments.
  • Solid working knowledge of core AWS services, architecture, and best practices.
  • Hands-on experience with infrastructure-as-code tools (Terraform, Ansible, or CloudFormation).
  • A good understanding of CI/CD pipelines and automation tools (Jenkins, GitLab CI, or similar).
  • Comfort scripting and automating in Python.
  • Familiarity with monitoring and observability tooling (CloudWatch, New Relic, Splunk, or comparable).
  • Strong problem-solving instincts and the composure to work calmly under pressure.
  • Clear communication skills, with the ability to make complex technical concepts understandable.


What would blow us away:

  • You've previously worked with CMS.
  • You have experience working in AI, NLP, or LLM-driven environments.
  • You have all the AWS certifications and the real-world scars that come with them.


Even if you don't meet 100% of the qualifications, we encourage you to apply. At Skyward, we're focused on hiring individuals with the right skills and passion to grow, not just checking off every box.

And now the important part. What we offer you:

  • Medical, dental, vision insurance (fully paid for employees)
  • 15 days of paid leave
  • 7 days of sick leave
  • 2 days bereavement leave
  • 11 paid Federal holidays
  • Up to 40 hours for jury duty
  • 401K with 4% employer contribution (and no vesting period)
  • Up to 4 weeks of paid paternity and maternity leave
  • Company provided laptop
  • $5,000 per year for professional development
  • $600 per year for technical supplies and equipment
  • $2,000 referral bonus
  • Life and disability insurance
  • HSA and FSA
  • Legal Shield and ID Shield Voluntary Benefits
  • Opportunity to work in a collaborative, motivated team focused on modernizing government services with cutting-edge technology and innovative solutions. Who says government work can't be exciting?


$112,000 - $150,000 a year

We believe great work deserves great pay. That's why we ensure our compensation is not only competitive but also fair and transparent, as required by Maryland law. Expect a salary that matches your skills, experience, and the value you bring to the table - because you're worth it!

At Skyward, we support flexible working hours and remote opportunities to help maintain a healthy work-life balance for all employees.

Similar Jobs

More Jobs at Skyward IT Solutions, LLC

  • Site Reliability Engineer
    $112K — $150K *
    Rockville, MD 20850 (Montgomery County)
    Information Technology
    Hybrid
  • Solutions Architect
    $150K — $190K *
    Rockville, MD 20850 (Montgomery County)
    Enterprise Technology
    Hybrid
  • Security Engineer
    $120K — $160K *
    Rockville, MD 20850 (Montgomery County)
    Information Technology
    Hybrid
  • Proposal Writer
    $90K — $110K *
    Rockville, MD 20850 (Montgomery County)
    Education, Government & Non-Profit
    Hybrid