Site Reliability Engineer

Axle Informatics

$140K–$155K *

Frederick, MD 21702 (In-Person)

Enterprise Technology

5 - 7 years of experience

Reposted Today

Job Overview by Ladders

Qualifications

6+ years in DevOps/SRE roles with monitoring tools like Prometheus or Grafana
4+ years hands-on experience with Linux systems such as Ubuntu or CentOS
4+ years automating Infrastructure-as-Code deployments on AWS, GCP, or Azure
4+ years with CI/CD and automation tools like Terraform, Ansible, and Jenkins
Strong scripting skills in Python, Bash, or PowerShell
Experience debugging and troubleshooting SQL/NoSQL databases and modern programming stacks
Cloud certifications preferred.

Responsibilities

Design and implement monitoring and observability frameworks across distributed systems
Establish and manage SLIs, SLOs, and error budgets for reliability improvements
Develop real-time inventory systems across cloud and hybrid environments
Automate workload onboarding and offboarding for standardization
Build proactive detection using AIOps to minimize incidents
Design scalable and secure infrastructure platforms across cloud environments
Evangelize best practices in DevOps, SRE, and platform engineering.

Benefits

Collaborative and growth-oriented team culture
Opportunities for continuous learning and experimentation
Access to cutting-edge technologies for building innovative solutions
Flexibility in shaping multi-cloud and platform engineering solutions
Dynamic work environment that encourages cross-functional leadership

Full Job Description

The Site Reliability Engineer role centers on modernizing and consolidating a complex multi-cloud environment across AWS, Azure, and GCP, building a scalable, secure, and observable platform from the ground up using Kubernetes, AI/ML infrastructure, and zero-trust principles. You'll combine DevOps and SRE practices to support mission-driven scientific and clinical programs, emphasizing automation, reliability, compliance, and proactive monitoring while enabling innovation through AI-driven tooling. The team culture is highly collaborative and growth-oriented, valuing experimentation, continuous learning, and cross-functional leadership, with opportunities to shape future multi-cloud and platform engineering solutions.

Responsibilities:

Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using enterprise Splunk, Grafana, and OpenTelemetry tools
Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements
Develop and maintain real-time asset inventory systems across cloud, on-prem, and hybrid environments
Automate workload onboarding and offboarding processes, ensuring standardization and governance
Track system ownership, dependencies, and lifecycle states for operational transparency
Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact
Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments
Implement automated compliance tracking and enforcement aligned with organizational and regulatory standards (e.g., NIST, FISMA, FedRAMP)
Embed ITIL processes (incident, change, problem, configuration management) into SRE workflows
Build and maintain automated deployment environments and pipelines that enforce security, compliance, and operational standards
Develop "golden paths" and standardized platform templates for consistent workload deployment
Automate provisioning, patching, configuration management, and environment lifecycle
Leverage AI/ML coding assistants and vibe coding practices to rapidly develop automation scripts, tools, and internal platforms
Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights
Lead adoption of AI-enhanced SRE practices, including intelligent remediation and predictive operations
Champion DevOps and SRE practices including Infrastructure as Code, CI/CD, observability, and reliability engineering
Build developer-friendly platforms ("golden paths") that simplify deployments, reduce friction, and improve velocity
Enable and optimize infrastructure for AI/ML workloads, including data pipelines, storage systems, inference environments, and GPU-enabled and high-performance compute workloads
Build and manage containerized and orchestrated platforms (Docker, Kubernetes)
Support cloud migration, modernization, and platform standardization initiatives
Ensure systems meet security, compliance, backup, and disaster recovery requirements
Evangelize and promote best practices in DevOps, SRE, and platform engineering to developer communities
Stay abreast of new technologies in your areas, including but not limited to AIOps, MLOps, cloud computing and deployment, site reliability engineering, infrastructure automation, security best practices, and data engineering

Requirements:

Must have a total of 6+ years of experience in DevOps/SRE roles with monitoring and observability tools (Prometheus, Grafana, ELK, or cloud-native equivalents) for on-prem and cloud-hosted workloads
Must have 4+ years of hands-on Linux experience that includes Ubuntu/CentOS/Red Hat operating systems, containers, dependency management, and administration support
Must have 4+ years of experience automating Infrastructure-as-Code (IaC) deployments to at least one of the following cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
Must have 4+ years of experience with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, and GitHub Actions
Strong scripting skills (Python, Bash, PowerShell, or similar)
Must be proficient in using vibe coding and coding assistants to develop scripts, tools, and applications for DevOps and SRE use cases
Must be proficient in debugging, troubleshooting, and/or deploying SQL and/or NoSQL databases, object storage, and web servers; experience with open-source programming stacks (Node.js, R, Python, .NET Core, Java) is desired but not mandatory
Must be willing to learn new technologies and to adapt to emerging technologies or needs from project to project
Cloud certifications are preferred
Certifications in Grafana, Splunk, Docker, and Kubernetes are preferred but optional

Disclaimer: The above description is meant to illustrate the general nature of work and level of effort being performed by individuals assigned to this position or job description. It is not intended to be a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.

This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate's experience, qualifications, skills, and location.

Salary Range

$140,000-$155,000 USD

* Ladders Estimates
