Site Reliability and DevOps Engineering Lead

Merative

• $131K — $197K *

US-Anywhere, Remote in United States

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field.
6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering.
Strong communication skills for effective leadership and collaboration.
Proven track record in maintaining high availability of production environments.
Expertise in software delivery pipelines, CI/CD, and configuration management.
Exceptional problem-solving skills for addressing complex system issues under pressure.
Proficiency in at least one programming language (e.g., Python, Bash, or Java).

Responsibilities

Lead, mentor, and grow the Platform / DevOps engineering team.
Build a high-performing Platform team focused on reliability and delivery.
Ensure platform capabilities accelerate product delivery and remove bottlenecks.
Define and enforce platform engineering standards and DevOps practices across teams.
Drive capacity planning, performance optimization, and cost efficiency.
Own SLIs, SLOs, and error budgets for platform reliability.
Lead incident management and continuous improvement initiatives.

Benefits

Remote first / work from home culture.
Flexible vacation policy to help you recharge.
Paid leave benefits.
Health, dental, and vision insurance.
401k retirement savings plan.
Infertility benefits.
Tuition reimbursement and additional support programs.

Full Job Description

Micromedex is seeking a highly skilled Platform Reliability & DevOps Engineering Lead who combines deep hands-on expertise in cloud services, infrastructure, and automation with a strong architectural understanding of distributed, high-availability systems.

You will lead the platform team, ensuring our mission-critical clinical platform is highly available (24/7), performant, scalable, and secure.

This role is both strategic and hands-on: you will define and drive the platform reliability and DevOps strategy, continuously improving system resilience and CI/CD capability, while partnering closely with engineering teams and vendors to embed operational excellence across the software lifecycle.

You will be accountable for the end-to-end reliability, operability, and delivery capability of the Micromedex platform, unifying Site Reliability Engineering, DevOps, and CI/CD ownership into a single platform function. This includes owning platform reliability outcomes, DevOps enablement, and delivery pipelines to support scalable, high-availability systems and faster, safer releases.

You are passionate about automation, proactive in addressing reliability and performance challenges, and committed to maintaining the trust of clinicians worldwide through resilient system design, strong operational discipline, and rapid incident response.

Responsibilities:

People & Team Leadership

Lead, mentor, and grow Platform / DevOps engineers
Build a high-performing Platform team
Drive accountability for platform reliability and delivery outcomes
Lead vendors to deliver capabilities in production.

Production Engineering & Platform Operations

Ensure platform capabilities accelerate product delivery and remove bottlenecks
Define and enforce platform engineering standards and DevOps practices across all teams and vendors
Lead capacity planning, performance optimization, and cost efficiency
Define operational standards, runbooks, and reliability practices
Accountable for platform reliability outcomes at enterprise/product level

Platform Strategy and Leadership

Act as technical authority across platform, reliability, and delivery
Define platform strategy and roadmap
Govern delivery across internal teams and vendors

Platform Reliability Ownership

Own SLIs, SLOs, and error budgets
Lead resilience engineering, observability, and failure design
Drive proactive risk reduction and continuous improvement
Own incident management frameworks and continuous improvement

CI/CD and Release Engineering

Own end-to-end pipeline architecture and release automation
Standardize, secure, and fully automate pipelines
Drive continuous integration, delivery, and validation practices

Incident Leadership

Lead Sev1 response, escalation, and recovery
Own RCA and drive systemic fixes (not point fixes)

Introduce AI-enabled pipeline optimization and quality gates

Embed AI into monitoring, risk prediction, and CI/CD optimization
Drive automation to reduce operational toil and improve decision-making

Required Skills:

Bachelor's degree in Computer Science, Engineering, or a related field.
6-10 years of hands-on experience in software operations, DevOps and Site Reliability Engineering, including managing large-scale, mission-critical systems.
Clear and confident communication skills with ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity.

Key Skills and Experience:

Proven experience:

Releasing into and running mission-critical, high-availability SaaS platforms
Technically leading a Platform team and influencing stakeholders and vendors
Stakeholder engagement across Product, Architecture, and Operations

Deep expertise in:

Site Reliability Engineering (SLI/SLO, error budgets, incident management)
DevOps operating models and platform engineering (engineering transformation)
CI/CD architecture and release automation
Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
Automation-first engineering with proven usage of AI (self-healing, triage)
Java application platforms and runtimes (performance tuning, troubleshooting, production operations)

Strong experience with:

Cloud platforms (Azure preferred)
Distributed systems and fault-tolerant architectures
Performance Tuning and Scaling
Database optimisation (DB2, Oracle, PostgreSQL)
Multi-region / active-active environments
Monitoring, logging, tracing frameworks
Experience embedding reliability practices into the SDLC

Hands-on with:

DB2, Oracle, Infinispan, OpenLiberty, Azure
Infrastructure as Code (Terraform or similar)
Containerisation and orchestration (Docker/Kubernetes)

Work Environment

This is a remote-first role, collaborating daily with global teams across engineering, product, architecture, and DevOps. The SRE/DevOps Lead Engineer will interact with colleagues across multiple time zones and must occasionally flex working hours to ensure smooth handoffs and incident coverage. Participation in an on-call rotation is expected as part of our commitment to 24/7 support of a clinical-grade platform. We are a fast-paced, collaborative environment that values continuous learning, proactive problem-solving, and the sharing of ideas. Minimal travel may be required for periodic team on-sites or company engineering summits.

Compensation

The salary range provided in this job posting is intended to reflect the general market value for the position. The actual salary offered may vary based on factors such as the candidate's experience, qualifications, skills, and the specific requirements of the role. This range may also be subject to change as market conditions evolve. We encourage open communication throughout the interview process to discuss compensation expectations. For base-salary + commission sales roles, the range represents On-Target Earnings.

$131,381.86 - $197,072.78 (USD)

Benefits

The benefits described represent the current offerings at our organization; however, benefits are subject to change and may vary by location and employment status. We strive to provide a comprehensive benefits package that supports our employees' health, wellness, and financial goals. Please note that benefits may be discussed in more detail during the hiring process.

Remote first / work from home culture
Flexible vacation to help you rest, recharge, and connect with loved ones
Paid leave benefits
Health, dental, and vision insurance
401k retirement savings plan
Infertility benefits
Tuition reimbursement, life insurance, EAP, and more!

* Ladders Estimates
