Site Reliability Engineer, Team Lead

Omnicell • $120K — $150K *

Cranberry Township, PA 16066 • In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field OR equivalent experience
7+ years in software or platform engineering, with 4+ years in SRE, DevOps, or platform reliability
2+ years of technical leadership experience with mentorship responsibilities
Proven track record in cloud-native production environments
Strong experience with Infrastructure as Code (Terraform preferred) and CI/CD pipelines.

Responsibilities

Establish and operate an SRE function balancing hands-on engineering with leadership duties
Define SLIs, SLOs, and error budgets for top customer-facing services
Design and implement an incident command structure and on-call model
Select and operationalize the primary observability platform
Lead hands-on engineering for Tier-1 services through incident management and infrastructure as code.

Benefits

Collaborate with various departments to align reliability with business goals
Opportunity to shape the SRE function's future in a cloud-first environment
Engagement in continuous learning and growth mindset culture
Influence the operational standards and practices at Omnicell
Involvement in mentorship and coaching as an integral part of the leadership role.

Full Job Description

Job Description

Site Reliability Engineer, Team Lead (Player-Coach)

Location: Candidates local to Austin, TX or Cranberry Woods, PA are preferred, but the role is open to fully remote candidates in the mainland U.S.
Department: Global Cloud Operations
Reports to: Vice President, Global Cloud Operations

Visa Sponsorship: No form of visa sponsorship is offered for this position. Candidates must be U.S. citizens or permanent residents.

What You'll Do

Purpose: Establish and operate Omnicell's Site Reliability Engineering function, balancing hands-on engineering with practice design, coaching, and cross-functional leadership.

Primary Impact

You will ensure Omnicell's Tier-1 cloud services are observable, resilient, and dependable, so hospitals, pharmacies, and clinicians can rely on our platform without interruption.

Reliability Practice & Operating Model

Define and publish SLIs, SLOs, and error budgets for the top 5-10 Tier-1 customer-facing services in partnership with Product and Engineering.
Design Omnicell's incident command structure, including severity definitions, declaration criteria, war-room protocols, stakeholder communications, and post-incident review standards.
Establish and operationalize a sustainable on-call model, including fair rotations, paging discipline, escalation paths, and coordination with managed service partners (IBM, HCL).
Partner with the VP to migrate the interim incident response RACI - currently held by matrixed individuals across IT, Engineering, Support, and Enterprise Security - into a durable SRE-owned model.
Select and stand up the primary observability platform, preferring extension of existing Omnicell contracts (Datadog, IBM/Instana, Prometheus/Grafana, OpenTelemetry, or other tooling already in use) over net-new procurement. Define the instrumentation standards all new services must meet.
Develop and track operational KPIs (e.g., MTTR, SLO attainment, change-failure rate, incident recurrence, cost per workload) and present reliability insights and roadmaps in executive Cloud Ops reviews.

Hands-On Engineering & Incident Leadership

Instrument Tier-1 services directly, building dashboards, alerts, and runbooks yourself.
Participate in on-call rotations and command Sev-1 and Sev-2 incidents, leading blameless postmortems and driving corrective actions to completion.
Contribute production code and infrastructure-as-code (Terraform preferred) to the platform. Oversee the design and evolution of the CI/CD pipelines. The current stack is Codefresh, TeamCity, GitHub Actions, and Octopus Deploy, and we are consolidating over time.
Administer and scale our Kubernetes platform, including secure and compliant cluster configurations. Working knowledge of Docker, Helm, and Service Mesh (Istio or Linkerd) expected.
Plan and execute chaos and failover exercises to validate real-world resilience.

AI-Driven Operations

Architect Omnicell's AIOps strategy, evaluating ML-based anomaly detection, alert correlation, automated root-cause analysis, and LLM-assisted runbooks.
Make disciplined build-versus-buy decisions and integrate AI tooling only where it delivers measurable reliability gains.
Ensure AI-assisted operations meet auditability, explainability, and compliance requirements (HIPAA, SOC 2).

Coaching & Team Building

Serve as formal coach to an Engineer III SRE, pairing on incidents, reviewing design proposals, and supporting growth toward senior levels.
Design the hiring process for the next 2-4 SRE hires, including role definitions, interview loops, and hiring decisions.
Represent SRE in architecture reviews, launch readiness assessments, and cross-functional reliability discussions.

What Success Looks Like in the First Six Months

Concrete outcomes this role will be evaluated against in the first half-year. These are drawn from the Cloud Ops 90-day plan and its extension into the following quarter.

Month 1: SLOs drafted for the top 5 Tier-1 services with Product sign-off. Severity rubric published. First live tabletop Sev-1 run against the interim RACI.
Month 2: Observability platform selection finalized. Instrumentation standard published. Engineer III SRE hired and onboarded.
Month 3: On-call rotation live. First real Sev-1 commanded under the new structure with a blameless postmortem completed and follow-ups tracked.
Month 4-6: Error budget policy in effect for the first 3 services. First incident review at executive level. Interview loop running for the next SRE hires. Initial AIOps evaluation and pilot scope defined.

Who You Are

Bachelor's degree in Computer Science, Engineering, or a related technical field OR equivalent experience
7+ years of experience in software or platform engineering, with at least 4 of those in an SRE, DevOps, or platform reliability role.
At least 2 years of formal technical leadership, tech-lead, or staff-level experience with mentorship responsibilities.

Preferred Qualifications

Proven experience leading SRE, DevOps, or platform engineering teams in a cloud-native production environment - with demonstrated experience building a practice from zero or near-zero: you have set SLOs, defined incident command, and introduced error budget thinking to an organization that did not have it.
Deep hands-on expertise with at least one major public cloud (AWS, Azure, or GCP), including networking, IAM, and managed services.
Strong background in CI/CD pipeline design and management (familiarity with Codefresh, GitHub Actions, Jenkins, TeamCity, or equivalent).
Experience implementing Infrastructure as Code using Terraform (preferred), Chef, Puppet, or similar tools.
Proficiency in Python or another object-oriented programming language for automation, tooling, and production services.
Experience administering and scaling Kubernetes clusters, including secure and compliant platform configurations. Working knowledge of Docker, Helm, and Service Mesh technologies (Istio, Linkerd).
Hands-on experience designing modern observability platforms using tools such as Datadog, Prometheus, Grafana, OpenTelemetry, Elasticsearch/Kibana, or equivalent - with an opinion about what a good telemetry stack looks like.
Familiarity with integrating AI/ML-based anomaly detection, alerting, or LLM-assisted triage pipelines - or strong conviction about where AIOps should and should not be applied in a regulated environment.
Real incident command experience for customer-impacting Sev-1 events, with blameless postmortem practice and documented follow-up discipline.
Ability to coach and mentor, with direct evidence of growing junior and mid-level engineers. You will eventually have 1 direct report.
Comfort operating in a regulated environment where reliability and compliance (HIPAA, SOC 2) are inseparable.

How You'll Elevate at Omnicell

At Omnicell, success is defined by both outcomes and behaviors. In this role, you will:

Collaborate: Partner deeply with Product, Platform Engineering, Support, Security, and managed service providers to align reliability with business priorities.
Inspire: Lead by example during high-stakes incidents and influence teams toward a culture of ownership, learning, and resilience.
Develop: Invest in the growth of your SRE peers through coaching, pairing, and thoughtful technical leadership.
Execute: Set clear priorities, make informed trade-offs, and deliver durable reliability improvements.
Impact: Shape how Omnicell operates for years to come by defining the standards, tools, and practices of our SRE function.

Leadership Imperatives (Player-Coach Role)

This role will eventually have one less-senior Site Reliability Engineer as a direct report. You are expected to demonstrate Omnicell's leadership expectations by:

Modeling a growth mindset and continuous learning.
Acting as a talent activator through formal coaching and mentorship.
Being an impact maker who connects reliability investment to business and patient outcomes.
Serving as a change champion as Omnicell transitions to cloud-first operations.

About Omnicell

Omnicell, Inc. is an American multinational healthcare technology company headquartered in Mountain View, California. It manufactures automated systems for medication management in hospitals and other healthcare settings, and medication adherence packaging and patient engagement software used by retail pharmacies. Its products are sold under the brand names Omnicell and EnlivenHealth.

Size: 3,800 employees
Market Cap: $2 billion
Industry: Technical Services
Net Income: $32.1 million
Founded: 1992
5 Year Trend: +10.2%
Revenue: $892.2 million
NASDAQ: OMCL

* Ladders Estimates
