Senior Site Reliability Engineer

Red Hat • $118K — $195K *

Raleigh, NC 27610 • In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

4+ years of software engineering experience in cloud environments
Strong proficiency in Go with a focus on production-quality code
Deep hands-on Kubernetes experience, including building operators and controllers
Solid understanding of AWS fundamentals like EC2 and IAM
Proven experience owning production systems with real SLOs and incident response
Ability to ramp quickly on complex systems and make contributions within weeks
Strong systems thinking with a focus on scalability, reliability, and operability

Responsibilities

Contribute production-grade software to upstream open source projects
Bring a systems perspective to architecture decisions for scalability and operability
Operate as a full-stack systems owner and participate in on-call duties
Turn operational learnings into durable product and platform enhancements
Design and evolve observability metrics and SLOs
Raise the technical bar through documentation, code review, and knowledge transfer
Work autonomously to identify impactful problems and lead them from concept to adoption

Benefits

Highly autonomous and collaborative work environment
Opportunity to contribute to open source projects
Focus on both software engineering and production reliability
Chance to work on a complex, high-scale platform
Engagement with cutting-edge AI-assisted development tools

Full Job Description

Red Hat is seeking a Senior Software Engineer to join the HCP Platform Engineering team, building and operating ROSA (Red Hat OpenShift Service on AWS) Hosted Control Planes (HCP). ROSA HCP is Red Hat's managed Kubernetes platform on AWS, built on a multi-tenant architecture where Red Hat operates shared control plane infrastructure while customers run workloads in their own AWS accounts.

This is not a standard engineering role. You will write and ship production-grade code, contribute to upstream open source projects and take ownership of production systems through on-call. All three are equally important.

You'll work at the intersection of software engineering and production reliability on one of Red Hat's most complex and high-scale platforms. The system spans multiple upstream open source projects and shared, multi-tenant infrastructure, requiring strong engineering judgment and end-to-end ownership.

The team is small, highly autonomous, and trusted to solve meaningful problems from design through production.

What you will do

Contribute production-grade software to upstream open source projects including HyperShift and OpenShift, owning features end-to-end from design and implementation through deployment and long-term lifecycle in production
Bring a product and systems lens to architecture decisions, ensuring designs account for scalability, operability, and real-world production constraints from the start
Operate as a full-stack systems owner, participating in on-call rotations and taking end-to-end responsibility for diagnosing, fixing, and preventing production issues
Drive improvements that eliminate entire classes of failures by turning operational learning into durable product and platform enhancements
Design and evolve observability (metrics, logs, traces) and SLOs as part of the software lifecycle, ensuring systems are measurable, debuggable, and resilient by design
Raise the technical bar across the team through design docs, code review, pairing, and knowledge transfer during complex engineering work and incidents
Work in a high-autonomy engineering team where you identify the most impactful problems and lead them from concept through implementation and production adoption
Partner as a peer with product and platform engineering teams to influence architecture, challenge assumptions, and ensure systems are built for scale, reliability, and long-term operability
Integrate AI-assisted development tools (GitHub Copilot, Cursor, Claude Code) into daily workflows for design, implementation, and debugging, using human judgment to maintain high engineering standards while increasing delivery velocity and system quality

What you will bring

We're looking for system builders: engineers who design, ship, and own with curiosity, range, and sharp engineering judgment. You go deep in your domain and broad enough across adjacent disciplines to make decisions with full context. You think in systems, communicate with precision, and treat AI as a force multiplier for your craft, not a substitute for your judgment.

4+ years of software engineering experience building and shipping production systems in cloud environments, including microservices, platforms, or distributed systems
Strong proficiency in Go: you write production-quality code, review it critically, and ramp quickly on large, unfamiliar codebases
Deep, hands-on Kubernetes experience from a builder's perspective: you've written operators, controllers, and CRDs in real-world, multi-tenant environments, not just operated clusters others built
Solid understanding of AWS fundamentals (EC2, IAM, networking) and how Kubernetes platforms behave and scale on AWS
Proven experience owning production systems under real SLOs, including participating in on-call and leading incident response with a focus on root cause and long-term fixes
You ramp fast on complex, unfamiliar systems, forming a mental model and making meaningful contributions within weeks
Highly self-directed builder mindset: you identify high-impact problems, propose solutions, and drive them end-to-end without waiting for direction
Strong systems thinking: you naturally connect design decisions to their downstream impact on scalability, reliability, and operability in production
Clear and effective communicator, able to collaborate with engineers on design, architecture, and tradeoffs

Nice to have:

Experience with HyperShift, OpenShift, or ROSA in production environments
Familiarity with multi-tenant Kubernetes challenges such as noisy neighbors, control plane scaling, and fleet-level lifecycle management
Contributions to open source projects, particularly in the Kubernetes ecosystem
Experience designing and operating observability at scale (Prometheus, Grafana, Dynatrace, or similar) across large fleets
Experience leveraging AI-assisted development tools (e.g., coding agents, AI-driven code review, spec-driven workflows) to accelerate development and improve quality

The salary range for this position is $118,600.00 - $195,680.00. Actual offer will be based on your qualifications.

Pay Transparency

Red Hat determines compensation based on several factors including but not limited to job location, experience, applicable skills and training, external market value, and internal pay equity. Annual salary is one component of Red Hat's compensation package. This position may also be eligible for bonus, commission, and/or equity. For positions with Remote-US locations, the actual salary range for the position may differ based on location but will be commensurate with job duties and relevant work experience.

About Red Hat

Red Hat, Inc. is a leading provider of open source software solutions, including Linux, Kubernetes, and Ansible. The company was founded in 1993 and is headquartered in Raleigh, North Carolina. Red Hat operates in over 100 countries and has more than 13,000 employees worldwide. The company is committed to open source innovation and has a strong community of developers and partners. Red Hat was acquired by IBM in 2019 and is now part of IBM's Hybrid Cloud division.

Size

13,000 employees

Industry

Enterprise Technology

Founded

1993

* Ladders Estimates

Similar Jobs

Senior Software Test Engineer
$100K — $130K *
Frontier Technology Inc.
Washington, DC 20011 (District Of Columbia County)
Today
Senior Software Engineer (Remote)
$80K — $180K *
The Home Depot
Remote
Today
Senior Software Engineer
$93K — $147K *
Sidley Austin LLP
Washington, DC 20011 (District Of Columbia County)
Today
Senior Python Developer
$120K — $150K *
ManTech International
Herndon, VA 20171 (Fairfax County)
Today
Senior Software Engineer - IBM Sterling (Remote)
$80K — $180K *
The Home Depot
Remote
Today
Senior Mainframe Developer / Technical Lead
$90K — $120K *
Systemtec
Columbia, SC 29223 (Richland County)
Today

More Jobs at Red Hat

Dedicated Operations Technical Account Manager - OpenShift (Secret Clearance Required)
$107K — $172K *
Remote
Reposted Today
Information Technology
Remote in Alabama, US
Senior Product Security Engineer - Cryptography
$131K — $216K *
Raleigh, WV 25911 (Raleigh County)
Reposted Today
Information Technology
In-Person
Senior Site Reliability Engineer
$118K — $195K *
Raleigh, NC 27610 (Wake County)
Today
Information Technology
In-Person
North America Public Sector (NAPS) Customer Success Executive - System Integrators
$189K — $302K *
Remote
3 days ago
Education, Government & Non-Profit
Remote in Maryland, US
Global Events Marketing Manager
$79K — $126K *
Boston, MA 02115 (Suffolk County)
4 days ago
Business Services
In-Person

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
6 days ago
Systems Engineer Lead
$150K — $180K *
AMERICAN SYSTEMS
Remote
Today
Service Desk Tier 2/3 Support
$93K — $101K *
AMERICAN SYSTEMS
Remote
Today
Linux Administrator
$110K — $145K *
AMERICAN SYSTEMS
Gaithersburg, MD 20878 (Montgomery County)
Today
Service Desk Team Lead
$100K — $126K *
AMERICAN SYSTEMS
Remote
Today
