Site Reliability Engineer, Manager

Joint Activities

• $135K — $216K *

US-Anywhere, Remote in United States

Information Technology

8 - 10 years of experience

1 month ago

Job Overview by Ladders

Qualifications

10+ years of experience in site reliability engineering or similar roles in complex, multi-vendor environments.
In-depth knowledge of cloud-native infrastructures and container orchestration (e.g., Kubernetes).
Experience with automation tools like Terraform, Ansible, or Chef.
Proficient in observability technologies such as Prometheus and Grafana.
Strong programming skills in languages like Python or Go for automation.
Expertise in defining SLIs, SLOs, and error budgets.
Excellent communication skills for collaboration across teams.

Responsibilities

Design and implement reliability frameworks, including SLOs and automated incident response systems.
Lead the development of observability platforms using advanced monitoring tools.
Coordinate with vendors and internal teams to manage diverse systems and ensure reliability standards.
Drive incident response strategies and lead root cause analysis.
Mentor engineering teams and advocate best practices in reliability engineering.
Collaborate with product development and security teams to integrate reliability into the software development lifecycle.
Prepare executive-level presentations to communicate technical challenges and business impacts.

Benefits

Opportunities for career advancement within a large-scale cloud ecosystem.
Leadership role with significant autonomy in decision-making.
Collaboration with diverse teams across the organization.
Involvement in strategic initiatives that impact a vast user base.
Support for continuous learning and professional development.

Full Job Description

Responsibilities

Peraton is seeking a Site Reliability Engineer (SRE), Manager: a highly experienced professional responsible for ensuring the availability, reliability, and performance of complex systems in a multi-vendor environment. This role combines deep technical expertise in infrastructure, automation, and system architecture with leadership and collaboration skills to drive reliability frameworks, proactive monitoring, and incident response across diverse platforms and teams.

The Site Reliability Engineer, Manager operates with significant autonomy, architecting solutions that enhance system observability, scalability, and fault tolerance. They lead reliability initiatives, mentor engineering teams, and collaborate with multiple vendors and internal stakeholders to align reliability strategies with business objectives and customer needs. This role is ideal for a highly skilled engineer who excels in technical leadership, complex system architecture, and multi-stakeholder environments. Principal Site Reliability Engineers are key to building resilient systems that scale efficiently while minimizing downtime and risk.

This opportunity will support the modernization of a large-scale multi-tenant cloud ecosystem, providing critical enterprise-wide support for more than 40 million users in a complex stakeholder environment. This position requires senior-level leadership skills combined with modern cloud and industry-leading technical capabilities, including product development, strict security compliance, current cloud solutions, reliable application delivery with SaaS and artificial intelligence integrations, and rapid continuous delivery.

Core Responsibilities

Reliability Architecture and Automation: Design, implement, and oversee reliability frameworks, including SLOs, error budgets, and automated incident response systems. Develop and maintain CI/CD pipelines to ensure seamless deployment and procedural efficiency.
Observability and Monitoring: Lead the creation and enhancement of observability platforms using metrics, logging, and tracing tools. Utilize modern technologies like OpenTelemetry, AI/ML for anomaly detection, and streaming data platforms to proactively detect and resolve issues.
Multi-Vendor Collaboration: Coordinate with external vendors and internal teams to integrate and manage diverse systems and tools. Ensure consistent reliability standards and practices are maintained across different technology stacks and service providers.
Incident Management and Risk Mitigation: Drive incident response strategy by leading root cause analysis, post-mortem reviews, and continuous improvement efforts. Identify potential risks and implement mitigation strategies to prevent service disruptions.

Leadership and Collaboration

Technical Leadership: Mentor site reliability and engineering teams, fostering a culture of reliability, automation, and continuous learning. Advocate for best practices in system design and reliability engineering.
Cross-Functional Partnership: Work closely with product development, DevOps, and security teams to integrate reliability into the software development lifecycle. Influence platform strategy and roadmap based on reliability insights.
Strategic Influence: Collaborate with senior stakeholders and vendors on long-term reliability goals. Prepare executive-level presentations that translate technical challenges into business impact.
Agile and DevOps Practices: Lead and refine agile workflows to enhance team productivity and reliability outcomes. Champion DevOps methodologies to align development and cloud services efforts.

**Position could support/work across multiple enterprise-wide efforts within Peraton.**

Qualifications

Key Skills and Qualifications:

Extensive experience (10+ years) in site reliability engineering or related roles, preferably in multi-vendor and complex environments.
Deep knowledge of cloud-native infrastructure, container orchestration (e.g., Kubernetes), and automation tools such as Terraform, Ansible, or Chef.
Proficiency in observability technologies such as Prometheus, Grafana, OpenTelemetry, and log aggregation systems.
Strong programming and scripting skills for automation and tooling (Python, Go, or similar).
Expertise in defining and implementing SLIs, SLOs, and error budgets.
Excellent communication skills for collaboration with diverse teams and external vendors.
Proven ability to lead large-scale reliability initiatives and mentor engineering teams.
Strategic thinker with a focus on aligning reliability engineering with business priorities and customer experience.

Clearance Requirements:

U.S. Citizenship required
Ability to obtain agency clearance (public trust)

Preferred Qualifications:

Top Secret clearance preferred

Target Salary Range: $135,000 - $216,000. This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to, the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.

* Ladders Estimates
