Senior Site Reliability Engineer

Ellucian • $120K — $150K *

US-Anywhere • Remote in Virginia, US

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years in Site Reliability Engineering, DevOps, or similar roles
Hands-on expertise with DataDog for APM, logs, metrics, dashboards, and alerting (mandatory)
Experience with cloud platforms like AWS, Azure, or GCP
Proficient in CI/CD and Infrastructure as Code tools such as Terraform
Strong troubleshooting and root cause analysis skills in distributed systems
Familiar with containers and orchestration technologies like Docker and Kubernetes
Scripting or programming experience in Python, Bash, or similar languages

Responsibilities

Own and enhance system reliability, availability, and performance in production environments
Design and manage monitoring and observability using DataDog
Lead incident response and post-incident reviews
Conduct root cause analysis to implement long-term fixes
Collaborate with teams to create scalable and resilient infrastructure
Automate operations to improve efficiency and minimize risks
Analyze and optimize cloud-related costs

Benefits

Comprehensive health coverage including medical, dental, and vision
Flexible time off policy
Thrive Flex Lifestyle Account for health, financial, or learning contributions
401k with matching and financial planning assistance via BrightPlan
Parental leave offered
5 charitable days per year
Telemedicine access
Wellness programs like Headspace Care for mental health and Wellbeats for fitness
Caregiver support through RethinkCare and Wellthy
Diversity and inclusion programs with access to employee resource groups
Employee referral bonuses
Education Assistance Program and professional development opportunities

Full Job Description

About the Opportunity

We are seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability, performance, and cost-efficiency of our production systems. This role requires deep expertise in DataDog for observability and will focus on DevOps practices, incident management, root cause analysis, and cost optimization across cloud infrastructure and services.

Where You Will Make an Impact

Own and improve system reliability, availability, and performance for production environments
Design, implement, and manage monitoring, alerting, and observability using DataDog (required)
Lead incident response efforts, including troubleshooting, mitigation, and post-incident reviews
Perform detailed root cause analysis (RCA) and drive permanent resolutions
Partner with engineering and DevOps teams to build scalable, resilient infrastructure
Automate operational processes to improve efficiency and reduce risk
Analyze and optimize infrastructure and application costs
Define and manage SLIs/SLOs to meet reliability targets
Continuously improve deployment, monitoring, and operational practices

What You Will Bring

5+ years of experience in Site Reliability Engineering, DevOps, or similar roles
Mandatory: Strong, hands-on expertise with DataDog (APM, logs, metrics, dashboards, alerting)
Experience with cloud platforms (AWS, Azure, or GCP)
Proficiency in DevOps practices and tools (CI/CD, Infrastructure as Code such as Terraform)
Strong troubleshooting skills and experience conducting root cause analysis in distributed systems
Experience with containers and orchestration (Docker, Kubernetes)
Scripting or programming experience (Python, Bash, or similar)
Proven ability to analyze and optimize cloud costs

Preferred Qualifications

Experience with cost management tools (e.g., AWS Cost Explorer, Azure Cost Management)
Familiarity with cloud security and compliance best practices
Experience supporting high-availability, customer-facing systems
Strong collaboration and communication skills

What Success Looks Like

Improved system reliability and reduced incident frequency
Faster incident detection and resolution (lower MTTD and MTTR)
Effective, actionable observability driven by DataDog
Measurable cost savings and optimized infrastructure usage

Benefits

Comprehensive health coverage: medical, dental, and vision
Flexible time off
Thrive Flex Lifestyle Account (LSA) that allows you to contribute towards your health, financial or learning interests
401k with match and BrightPlan to help you save for the future
Parental Leave
5 charitable days to support the community that supports us
Telemedicine
Wellness
- Headspace Care (mental health)
- Wellbeats (virtual fitness classes)
RethinkCare & Wellthy (caregiver support)
Diversity and inclusion programs which provide access to internal employee resource groups
Employee referral bonuses to encourage the addition of great new people to the team
We foster a learning culture with:
- Education Assistance Program
- Professional development opportunities

About Ellucian

Ellucian is a provider of software and services to higher education institutions. The company was founded in 1968 and offers a range of solutions, including student information systems, financial management systems, and analytics. Ellucian's technology is designed to help colleges and universities improve their operations, enhance the student experience, and achieve their strategic goals. The company has a global presence and serves more than 2,700 institutions in over 50 countries.

Size

3,000 employees

Industry

Information Technology

Founded

1968

* Ladders Estimates
