Job Description
Renaissance is looking for an experienced Sr Site Reliability Engineer to be part of the Engineering Enablement group's Site Reliability Team with a focus on Application and Infrastructure Availability, Reliability, Observability & Security.
We are at a crossroads in evolving our current team and are looking for someone who has been involved in the SRE implementation journey at other companies. We want someone who will influence our SRE philosophy and practices: a self-motivated problem solver who communicates well and values teamwork. You will apply your technical expertise to build and scale our highly available distributed SaaS platform used by millions of K-12 students worldwide.
In this role as a Sr Site Reliability Engineer, you will:
- Work with engineering, security & governance teams to improve the observability, reliability, resiliency, and auditability of our systems and to minimize or prevent downtime.
- Contribute to infrastructure-as-code using Terraform & CloudFormation.
- Support CI/CD pipelines that ensure the prompt release of high-quality software.
- Collaborate with cross-functional teams to resolve infrastructure issues.
- Perform Disaster Recovery exercises on our products.
- Explore and integrate AI tooling into the SRE workflows.
- Be part of an on-call rotation & support off-hours incidents & deployments.
- Demonstrate strong skills in giving constructive feedback through coaching, even without direct reports.
For this role as a Sr Site Reliability Engineer, you must have:
- 5+ years of experience focused on SRE.
- Experience in managing & monitoring containerized cloud environments in production, preferably AWS EKS.
- Experience with IaC, Configuration Management and Orchestration Tools like Terraform/Docker/Ansible.
- Hands-on experience in a programming or scripting language such as .NET/Java, Python, or JavaScript.
- On-call experience & willingness to be on call during non-work hours and weekends.
- Experience working in an agile environment.
Bonus points for:
- BS in Information Systems or Computer Science, related field experience, or both.
- Managing Kubernetes clusters (EKS) at scale using Helm.
- Setting up GitLab & GitHub pipelines & workflows.
- Experience setting up Monitoring, Logging, Alerting & Observability in tools such as New Relic, Datadog, Grafana, CloudWatch, PagerDuty.
- Experience w/Teleport, HashiCorp Boundary, etc.
- Experience w/Redshift, OpenSearch/ZeroETL.
- Experience running Disaster Recovery exercises.
- Implementing service level objectives (SLOs/SLIs/SLAs) & error budgets.
- Experience using Claude Code for agentic coding, agentic SDLC, and enabling/rolling out agentic DX.
Additional Information
All your information will be kept confidential according to EEO guidelines. #LI-Remote
The compensation range below is based on national market data and may vary by experience and location.
Salary Range: $109,500-$150,550 USD
Benefits for eligible US employees include:
- World Class Health Benefits: Medical, Prescription, Dental, Vision, Telehealth
- Health Savings and Flexible Spending Accounts
- 401(k) and Roth 401(k) with company match
- Paid Vacation and Sick Time Off
- 12 Paid Holidays
- Parental Leave (20 total weeks with 14 weeks paid) & Milk Stork program
- Tuition Reimbursement
- Life & Disability Insurance
- Well-being and Employee Assistance Programs
Benefits listed apply to eligible U.S. employees in accordance with Renaissance's benefits eligibility criteria. Contractor and other non-employee roles are not eligible for Renaissance employee benefits.
Frequently cited statistics show that some women, underrepresented individuals, protected veterans and individuals with disabilities may only apply to roles if they meet 100% of the qualifications. At Renaissance, we encourage all applications. Roles evolve over time, especially with innovation, and you may be just the person we need for the future!