Sr. Site Reliability Engineer 0225

nexus IT group

$120K — $160K *

US-Anywhere · Remote in Remote, OR

Posted 3 weeks ago

5 - 7 years of experience

Job Description

Responsibilities:
  • Design, implement, and maintain highly resilient and secure infrastructure for our SaaS platform using AWS services (API Gateway, Lambda, Aurora Serverless, OpenSearch Serverless, and Secrets Manager) together with FusionAuth
  • Ensure best-in-class security of the application using AWS security services such as WAF, Shield, and GuardDuty, and implement industry-leading security practices
  • Develop, implement, and maintain robust monitoring and alerting solutions to ensure the reliability and performance of our SaaS platform, using tools such as CloudWatch, Prometheus, and Grafana
  • Facilitate and drive incident response, triage & resolution, and retrospective/root cause analysis to maintain the reliability and uptime of our platform
  • Lead incident post-mortems and retrospectives to surface reliability improvements and drive them to completion
  • Implement strategies to increase system resilience and performance through on-call rotation and process optimization
  • Apply a strong understanding of SRE principles, including error budgets, SLOs, SLIs, and SLAs, and identify and establish them for the team
  • Build and maintain infrastructure as code using Terraform
  • Provide input and expertise for system architecture and feature development
  • Engage and collaborate with stakeholders, including Product, Development, QA, Customer Success, and others, to ensure work is properly defined, prioritized, and executed, including improvements and future initiatives
  • Educate and guide Engineering teams on best practices for reliability, resiliency, and security
  • Participate in the Agile development lifecycle, helping us stay realistic about our goals and flexible in our execution
  • Foster a culture of group collaboration while also working effectively on your own


Requirements:
  • Prior SRE experience supporting a cloud-native SaaS platform with AWS
  • Bachelor's degree in Computer Science, Software Engineering, or a related field (or equivalent work experience)
  • AWS Solutions Architect and/or AWS DevOps Professional Certifications
  • A self-starter with strong communication skills, written and verbal, and prior experience thriving in a distributed work environment
  • 5+ years of hands-on experience in site reliability engineering roles
  • Expert knowledge of AWS services, specifically API Gateway, Lambda, Aurora Serverless, OpenSearch Serverless, and Secrets Manager, as well as FusionAuth
  • Expertise in AWS security services, including WAF, Shield, and GuardDuty, and a deep understanding of cloud security practices
  • Strong experience with monitoring and alerting tools such as CloudWatch, Prometheus, Grafana, or similar
  • Proven ability to design and implement effective monitoring strategies to ensure system reliability and performance
  • Willingness and availability for participation in a 24x7x365 on-call rotation, ensuring prompt and effective responses to business-critical alerts outside of regular working hours
  • Extensive experience with Terraform for infrastructure as code
  • Experience building, securing, and maintaining a multi-tenant SaaS application
  • Experience with IDPs such as FusionAuth, Okta, Auth0, or similar
  • Strong understanding of information security principles and practices
