Berkeley Research Group

Site Reliability Engineer

Berkeley Research Group
$130K - $160K
US-Anywhere
Remote in United States
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in computer science or a related field.
  • Five years' experience in a site reliability engineer role or similar position.
  • Strong programming skills in languages like Golang, Ruby, or Python.
  • Expertise in Kubernetes and cloud-native infrastructure management.
  • Experience with CI/CD platforms (e.g., GitHub Actions, GitLab CI).
  • Proficiency with observability and incident management tools like Datadog or PagerDuty.
  • Excellent communication skills and problem-solving abilities.

Responsibilities

  • Design, implement, and maintain reliable systems in Azure Cloud Services.
  • Provide operational support for full-stack software applications and troubleshoot issues.
  • Automate processes to enhance system resilience and self-healing capabilities.
  • Develop service-level indicators and objectives to automate release validation.
  • Collect performance metrics and report them to stakeholders.
  • Manage system maintenance in cloud and database environments, addressing production issues.
  • Lead incident management and respond to outages and service disruptions.

Benefits

  • Opportunities for professional development and advanced training.
  • Flexible work arrangements promoting a balanced lifestyle.
  • Access to the latest technologies and tools in the industry.
  • Collaborative work environment with cross-functional teams.
  • Engagement in innovative projects with a real impact on system reliability.

Full Job Description

We are seeking a Site Reliability Engineer to design, build, and maintain highly available systems and infrastructure. The SRE will work closely with software developers and operations teams to improve system reliability, automate processes, and minimize downtime.

Responsibilities

  • Design, implement, and maintain scalable and reliable systems in cloud environments such as Azure Cloud Services.
  • Provide operational support for full-stack software applications.
  • Increase system resilience with expert-level coding, bulletproof release, and change management skills.
  • Develop service-level indicators and objectives to automate release validation.
  • Improve automation and increase the system's self-healing capability.
  • Collect operating system data and report performance metrics to stakeholders.
  • Ensure security best practices are followed in cloud infrastructure and application deployments.
  • Manage cloud and database system maintenance, debugging production issues as they arise.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Partner with security and product teams to define and publish policies, processes, and playbooks to facilitate rapid and effective handling of alerts and incidents.
  • Lead incident management processes; respond to outages and service disruptions promptly.

Qualifications

  • Bachelor's degree in computer science or similar field.
  • Five years' experience as a site reliability engineer or similar role.
  • Strong programming skills (Golang, Ruby, Python, or similar).
  • Proven ability to diagnose and monitor performance and reliability issues across the stack.
  • Expertise in Kubernetes.
  • Experience with CI/CD platforms (GitHub Actions, GitLab CI).
  • Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
  • Proven experience working with cloud-native infrastructure (Azure Cloud Services, AWS, or GCP).
  • Experience working with observability and incident management tools (Datadog, OpsGenie, PagerDuty).
  • Experience scripting operating system tasks with Infrastructure as Code.
  • Impeccable communication skills.
  • Ability to problem-solve in a fast-paced, high-stakes environment.

Candidates must be able to submit verification of their legal right to work in the United States without company sponsorship.

Salary: $130,000 - $160,000
