SingleStore

Site Reliability Engineer

SingleStore
$90K - $130K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 0-2 years as a Site Reliability Engineer; recent grads welcome
  • Familiarity with infrastructure automation; scripting in Python or Bash preferred
  • Experience with the Prometheus monitoring stack; experience with Grafana, Mimir, and Loki is a plus
  • Solid understanding of Kubernetes and container technologies
  • Excellent communication and teamwork skills
  • Familiarity with at least one of AWS, Azure, or Google Cloud
  • Experience in debugging and troubleshooting production-level software
  • B.S. in Computer Science or related field

Responsibilities

  • Develop an automation platform for managing infrastructure rollouts across various cloud providers
  • Optimize telemetry systems to identify customer-impacting events
  • Collaborate with engineering teams to enhance service performance in cloud setups
  • Debug live site incidents and conduct postmortem analyses
  • Take part in an SLA-driven on-call rotation including after-hours and weekends

Benefits

  • Collaborative work environment within a globally distributed team
  • Opportunity to work with cutting-edge cloud technologies and tools
  • A cloud-focused SRE role aimed at pushing operational boundaries

Full Job Description

Position Overview

SingleStore is seeking a Site Reliability Engineer to help optimize and scale our managed service offering across all three major cloud providers. In this role, you will work at the intersection of leading technology trends: a highly performant distributed database, managed by Kubernetes, running in the cloud. This is a great opportunity to push the boundaries in a cloud-focused SRE role.

This is a development role that requires an engineering mindset to solve operational challenges. You will be part of a globally distributed team of engineers, helping to drive SRE practices across the company. Through infrastructure automation, you will help us grow our service across multiple cloud platforms, which requires a relentless focus on eliminating manual processes. You will use our monitoring platform to improve the overall customer experience by systematically identifying and fixing issues that affect our customers. You will also help diagnose issues on the platform, drawing on a deep understanding of the SingleStore query engine and the backend infrastructure.

Roles and Responsibilities
  • Develop an automation platform to manage infrastructure rollouts across cloud providers
  • Optimize the telemetry platform to identify customer-impacting events while providing relevant data to drive debugging
  • Partner with the engineering team to optimize the performance of services for cloud architecture
  • Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA)
  • Participate in an SLA-driven on-call rotation, which will include after-hours, weekend, and rotating holiday participation

Required Skills and Experience
  • 0-2 years of demonstrated experience working as a Site Reliability Engineer. Recent graduates are encouraged to apply.
  • Infrastructure automation experience. Scripting experience (Python, Bash) is a plus.
  • Experience with the Prometheus monitoring stack. Experience with Grafana, Mimir, and Loki is a plus.
  • Knowledge of Kubernetes and the container ecosystem
  • Strong cross-group collaboration and communication skills
  • Familiarity with at least one of AWS, Azure, or Google Cloud
  • Experience debugging, diagnosing, and troubleshooting complex production software
  • B.S. degree in Computer Science or a related field

Req ID: ENG00445

About SingleStore

SingleStore is a leading provider of database software. Its software is designed to help businesses manage and analyze large amounts of data in real time, and it is used by a wide range of clients, including e-commerce companies, financial institutions, and healthcare providers. The software is known for its speed, scalability, and reliability. SingleStore is committed to helping its clients make better decisions by providing the tools they need to analyze their data effectively.
Size: 500 employees
Founded: 2011
