We are building the first version of the Materialize cloud service, and our second site reliability engineer (this could be you!) will play a key role in shaping the evolution of that service and the future of our SRE team. This role could evolve into a tech lead and/or tech lead manager role.
Questions You Can Help Answer
- Do we use Kubernetes or something else to orchestrate the Materialize cloud service?
- How do we foster healthy working relationships between site reliability engineers and software development engineers?
- What is an accurate way to model capacity for Materialize?
- How do we provide high availability and low latency for customers while managing the workload for our employees?
What You Will Do
- Be a core part of the team that develops the first and successive versions of the Materialize cloud service
- Design, build, and maintain the production serving infrastructure
- Develop custom tools, such as those for automatically scaling customer deployments and performing rolling upgrades
- Drive high availability and low latency through a partnership with software engineers
- Participate in an on-call rotation for our production systems and services, and drive long-term solutions to production issues
What We’re Looking For
- Professional experience analyzing, monitoring, and troubleshooting large-scale, high-traffic distributed systems
- Expertise with deploying, optimizing, and debugging Linux installations
- Expertise working with AWS
- Experience with one or more of the following languages: Rust, Go, C, C++, Python, Java
- Experience debugging and optimizing complex production systems
- B.S./M.S./Ph.D. in a scientific field, or equivalent experience
- Experience with Azure and/or Google Cloud Platform
- Experience with production SQL databases