LiveKit is on a mission to help developers create and scale real-time experiences. We are hiring a Software Engineer / Site Reliability Engineer to help manage and scale the core components of the LiveKit infrastructure. The reliability and performance of our globally distributed architecture are critical and a top priority.
What You'll Do
- Build and own the foundational infrastructure that our products run upon.
- Work directly on our products' Go codebase to implement SRE-related objectives.
- Take a data-driven approach to quantifying system performance and reliability, and use it to drive project priorities.
- Participate in the on-call rotation, including leading incident management for complex situations.
- Work on automation and advanced configuration management to allow our team to manage large numbers of clusters distributed across the world running various products.
- Work with infrastructure vendors when their solutions aren't meeting our real-time performance and reliability needs.
Who You Are
- A balance of strengths in both software engineering and large-scale system administration.
- Experience managing complex multi-region distributed systems running on top of container orchestration systems like Kubernetes.
- Passionate about maintainability and keeping system complexity at bay, but able to balance this with meeting launch deadlines.
Bonus Points
- Incident management training and experience being an Incident Commander.
- Experience with Linux networking, overlay networks, and Kubernetes CNIs.
- Low-level knowledge for troubleshooting and tuning latency-sensitive workloads.
Our Commitments to You
We offer:
- A competitive salary and equity package.
- Health, dental, and vision benefits.
- Flexible vacations.
- Remote work environment with necessary equipment provided.