Cedar has built a platform that combines data science and machine learning to connect patients with healthcare providers in a way that tackles the critical challenges of patient billing and payment. Our technology improves how providers engage with patients, enabling providers to thrive in a rapidly changing environment while helping patients understand the cost of their care.
Our team continues to scale, and as part of that growth, we are looking for a Senior Site Reliability Engineer to play a meaningful role at Cedar. This person will spearhead the development of the SRE function, working as a hands-on senior operator while shaping how the SRE organization evolves.
What you'll do
- Help establish an SRE function at Cedar as our first dedicated SRE
- Design scalable and easy-to-use solutions that enable simple management, implementation, and automation of sophisticated infrastructure systems
- Design, build, and maintain a framework for monitoring, logging, and alerting on the performance of our internal systems and product
- Partner with product teams to share best practices and provide mentorship around the reliability, scalability, performance, and observability of our production systems, infrastructure, and software
- Build clear, executable runbooks/playbooks for use by teammates across engineering
- Debug complex problems across the entire stack and craft proven solutions
Skills & Experience
- 5+ years of experience in software engineering, software development, or systems operations, with some of that time in an SRE role
- A passion for reliable, scalable, observable software with a keen sense of ownership
- A real interest in the latest and greatest database, infrastructure, and automation technologies
- Experience with cloud-based architecture (we use AWS) and Infrastructure as Code (we use Terraform)
- Experience with Python (experience with Java, Go, Rust, or similar languages will also be considered)
- Experience debugging issues across multiple systems/services
- Experience designing, building, and operating large-scale production systems
- Experience with on-call rotations and incident management
- Understanding of networking and messaging, especially between services
- Hands-on experience with source control (Git, GitHub) and feature-branching strategies
- Experience with a variety of open-source databases (MySQL, Postgres, Redis, Cassandra, etc.)
- Excellent communication skills, both verbal and written
What do we offer to the ideal candidate?
- An opportunity to work on a platform that is scaling very rapidly, with 300,000 engaged patients a day as of January 2021 (up from 50,000 in January 2020!)
- A chance to join a high-growth company at an early stage
- The ability to shape the growth of our company; we value every comment and suggestion
- Transparency across teams and interaction with multiple departments
- Competitive pay, employer-paid healthcare, stock options
- Daily team lunch and unlimited healthy snacks at our NYC office