The Team You'll Work With
The Site Reliability Engineering (SRE) team at Carta is responsible for ensuring the availability, reliability, and resiliency of the Carta app and other production systems across environments. The team has expertise in systems architecture and design and in infrastructure automation using Terraform, AWS, and Kubernetes. In addition, the SRE team collaborates closely with the Information Security team on defining secure network boundaries and implementing security policies.
The Problems You'll Solve
- Be responsible for meeting SLAs for the multiple database platforms used in the company.
- Improve and optimize database administration and management through code and automation, with a focus on performance, high availability, and reliability.
- Troubleshoot and address performance issues across databases.
- Develop tools to enable self-service and self-managing capabilities of our database infrastructure so that other teams can operate full-stack while rapidly building new features for our customers.
- Write code to capture database performance, and create tools and dashboards to provide actionable insight into that data.
- Participate in system and data-tier architecture and performance tuning, and help with database development of performance-critical code.
- Architect and implement HA/DR/backup/maintenance strategies across the company.
- Collaborate with engineering teams on their database storage needs, and advise them throughout the development lifecycle.
What You'll Need
- 6-8 years of experience running a large fleet of Postgres and/or MySQL databases at scale.
- Experience identifying and tuning SQL performance issues.
- Experience working with databases in the public cloud (AWS or GCP).
- Experience automating routine DBA tasks using scripts or CI/CD pipelines.
- Good knowledge of containerization technologies (specifically Docker and Kubernetes).
- Experience with automation via "infrastructure as code" using Terraform or a similar tool.
- Knowledge of NoSQL and/or caching databases such as Redis.
- Good understanding of PgBouncer or another connection pooler for PostgreSQL.
- Experience with GitHub and a good understanding of CI/CD tooling.
- Good understanding of streaming infrastructure using Kafka and Kafka Connect.