This full-time, permanent position is based at The RealReal HQ in San Francisco and reports to the Director of Technical Operations. As part of the DevOps team, you will help scale our online services reliably through spikes in traffic and infrastructure failures. Major projects may include platform migrations, disaster recovery deployments, and more. Bring your thorough, practiced understanding of DevOps and help shape The RealReal's production infrastructure and Engineering culture.
DUTIES & RESPONSIBILITIES
- Design, build, and evolve our production infrastructure, strategically employing automation and infrastructure-as-code. Satisfy High Availability and Disaster Recovery requirements.
- Troubleshoot complex performance and scaling issues, working with Engineering to ensure that we avoid bottlenecks and scale to meet traffic demands through organic growth and marketing events.
- Write and run load tests to validate scalability, evaluate improvements, or troubleshoot scaling issues.
- Collaborate efficiently and effectively with Engineers and Product teams on complex problems involving functionality and scaling/performance. Drive ad-hoc troubleshooting teams towards solutions and proper rollouts.
- Advocate for and ensure compliance with the Twelve-Factor App methodology in our apps.
- Quickly absorb context and tribal knowledge while ramping up, and use it to build or bolster documentation. Understand and consolidate information sources so they are more effective.
- Maintain high quality and velocity in your work, collaborating and reporting when appropriate.
- Exercise and promote security best practices throughout your workflow.
- Participate in an on-call rotation on a regular basis and respond to incidents reliably and professionally.
REQUIREMENTS
- 5+ yrs DevOps or Systems Administration experience.
- 3+ yrs experience with Cloud Infrastructure (AWS, GCP).
- 2+ yrs Automation experience using popular languages (Bash, Python, etc.).
- 2+ yrs professional experience with UNIX-based Operating Systems.
- Experience tuning and troubleshooting performance for high traffic web services.
- Experience tuning database performance and experience with MySQL or PostgreSQL.
- Proficient at crafting concise and professional communications during emergency production infrastructure incidents.
- Strong understanding of common network protocols, including HTTP, HTTPS, TCP, SSL/TLS, and relevant diagnostic tools.
- Understanding of Git and GitHub workflows.
- Understanding of packaging, deployment, and support of containerized (Docker) applications.
TO SET YOU APART
- Experience with performance tuning in microservices environments.
- Experience converting applications to run in Docker containers, and with orchestration layers.
- Experience using Terraform with multiple providers and/or integrated with a build/release system.
- Experience in software development.
- Computer Science or Engineering degree.