Role Overview
As a Staff Software Engineer on the Site Reliability team at Harvey, you will ensure the reliability, scalability, and performance of our legal AI platform. You'll join a high-leverage team that sits at the intersection of infrastructure and product, owning the systems that keep our platform fast, secure, and always on. From scaling across 50+ regions to automating mission-critical operations, your work will ensure that Harvey remains resilient as we grow. If you're passionate about building robust systems and reducing complexity through automation, we'd love to work with you.
This role is based in San Francisco, CA. We use an in-person work model and offer relocation assistance to new employees.
What You'll Do
- Design, implement, and manage monitoring, alerting, and infrastructure resources (compute, storage, networking) across 50+ global regions
- Lead incident management processes, including postmortems, root cause analyses, and driving actionable improvements
- Automate operational tasks and workflows, building tools and processes for capacity planning, graceful rollouts, and safe data access to maintain high reliability and reduce manual intervention
- Establish best practices for security, compliance, and reliability and collaborate across teams to drive these principles throughout the software lifecycle
- Optimize infrastructure costs through strategic capacity planning and build-versus-buy decisions while maintaining system performance, reliability, and functionality
- Provide technical mentorship and leadership, promoting best practices and fostering team growth
What You Have
- 10+ years of experience in Site Reliability Engineering or similar roles supporting production environments, with proven ability to mentor and guide technical teams
- Expertise in infrastructure as code (IaC) tools (Pulumi, Terraform, CloudFormation, etc.)
- Deep familiarity with observability tools (Datadog, Sentry, etc.) and incident response tools and practices (PagerDuty, incident.io, etc.)
- Proficiency with cloud infrastructure platforms (Azure, GCP, AWS, etc.)
- Strong programming skills (Python, Bash, Go, or similar languages)
- Proven track record of diagnosing complex system problems and implementing durable solutions
- Solid understanding of CI/CD, Kubernetes, containerization, networking, databases, and cloud security principles
- Excellent problem-solving skills, meticulous attention to detail, and a commitment to operational excellence
Compensation Range
$238,000 - $290,000 USD
Depending on your location, an Applicant Privacy Notice may apply to you. You can find all of our Applicant Privacy Notices [here].