Staff Software Engineer, Core Infrastructure

Harvey

$201K - $264K
Enterprise Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of experience in Infrastructure or Platform Engineering
  • Expertise in scaling complex distributed systems
  • Deep knowledge of cloud infrastructure (Azure preferred)
  • Proficient in Infrastructure as Code tools like Terraform or Pulumi
  • Strong understanding of Kubernetes and container orchestration
  • Familiarity with observability tools and incident response practices
  • Excellent programming skills in Python, Go, or similar languages

Responsibilities

  • Design and build scalable, fault-tolerant infrastructure systems
  • Own and evolve multi-cloud infrastructure across Azure and GCP
  • Lead initiatives around observability and incident response
  • Architect distributed systems for reliability and load balancing
  • Collaborate with Product Engineering and Security teams
  • Drive infrastructure-as-code practices for reproducible deployments
  • Mentor engineers and enhance technical expertise within the team

Benefits

  • In-person work model to foster team collaboration
  • Relocation assistance available for new employees
  • Opportunity to impact the reliability of a globally used AI platform
  • Involvement in innovative projects at the intersection of AI and infrastructure
  • Mentorship opportunities to grow technical skills and leadership
Full Job Description
Role Overview

As a Staff Software Engineer on the Core Infrastructure team at Harvey, you'll play a critical role in designing and building new infrastructure systems while scaling and strengthening our existing ones. Our infrastructure is the foundation that powers every user interaction with Harvey, processing billions of prompt tokens and millions of daily requests across our global legal AI platform.

You'll work in an environment that balances innovation (building new systems) with operational excellence (keeping Harvey resilient and efficient as it scales across products, regions, customers, and usage). Your contributions will directly impact the reliability, scalability, and security of our platform as we serve the world's leading law firms and professional service providers.

This role is based in New York City, New York. We use an in-person work model and offer relocation assistance to new employees.
What You'll Do
  • Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
  • Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
  • Lead technical initiatives around observability, incident response, and operational excellence - building systems that enable rapid detection and resolution of issues
  • Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
  • Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint
  • Drive infrastructure-as-code practices using tools like Terraform and Pulumi to enable reproducible, auditable deployments
  • Mentor engineers and raise the technical bar across the organization through code reviews, design reviews, and technical leadership


Representative Projects
  • Design and implement a next-generation model proxy architecture that routes millions of daily inference requests while maintaining model API compatibility and enabling seamless model integration
  • Build distributed rate limiting and quota management systems using Redis-backed algorithms to handle bursty traffic patterns without degrading user experience
  • Architect multi-region deployment strategies that meet strict data residency requirements for global enterprise customers
  • Develop comprehensive observability infrastructure with granular SLA monitoring, burn rate alerts, and detailed token attribution for cost tracking
  • Lead the evolution of our CI/CD pipelines to improve developer velocity while maintaining production stability
What You Have
  • 10+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
  • A long track record of building and scaling complex, large-scale distributed systems
  • Deep proficiency with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
  • Strong fluency in Infrastructure as Code (IaC) tools - Terraform, Pulumi, or CloudFormation
  • Solid understanding of Kubernetes, container orchestration, networking, and cloud security at scale
  • Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
  • Strong programming skills in Python, Go, or similar languages
  • Excellent problem-solving skills, a "spidey sense" of where things could go wrong, and a commitment to operational excellence


Nice to Have
  • Experience building infrastructure for AI/ML workloads or high-throughput inference systems
  • Background with distributed rate limiting, load balancing, or quota management systems
  • Experience operating multi-tenant platforms with strict security and compliance requirements
  • Track record of leading complex cross-functional projects and delivering measurable impact


Compensation Range

$201,000 - $264,000 USD

Depending on your location, an Applicant Privacy Notice may apply to you. You can find all of our Applicant Privacy Notices [here].
