Staff Software Engineer, Search & Distributed Systems

ACV

$120K — $150K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years of software engineering experience, including 3+ years at Senior or Staff level in distributed systems.
  • Deep expertise in Elasticsearch internals, including large-scale cluster management and query optimization.
  • Hands-on experience with Kubernetes, API Gateway architectures, and event streaming technologies like Kafka.
  • Track record of implementing fault-tolerant patterns in microservice architectures.
  • Expert-level skills in observability and performance diagnostics using tools like Datadog and Grafana.
  • Strong communication skills to influence and mentor cross-functional teams.

Responsibilities

  • Design and scale Elasticsearch clusters to handle massive auction throughput.
  • Engineer solutions to prevent failure points, including circuit breakers and backpressure mechanisms.
  • Act as a technical escalation point for diagnosing performance degradation across systems.
  • Manage the data lifecycle, optimizing pipelines between event datastreams and Elasticsearch.
  • Draft architectural Blueprints and SOPs to elevate engineering practices.
  • Evaluate and integrate emerging search technologies to support AI initiatives.

Benefits

  • Emphasis on a people-first culture.
  • Fostering a positive work environment.
  • Supporting trust and transparency among team members.
  • Encouragement of calm persistence in challenging situations.
  • Focus on continuous improvement and innovation.

Full Job Description

The Role

We are looking for a Staff Software Engineer who would thrive on being accountable for our Search infrastructure: its scalability, reliability, and data resiliency. We don't just need someone who knows how to write a complex query; we need a battle-scarred Distributed Systems expert who understands the deep internals of Elasticsearch and who has a deep toolbox for analyzing, monitoring, alerting, and quickly resolving critical issues as they arise.

You know exactly how Elasticsearch fails, why it fails under load, and how to architect a topology that prevents it. Because our search ecosystem doesn't exist in a vacuum, you will also own the architectural connective tissue, ensuring our service layers and event-based ecosystem interact with Search flawlessly.

As a Staff Engineer, you will set the technical standard, drive systemic reliability, and mentor senior engineers across the organization.

What You Will Do
  • Architect for Scale: Design, configure, and scale our Elasticsearch clusters. You will define our global strategies for shard routing, Index Lifecycle Management (ILM), heap tuning, and data tiering to support massive auction throughput.
  • Master the Failure Modes: Anticipate and engineer away points of failure. You will design circuit breakers, implement backpressure mechanisms, and tune asymmetric timeouts to prevent retry storms between our BFFs, K8s services, and the Search layer.
  • Expert Troubleshooting & IR: Act as the ultimate technical escalation point for complex, cross-system performance degradation. You will dive deep into JVM metrics, Garbage Collection pauses, K8s network bottlenecks, and slow logs to uncover and remediate root causes.
  • Holistic System Ownership: Manage the entire data lifecycle. You will optimize the ingestion pipelines that sync our Kafka event streams (producers and consumers) to Elasticsearch, ensuring eventual consistency and data integrity at scale.
  • Drive Engineering Excellence: Draft authoritative architectural Blueprints, SOPs, and Runbooks. You will elevate the surrounding engineering culture by coaching teams on distributed systems design, observability best practices, and incident management.
  • Modernize & Innovate: Scan the horizon for emerging technologies. You will help evaluate and integrate next-generation search capabilities (e.g., Vector Search, RAG architectures) to support our broader AI and machine learning initiatives.

What You Bring (Requirements)
  • Experience: 8+ years of software engineering experience, with at least 3 years operating at a Senior or Staff level focusing on distributed systems and high-throughput platforms.
  • Elasticsearch Mastery: Deep, authoritative knowledge of Elasticsearch internals. You have managed large-scale clusters and deeply understand mapping, analysis, query optimization, cluster state management, and split-brain mitigation.
  • Full-Stack Context: Proficiency in the systems upstream and downstream of Search. You have hands-on experience with Kubernetes (EKS/GKE), API Gateway/BFF architectures, and event streams (Kafka).
  • Resilience Engineering: A proven track record of implementing fault-tolerant patterns (retries, rate limiting, circuit breaking, dead letter queues) in microservice architectures.
  • Observability: Expert-level ability to instrument systems and diagnose complex performance issues using modern observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
  • Leadership: Strong communication skills with a proven ability to influence cross-functional teams, build consensus around architectural decisions (the Knoster model!), and mentor mid-level and senior engineers.

Bonus Points
  • Experience with Infrastructure as Code (Terraform, Helm) for stateful applications.
  • Familiarity with FinOps practices, specifically optimizing Elasticsearch compute and storage costs.
  • Experience integrating AI-assisted development tools into your daily workflow.

Our Values

Trust & Transparency | People First | Positive Experiences | Calm Persistence | Never Settling
