Software Engineer, Distributed Data Systems

Exa

$130K — $180K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi)
  • Experience with large-scale distributed data processing pipelines
  • Hands-on experience with streaming data systems (Kafka, Flink)
  • Familiarity with Ray, Spark, or ClickHouse at production scale
  • Focus on reliability: building systems that don't page you at 3am

Responsibilities

  • Design lakehouse architecture for handling over 100 PB of data
  • Build streaming pipelines for processing billions of documents daily
  • Architect data layer for embedding training infrastructure using Ray
  • Scale ClickHouse deployment to handle analytical queries across petabytes of search logs

Benefits

  • Premium healthcare benefits (medical, dental, vision)
  • Fertility benefits
  • 16 weeks of fully paid parental leave
  • Monthly wellness stipend
  • Visa sponsorship for international candidates

Full Job Description
As a Data Engineer, you'll architect and build the data infrastructure that powers everything we do, from crawling billions of pages to training our embedding models to serving real-time search. You'll have enormous autonomy in designing systems that scale to hundreds of petabytes. If you've ever wanted to build data pipelines at a scale that most companies only dream about, this is your chance.

Who You Are
  • Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
  • Experience building and operating large-scale distributed data processing pipelines
  • Hands-on experience with streaming data systems (Kafka, Flink, or similar)
  • Familiarity with Ray, Spark, or ClickHouse at production scale
  • An obsessive focus on reliability and building systems that don't page you at 3am
Bonus
  • Experience with Lance or other vector-native storage formats
  • Background in GPU-accelerated data processing (RAPIDS, cuDF)


What You Could Do
  • Design a lakehouse architecture that handles 100+ PB of web crawl data
  • Build streaming pipelines that process billions of documents per day for real-time indexing
  • Architect the data layer for our embedding training infrastructure on Ray
  • Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs


Logistics
  • Location: This is an in-person opportunity in San Francisco.
  • Visas: We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H-1B, O-1, E-3). While we cannot guarantee your visa, we have historically been successful in sponsoring candidates from all over the world. If you receive an offer, our team will work hard to get you a visa.
  • Benefits: We offer premium healthcare benefits (medical, dental, vision), fertility benefits, 16 weeks of fully paid parental leave for all new parents, and a monthly wellness stipend to all of our employees.
