Your Role
As a Storage Infrastructure Engineer, you'll take everything we know about modern databases and apply it to the world of Physical AI. Our warehouse co-indexes video, sensors, embeddings, and sim outputs on the same row, versioned, with a third query layer (not row/column, not vector/semantic) - content-aware queries over what's inside clips. Your job is to make that layer fast: the right indices for petabyte-scale video, predicate pushdowns that elide whole files, file formats that respect random access into clips, and a query path that turns "left-arm grasp failures on deformable objects" into the smallest possible read.
You should believe, in your bones, that the best read is the read elided.
Key Responsibilities
- Design and build the storage and indexing layer: row groups, column chunks, secondary indices, vector indices, and the metadata that lets queries skip everything that doesn't matter.
- Push the query engine harder - predicate pushdown, projection pushdown, late materialization - across multimodal columns including video, embeddings, and sensor streams.
- Choose, extend, or build on top of modern open formats (Parquet, Iceberg, Delta, etc.), and build our own or contribute upstream where it makes sense.
- Build versioning and schema evolution for multimodal datasets so customer data stays reproducible across months of experimentation.
- Partner with the Dataloading team on the format-to-loader boundary so an iceberg.scan(...) translates into the absolute minimum of bytes hitting NVMe.
- Partner with the Visual Understanding team to land model outputs in the index without an external glue layer.
What we look for
- You love thinking about indices. B+ trees, LSM trees, bitmap indices, vector indices, learned indices - you have favorites and you have grudges.
- You love thinking about query engines. Predicate pushdown makes you happy. Late materialization makes you happier.
- Strong familiarity with the storage hierarchy: cloud object stores, NVMe, block storage, spinning disk, RAM, GPU memory - and the latency and cost of moving between them.
- Strong opinions about Parquet - love it or hate it, you've earned the opinion. Same for Iceberg, Delta, Lance, and the other lakehouse formats.
- A real love for databases and query systems. You read database papers for fun.
- You believe the best read is the read elided.
Nice to have
- Background from a storage or table-format team - Lance, Iceberg, Delta, Hudi, Spiral, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse, or similar.
- You've attempted to build your own database before. Or, at minimum, fantasized about it in detail.
- Experience with Rust or modern C++ for storage engines.
- Hands-on time with vector indices (HNSW, IVF, ScaNN) or hybrid retrieval systems.
- Comfort with the OLAP/lakehouse ecosystem: catalogs, file layout, compaction, manifest formats, time travel.
Perks & Benefits
- In-person, tight-knit team - 4 days/week in our SF Mission office.
- Competitive comp and meaningful startup equity.
- Catered lunches and dinners for SF employees.
- Commuter benefit.
- Team-building events and poker nights.
- Health, vision, and dental coverage.
- Flexible PTO.
- Latest Apple equipment.
- 401(k) plan with match.
If you've ever read a Parquet footer for fun and thought "this is so close to what video needs, and yet so far" - we should talk.