Research Engineer, Multimodal Data

Eventual Computing

• $120K — $150K *

San Francisco, CA 94112 (In-Person)

Information Technology

Less than 5 years of experience

1 month ago


Job Overview by Ladders

Qualifications

Strong familiarity with modern vision and multimodal models.
Experience running models at scale using real video and sensor data.
Background from a perception team in self-driving, robotics, or visual-data fields.
Comfortable with cloud infrastructure and large-scale data processing.
Bias toward data-driven infrastructure solutions.

Responsibilities

Own the visual understanding roadmap from model selection to production deployment.
Train, fine-tune, and evaluate various vision models against customer datasets.
Reduce the cost of video annotation by optimizing model selection and processing.
Build and design queryable datasets for customer training use.
Collaborate with dataloading and storage teams for efficient data flow.
Engage directly with researchers for rapid feedback on model iterations.

Benefits

Tight-knit team environment with 4 days/week in a San Francisco office.
Catered lunches and dinners for employees in SF.
Team-building events and social activities.
Health, vision, and dental coverage.
Flexible PTO options available.
Latest Apple equipment provided.
401(k) plan with company match.

Full Job Description

Your Role

As a Research Engineer on the Visual Understanding team, you'll own the layer that makes petabytes of video queryable by content. Physical AI teams have raw footage, lidar, radar, and sim outputs scattered across object stores, with no way to find what they need short of weeks of human annotation. We change those economics: we run vision-language models over every clip in a corpus along the axes the customer cares about (gripper type, failure mode, object class, scene, motion density), so a researcher can ask for "left-arm grasp failures on deformable objects" and get a curated dataset in minutes.

You'll define the roadmap for our visual understanding capabilities, train and select the models that make corpus-scale annotation tractable at single-digit cents per hour of video, and build the rich datasets that go on to train customer models. This is a research engineering role: you'll read papers and run experiments, but you ship to production, and your work is judged by what it does for customer training runs.

Key Responsibilities

Own the visual understanding roadmap end-to-end: from picking the model family for a customer's taxonomy to landing it in production inference at corpus scale.
Train, fine-tune, and evaluate VLMs, VQA models, embedding models, and convolutional perception models against customer datasets and benchmarks.
Drive down per-clip annotation cost through model selection, distillation, batching, and decode pipelining, so that "annotate every clip in a 10K-hour corpus" stays economical.
Build the rich, queryable datasets that customers train on: design taxonomies with researchers, instrument quality, version the outputs.
Partner with the dataloading and storage teams so visual understanding outputs flow into the index and on to the GPU without re-engineering.
Work directly with researchers at our partner labs - your shortest feedback loop is their next training iteration.

What we look for

Strong familiarity with modern vision and multimodal models - convolutional nets, VLMs, VQA, embeddings - and a sense for which SOTA is actually deployable today vs. only strong on a leaderboard.
Experience running these models at scale on real video and sensor data, ideally for perception tasks (detection, tracking, segmentation, retrieval, captioning).
Background from a perception team at a self-driving, robotics, or visual-data company - or equivalent depth from a research lab.
Comfortable with cloud infrastructure and large-scale data processing - you don't need to be a distributed-systems engineer, but you've shipped video-processing jobs that ran for thousands of GPU-hours.
Bias toward data and infrastructure: you reach for "annotate the whole corpus" before "fine-tune another model."

Nice to have

Experience training vision or multimodal models from scratch (not just calling APIs).
ML/AI research background - papers, citations, or a research org on your resume.
Hands-on time with big-data frameworks like Spark, Ray, or Daft.
Worked on embeddings, retrieval, or content-aware search at scale.
Experience designing labeling taxonomies or running annotation programs.

Perks & Benefits

In-person, tight-knit team - 4 days/week in our SF Mission office.
Competitive comp and meaningful startup equity.
Catered lunches and dinners for SF employees.
Commuter benefit.
Team-building events and poker nights.
Health, vision, and dental coverage.
Flexible PTO.
Latest Apple equipment.
401(k) plan with match.

If you're excited about being on the team that turns petabytes of raw video into the training data for the next generation of Physical AI, we'd love to talk.

* Ladders Estimates
