Robotics Data Pipeline Engineer - Multimodal Data

Persona AI

$90K - $130K *

Houston, TX 77084

In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

B.S., M.S., or Ph.D. in Computer Science, Data Engineering, Machine Learning, Robotics, or related field.
Deep expertise in Python and extensive experience with PyTorch.
Experience analyzing complex time-series data from force-torque sensors.
Mastery of video processing pipelines and libraries like OpenCV and FFmpeg.
Hands-on experience with 3D hand tracking and human pose estimation.
Strong understanding of modern imitation learning paradigms and human-to-robot transfer frameworks.
Proven ability to implement data augmentation techniques for computer vision and time-series data.

Responsibilities

Architect efficient data pipelines for high-resolution egocentric video and multi-sensor streams.
Develop algorithms for analyzing force interactions and inferring hidden states.
Create kinematic retargeting algorithms that translate human motion into robot coordinates.
Implement data augmentation strategies to expand expert trajectories and improve the robustness of learning models.
Collaborate with teleoperation teams to align human-robot play data with training datasets.

Benefits

Opportunities for professional development and advancement.
Collaborative work environment with innovative teams.
Access to cutting-edge technology and robotics applications.

Full Job Description

Job Title: Robotics Data Pipeline Engineer - Multimodal Data

Department: Software

Reports To: Teleoperations Lead

Employment Type: Full-Time

Location: Houston, TX or Pensacola, FL

About the Role

As a Data Pipeline Engineer, you will architect and scale the data infrastructure that feeds our foundation models. Your primary mission is to extract, augment, and align human dexterous manipulation data from massive, complex multi-sensor and egocentric video datasets. Crucially, you will build advanced post-processing algorithms to perform deep force analysis and infer hidden states from raw data, such as processing direct force-torque outputs to quantify grasp dynamics, estimating contact forces from visual cues, extrapolating heavily occluded hand positions, or deriving 3D geometry from 2D frames. You will use spatial, temporal, and cross-modal data augmentation to multiply the value of every minute of data our teleoperation team collects.

What You Will Be Doing

Multimodal Data Pipelines: Architect highly efficient, scalable pipelines to ingest, decode, and synchronously process thousands of hours of high-resolution egocentric video alongside rich sensor streams (IMUs, force-torque sensors, tactile pads, and joint proprioception).
Force Analysis & Hidden State Inference: Develop sophisticated post-processing algorithms to analyze force interactions and infer unobservable or missing states from raw data. This includes calibrating and cleaning direct force-aware data collections, estimating contact forces from object deformation, tracking occluded objects during complex manipulation, or applying inverse kinematics to fill in missing joint trajectories.
Kinematic Retargeting & Alignment: Develop algorithms to translate 3D human hand tracking, wrist motion, and pose estimation into the specific 6DoF/joint-space coordinates of our humanoid's end-effectors, relying on sensor fusion to ensure absolute precision.
Advanced Data Augmentation: Implement robust data augmentation strategies (spatial transformations, temporal scaling, synthetic viewpoints, and sensor noise injection) to expand expert trajectories and improve the robustness of our learning models.
Teleoperation Synchronization: Work closely with the Hardware Teleoperation Team (UMI & Console operators) to perfectly align human-robot play-data (haptics, force profiles, video, audio, telemetry) with large-scale pre-training datasets.
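
To make the synchronization work described above more concrete, here is a minimal sketch of one common approach: resampling a higher-rate sensor channel (for example, one force-torque axis) onto video frame timestamps by linear interpolation. It is an illustration only, not Persona AI's pipeline. The function names `interpolate_at` and `align_to_frames` are hypothetical, and the sketch assumes both streams already share a clock and that sensor timestamps are sorted in ascending order.

```python
from bisect import bisect_left


def interpolate_at(sample_times, sample_values, query_time):
    """Linearly interpolate a scalar sensor signal at query_time.

    Assumes sample_times is sorted ascending (hypothetical helper).
    Queries outside the sampled range are clamped to the nearest endpoint.
    """
    if query_time <= sample_times[0]:
        return sample_values[0]
    if query_time >= sample_times[-1]:
        return sample_values[-1]
    # First index whose timestamp is >= query_time; the previous one is < query_time.
    hi = bisect_left(sample_times, query_time)
    lo = hi - 1
    t0, t1 = sample_times[lo], sample_times[hi]
    w = (query_time - t0) / (t1 - t0)
    return sample_values[lo] + w * (sample_values[hi] - sample_values[lo])


def align_to_frames(frame_times, sample_times, sample_values):
    """Return one interpolated sensor value per video frame timestamp."""
    return [interpolate_at(sample_times, sample_values, t) for t in frame_times]


if __name__ == "__main__":
    # Toy data: sensor sampled once per second, frames at arbitrary times.
    sensor_t = [0.0, 1.0, 2.0, 3.0]
    sensor_v = [0.0, 10.0, 20.0, 30.0]
    frames_t = [0.5, 2.0, 5.0]
    print(align_to_frames(frames_t, sensor_t, sensor_v))
```

In practice, clock offset and drift between devices would need to be estimated before a step like this, and multi-axis channels would be handled per axis or with an array library.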

What We Are Looking For

Education: B.S., M.S., or Ph.D. in Computer Science, Data Engineering, Machine Learning, Robotics, or a related field.
Programming & ML Frameworks: Deep expertise in Python and extensive experience with PyTorch, specifically in handling custom dataloaders for multimodal datasets.
Force & Time-Series Data Processing: Experience analyzing and processing complex time-series data from force-torque (F/T) sensors, load cells, or tactile arrays, ensuring pristine alignment with visual frames.
Video Processing Expertise: Mastery of video processing pipelines and libraries (OpenCV, FFmpeg, Decord) and managing the I/O bottlenecks of terabyte-scale video datasets.
Computer Vision / Pose Estimation: Hands-on experience with 3D hand tracking, human pose estimation (e.g., MediaPipe), and spatial geometry calculations.
Embodied AI Familiarity: Strong understanding of modern imitation learning paradigms, VLA architectures, and frameworks focused on human-to-robot transfer (e.g., EgoScale, EgoMimic, or OpenVLA).
Data Augmentation: Proven ability to implement programmatic and generative data augmentation techniques for computer vision and time-series data.
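
As a rough illustration of the programmatic time-series augmentation mentioned above, the sketch below shows two of the techniques named in this posting: sensor noise injection and temporal scaling. The helpers `inject_noise` and `temporal_scale` are hypothetical names chosen for this example, the signal is a plain list of floats with at least two samples, and Gaussian noise is an assumed noise model rather than anything the posting specifies.

```python
import random


def inject_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def temporal_scale(signal, factor):
    """Resample a signal to about len(signal) * factor samples.

    Uses linear interpolation; factor > 1 stretches the trajectory in time,
    factor < 1 compresses it. Assumes len(signal) >= 2.
    """
    n = len(signal)
    m = max(2, round(n * factor))
    out = []
    for i in range(m):
        pos = i * (n - 1) / (m - 1)  # fractional index into the original signal
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append(signal[lo] * (1 - w) + signal[hi] * w)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so augmented samples are reproducible
    force = [0.0, 1.0, 2.0]
    stretched = temporal_scale(force, 2.0)
    noisy = inject_noise(stretched, sigma=0.05, rng=rng)
    print(len(stretched), len(noisy))
```

Passing a seeded `random.Random` instance keeps augmented datasets reproducible, which matters when comparing training runs.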

Bonus Skills

Experience with NVIDIA's robotic software stack (Isaac, Cosmos, or components of the GR00T framework).
Familiarity with distributed data processing systems (Ray, Apache Spark) for cluster computing.
Background in generating or utilizing synthetic robotic data via simulation (Omniverse, MuJoCo).
Experience integrating spatial awareness or tactile data representations (e.g., Fourier encoding) into visual pipelines.

* Ladders Estimates
