Data Engineer, Machine Learning

Sesame

$120K - $160K *

San Francisco, CA 94112, In-Person

Information Technology

5 - 7 years of experience

Reposted Today

Job Overview by Ladders

Qualifications

5+ years of experience in data engineering supporting ML or AI teams
Proficient in SQL and Python
Skilled in building ETL/ELT pipelines with modern tools
Experience with workflow orchestration tools like Airflow or Prefect
Hands-on knowledge of ML data workflows
Understanding of ML team dynamics and data quality impact
Ability to work with unstructured and semi-structured data

Responsibilities

Design and build production data pipelines for model training
Collaborate with ML engineers to meet data needs for models
Maintain infrastructure for dataset versioning and tracking
Develop frameworks to ensure data quality and monitoring
Optimize data processing for performance and cost efficiency
Create tools for ML teams to independently explore and request data
Establish data governance standards for sensitive information
Contribute to architecture decisions for the evolving data platform

Benefits

401(k) with 3.5% employer match
100% employer-paid health, vision, and dental coverage
Unlimited PTO and sick time
Flexible spending account with matching up to $1,650/year
Guardian Employee Assistance Program (EAP)
Competitive stock options for employees

Full Job Description

About the Role

We're looking for a Data Engineer to build and maintain the data pipelines that feed Sesame's AI models. You'll collaborate directly with machine learning engineers and researchers - your job is to make sure they have the right data, in the right shape, at the right time to train, evaluate, and ship models.

Sesame's data is rich and complex: conversations, voice, sensor signals, and product telemetry. You'll design the systems that take raw, unstructured, multimodal data and turn it into clean, versioned, well-documented datasets that ML teams can trust and build on confidently.

This is a deeply technical, infrastructure-focused role, closer to ML engineering than to traditional data analytics. You'll be embedded with ML teams, learning their workflows and building infrastructure that accelerates the full model development lifecycle, from data collection and labeling through training and evaluation.

Responsibilities

Design and build production data pipelines that prepare conversational, voice, and multimodal data for model training and evaluation.
Partner directly with ML engineers to understand data requirements for new models and experiments, and deliver datasets that meet those needs.
Build and maintain infrastructure for dataset versioning, lineage tracking, and reproducibility - so any training run can be traced back to its exact data.
Develop data quality frameworks that catch issues before they become model quality issues: schema validation, drift detection, and coverage monitoring.
Optimize large-scale data processing for cost and performance across Sesame's cloud infrastructure.
Build tooling that makes it easy for ML engineers and researchers to discover, explore, and request data independently.
Define and enforce data governance and privacy standards, particularly around sensitive conversational and voice data.
Contribute to architecture decisions around Sesame's broader data platform as the team and data volume grow.

Required Qualifications:

5+ years in data engineering, with meaningful experience supporting ML or AI teams specifically.
Strong SQL and Python skills - you'll use both daily.
Experience building and operating ETL/ELT pipelines at scale using modern data platforms and tooling.
Experience with workflow orchestration systems such as Airflow, Dagster, or Prefect.
Hands-on experience with ML data workflows: training data pipelines, dataset versioning, data labeling pipelines, or model evaluation data.
A solid understanding of how ML teams work - you don't need to train models; what matters is understanding what makes a good training dataset and why data quality directly affects model performance.
Comfort working with unstructured and semi-structured data - audio, text, JSON logs - not just clean relational tables.
Strong communication skills. You'll be embedded with ML engineers and need to bridge data systems and model requirements effectively.

Preferred Qualifications (experience with any of the following):

Vector databases, embedding storage, or feature stores.
Data from hardware or embedded systems: telemetry, sensors, real-time streams.
Distributed compute frameworks for large-scale data processing such as Ray or Spark.
Kubernetes and managed Kubernetes environments such as GKE or EKS.
Data privacy frameworks, especially around voice or conversational data.
Building internal tooling or self-serve data platforms.

Full-time Employee Benefits:

401(k) max employer match: 3.5% of compensation
100% employer-paid health, vision, and dental benefits for you and your dependents
Unlimited PTO and sick time
Flexible spending account with employer matching up to $1,650/year (medical FSA)
Guardian Employee Assistance Program (EAP)
Opportunity to share in the company's success with competitive stock options

Benefits do not apply to contingent/contract workers.

* Ladders Estimates
