Key job responsibilities
In this role you will build and maintain the data infrastructure that powers our robotics manipulation research. You'll work alongside our existing team of platform engineers to extend the systems that turn raw robot session data into curated, trainable episodes. This team owns streaming ingestion pipelines, platform and schema design, heterogeneous data sources, data curation and quality controls, the full-stack inspection tools and dataset builders that researchers and human annotators actually use, and tools that let scientists go from dataset to training job without leaving the platform. We run on a modern cloud-native stack: distributed compute on Kubernetes, streaming data infrastructure, columnar lakehouse storage, and a TypeScript/React frontend. We're looking for engineers who are eager to work across the full stack in a fast iteration cycle while working closely with researchers.
What matters is that you can ship full-stack data infrastructure real users depend on, treat researchers as collaborators rather than customers, and have a strong bias toward iteration in a flat org where engineers pick up science-driven work directly instead of waiting for approval layers.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of experience owning the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Strong software engineering background with full-stack development experience
- Expertise in distributed systems, cloud computing, and scalable data processing
- Experience with data pipeline design, ETL processes, and data management systems
PREFERRED QUALIFICATIONS
- Experience as a mentor or tech lead, or leading an engineering team
- 5+ years of programming experience with at least one programming language
- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Bachelor's degree in computer science or equivalent
- Experience with dataset curation and quality assessment techniques
- Knowledge of computer vision and multimodal data processing
- Deep understanding of machine learning fundamentals, particularly large-scale model training
- Background in research environments or supporting ML research workflows
- Experience with data visualization and annotation tooling
- Familiarity with modern data filtering and deduplication methodologies
- Proficiency in translating academic concepts into production systems
Our compensation reflects the cost of labor across several U.S. geographic markets. The base pay for this position ranges from $150,000/year in our lowest geographic market up to $300,000/year in our highest geographic market. Pay is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. Applicants should apply via our internal or external career site.