About Us
Sieve is the only AI research lab exclusively focused on video data. We combine exabyte-scale video infrastructure, novel video understanding techniques, and dozens of data sources to develop datasets that push the frontier of video modeling. Video makes up 80% of internet traffic and has become the enabling digital medium powering creativity, communication, gaming, AR/VR, and robotics. Sieve exists to solve the biggest bottleneck in the growth of these applications: high-quality training data.
Sieve scaled from 0 to $XXM in revenue in the second half of 2025, with a relatively small team of 12 people. We also recently raised our Series A from Tier 1 firms such as Matrix Partners, Swift Ventures, Y Combinator, and AI Grant.
About the Role
As a distributed systems engineer at Sieve, you'll design and build systems that handle the compute, scheduling, and orchestration of complex ML + ETL pipelines that need to run quickly, reliably, and cost-effectively on large volumes of video.
You're likely a good fit if you have worked with cloud technologies and love optimizing for system uptime, tuning hyper-fast distributed systems at the scale of thousands of GPUs, and building great internal tooling and CI/CD for rapid iteration.
Requirements
- 3+ years of experience building foundational data infrastructure
- Proficient in working across diverse cloud architectures
- Designed and maintained pipelines that process petabytes of data
- Developed robust CI/CD pipelines tailored for ML-focused teams
- Strong coding experience with Go and Python
- Operates as an IC who leads by example
- Experience with large-scale video data systems
- In-person at our SF HQ