Role Overview
Labelbox is the RL data factory for advancing frontier agent capabilities. We build the data, evaluations, and infrastructure that frontier labs use to train and judge their agents. We're looking for talented, experienced engineers to join us. The bar is high: we want engineers who have strong judgment and set technical direction, who quickly build prototypes that scale into reliable systems, and who work at the frontier of agent-first engineering practices, innovating to accelerate the speed of the business.
What you may work on
- Eval systems that run millions of agent trajectories to measure model and product quality.
- Fine-tuning pipelines that turn evaluation signals into measurable agent improvements.
- Agent-first product surfaces: UX and infrastructure for workflows where the user is a model or an agent operator.
- The systems behind hundreds of thousands of AI interviews used to source and match freelance workers to projects.
- Infrastructure that scales to the throughput frontier labs actually need.
- Integration of the latest models and capabilities into production within days of release.
What we're looking for
- A 4+ year track record of shipping systems that customers and other engineers rely on.
- You build full stack prototypes fast and they hold up. The v1 you ship becomes the foundation the rest of the team builds on.
- Strong system and API design judgment.
- Hard architecture and product calls land with you. You make them, defend them under pressure, and update fast when someone else is right.
- You ship production code with coding agents daily. You know where they break and what it takes to make them reliable enough to accelerate the team's velocity.
- You set direction by being the example. Other engineers reach for your designs and your code as the reference.
- You move fast in ambiguous, startup-pace environments, leading through influence rather than authority.
- You have worked across all parts of the stack.
- Deep proficiency in TypeScript and/or Python.
Nice to have
- Production experience building LLM- or agent-driven products.
- Designing evaluations for LLMs and agents, or producing high-quality data for ML systems.
- Background in production distributed systems, ML infrastructure, or data systems at scale.
Our Technology Stack
Our engineering team works with a modern tech stack designed for scalability, performance, and developer efficiency:
- Frontend: React.js with Redux, TypeScript
- Backend: Node.js, TypeScript, Python, some Java & Kotlin
- APIs: GraphQL
- Cloud & Infrastructure: Google Cloud Platform (GCP), Kubernetes
- Databases: MySQL, Spanner, PostgreSQL
- Queueing / Streaming: Kafka, PubSub
Labelbox strives to ensure pay parity across the organization and discuss compensation transparently. The expected annual base salary range for United States-based candidates is below. This range is not inclusive of any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location.
Annual base salary range
$250,000-$280,000 USD
Life at Labelbox
- Location: Join our dedicated tech hubs in San Francisco or Wrocław, Poland
- Work Style: Hybrid model with 2 days per week in office, combining collaboration and flexibility
- Environment: Fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making
- Growth: Career advancement opportunities directly tied to your impact
- Vision: Be part of building the foundation for humanity's most transformative technology
Our Vision
We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs.
Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.