We're looking for a Research Scientist or Research Engineer focused on model efficiency: making our foundation world models faster, smaller, and more deployable without sacrificing capability. This work is critical to closing the gap between research-scale models and real-time operation on robot hardware.
**What You'll Do**
- Research and implement model compression techniques: quantization, pruning, structured sparsity, distillation, and low-rank approximation
- Design efficient architectures and attention mechanisms suited to real-time inference on edge and robot hardware
- Develop training strategies that build better accuracy-efficiency tradeoffs into models from the start of training, rather than recovering them afterward
- Profile and benchmark models across hardware targets to identify and resolve efficiency bottlenecks
- Build evaluation frameworks that measure capability retention after compression or architecture changes
- Collaborate with the training systems and deployment teams to ensure that efficiency gains translate into faster real-world inference
- Publish and present work at top-tier venues
**What We're Looking For**
- Strong understanding of model compression and efficient architectures for large models
- Hands-on experience with quantization, distillation, or pruning applied to transformers or large neural networks
- Deep knowledge of where efficiency gains are possible in modern architectures
- Proficiency with PyTorch and familiarity with hardware-aware optimization (CUDA, TensorRT, or similar)
- Ability to run principled experiments that characterize capability-efficiency tradeoffs
**Nice to Have (But Not Required)**
- PhD in ML, CS, or a related field, or equivalent research/engineering experience
- Publication record at NeurIPS, ICML, ICLR, MLSys, or related venues
- Experience with efficient video or multimodal model architectures
- Familiarity with edge deployment targets (Jetson, custom ASICs, or mobile hardware)
- Prior work on speculative decoding, early exit, or adaptive compute
- Experience deploying compressed models on physical robots or latency-constrained systems
**Why This Role**
- Bridge the gap between large-scale research models and real-time robot deployments
- Your work determines whether frontier capabilities actually run on our hardware
- High leverage: efficiency improvements benefit every model the team trains and deploys
- Work at a rare intersection of deep learning research and systems