Research Scientist / Engineer - Efficient Modeling

Rhoda AI

$120K — $160K *
Enterprise Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Strong understanding of model compression and efficient architectures for large models.
  • Hands-on experience with quantization, distillation, or pruning applied to transformers or large neural networks.
  • Deep knowledge of where efficiency gains are possible in modern architectures.
  • Proficiency with PyTorch and familiarity with hardware-aware optimization (CUDA, TensorRT, or similar).
  • Ability to run principled experiments that characterize capability-efficiency tradeoffs.

Responsibilities

  • Research and implement model compression techniques: quantization, pruning, structured sparsity, distillation, and low-rank approximation.
  • Design efficient architectures and attention mechanisms suited to real-time inference on edge and robot hardware.
  • Develop training strategies that produce better accuracy-efficiency tradeoffs from the start.
  • Profile and benchmark models across hardware targets to identify and resolve efficiency bottlenecks.
  • Build evaluation frameworks that measure capability retention after compression or architecture changes.
  • Collaborate with training systems and deployment teams to ensure efficient models translate to faster real-world inference.
  • Publish and present work at top-tier venues.

Benefits

  • Opportunity to influence real-time robot deployments with advanced model capabilities.
  • High-impact role that enhances the efficiency of all models the team develops.
  • Unique blend of deep learning research with practical systems implementation.

Full Job Description

We're looking for a Research Scientist or Research Engineer focused on model efficiency - making our foundation world models faster, smaller, and more deployable without sacrificing capability. This work is critical to closing the gap between research-scale models and real-time operation on robot hardware.

What You'll Do

  • Research and implement model compression techniques: quantization, pruning, structured sparsity, distillation, and low-rank approximation
  • Design efficient architectures and attention mechanisms suited to real-time inference on edge and robot hardware
  • Develop training strategies that produce better accuracy-efficiency tradeoffs from the start
  • Profile and benchmark models across hardware targets to identify and resolve efficiency bottlenecks
  • Build evaluation frameworks that measure capability retention after compression or architecture changes
  • Collaborate with training systems and deployment teams to ensure efficient models translate to faster real-world inference
  • Publish and present work at top-tier venues

What We're Looking For

  • Strong understanding of model compression and efficient architectures for large models
  • Hands-on experience with quantization, distillation, or pruning applied to transformers or large neural networks
  • Deep knowledge of where efficiency gains are possible in modern architectures
  • Proficiency with PyTorch and familiarity with hardware-aware optimization (CUDA, TensorRT, or similar)
  • Ability to run principled experiments that characterize capability-efficiency tradeoffs

Nice to Have (But Not Required)

  • PhD in ML, CS, or a related field - or equivalent research/engineering experience
  • Publication record at NeurIPS, ICML, ICLR, MLSys, or related venues
  • Experience with efficient video or multimodal model architectures
  • Familiarity with edge deployment targets (Jetson, custom ASICs, or mobile hardware)
  • Prior work on speculative decoding, early exit, or adaptive compute
  • Experience deploying compressed models on physical robots or latency-constrained systems

Why This Role

  • Bridge the gap between large-scale research models and real-time robot deployments
  • Your work determines whether frontier capabilities actually run on our hardware
  • High leverage: efficiency improvements benefit every model the team trains and deploys
  • Work at a rare intersection of deep learning research and systems
