The Role
We are looking for an engineer to work at the intersection of research and systems on our pretraining stack. You'll contribute across the full pipeline, from data processing and model architecture to distributed training infrastructure and low-level optimization, and help determine how we scale our next generation of models. Example areas you might work on include, but are not limited to:
- Implement and iterate on model architectures, training algorithms, and optimizer research in large-scale pretraining runs
- Scale distributed training jobs across thousands of GPUs
- Optimize training throughput for novel attention mechanisms and architecture variants, and improve compute efficiency
- Design and build large-scale data pipelines for efficient model consumption and dataset curation
- Run and analyze scientific experiments to advance understanding of how architecture and data choices affect model capabilities
If you're excited about working across research and engineering to push the frontier of what large models can do, we'd love to hear from you.
We offer a base salary of $350,000-$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.