Multimodal LLM Researcher
$300,000 - $400,000
Remote, Palo Alto
Full-time / Permanent
You'll help define the next generation of multimodal AI systems. Your work will span research, experimentation, and deployment, with a focus on real-time performance, multimodal reasoning, and agent-based workflows. You'll have the freedom to explore ambitious ideas while working alongside engineers who can bring them into production.
What You'll Do
- Lead research across LLMs, VLMs, and audio language models (Audio LMs)
- Design novel multimodal model architectures and training approaches
- Improve real-time inference across text, image, audio, and video
- Train and fine-tune autoregressive and diffusion models
- Build and curate high-quality multimodal datasets
- Collaborate with engineering teams to deploy research outcomes
- Publish findings at leading AI conferences and journals
What You'll Bring
Essential
- Strong research track record in multimodal AI or foundation models
- First-author publications at recognised ML, vision, or audio conferences
- Deep expertise in LLMs, VLMs, Audio LMs, or related fields
- Strong Python and deep learning experience using modern frameworks
Desirable
- Experience with diffusion models or world models
- Background in real-time AI systems and model serving
- Experience building large-scale multimodal datasets
We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.
What's In It For You
- Salary of $300,000-$400,000 (USD)
- Fully remote working arrangement
- Ownership of research that shapes production systems
- Opportunity to publish and contribute to the field
- Direct collaboration with product and engineering leadership
This role offers the chance to work on multimodal AI problems that sit at the intersection of research and real-world deployment. If you're excited by advancing the field while seeing your work reach users, we'd love to hear from you.