Multimodal LLM Researcher

DEEPREC.AI

$300K - $400K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Strong research track record in multimodal AI or foundation models
  • First-author publications at recognized ML, vision, or audio conferences
  • Deep expertise in LLMs, VLMs, Audio LMs, or related fields
  • Strong Python skills and experience with modern deep learning frameworks
  • Desirable: experience with diffusion models and real-time AI systems

Responsibilities

  • Lead research on LLMs, VLMs, and Audio Language Models
  • Design innovative multimodal model architectures and training methods
  • Enhance real-time inference across multiple modalities
  • Train and fine-tune autoregressive and diffusion models
  • Build and curate high-quality multimodal datasets
  • Collaborate with engineering teams to deploy research outcomes
  • Publish findings at leading AI conferences and journals

Benefits

  • Fully remote working arrangement
  • Ownership of research that influences production systems
  • Opportunity to publish and contribute to the field
  • Direct collaboration with product and engineering leadership

Full Job Description

Multimodal LLM Researcher
$300,000 - $400,000
Remote, Palo Alto
Full-time / Permanent

You'll help define the next generation of multimodal AI systems. Your work will span research, experimentation, and deployment, with a focus on real-time performance, multimodal reasoning, and agent-based workflows. You'll have the freedom to explore ambitious ideas while working alongside engineers who can bring them into production.

What You'll Do
- Lead research across LLMs, VLMs, and Audio Language Models
- Design novel multimodal model architectures and training approaches
- Improve real-time inference across text, image, audio, and video
- Train and fine-tune autoregressive and diffusion models
- Build and curate high-quality multimodal datasets
- Collaborate with engineering teams to deploy research outcomes
- Publish findings at leading AI conferences and journals

What You'll Bring
Essential
- Strong research track record in multimodal AI or foundation models
- First-author publications at recognised ML, vision, or audio conferences
- Deep expertise in LLMs, VLMs, Audio LMs, or related fields
- Strong Python skills and deep learning experience with modern frameworks

Desirable
- Experience with diffusion models or world models
- Background in real-time AI systems and model serving
- Experience building large-scale multimodal datasets

We encourage you to apply even if you don't meet every requirement. The right mindset matters as much as the right CV.

What's In It For You

- USD 300,000-400,000 salary
- Fully remote working arrangement
- Ownership of research that shapes production systems
- Opportunity to publish and contribute to the field
- Direct collaboration with product and engineering leadership

This role offers the chance to work on multimodal AI problems that sit at the intersection of research and real-world deployment. If you're excited by advancing the field while seeing your work reach users, we'd love to hear from you.
