Machine Learning Researcher, Multimodal LLMs

Bland

$180K - $260K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Experience with LLMs, multimodal models, or speech-language systems.
  • Deep understanding of prompting, fine-tuning, and alignment techniques.
  • Familiarity with neural audio codecs and modern multimodal LLM techniques.
  • Ability to quickly iterate from idea to experiment with concrete conclusions.
  • Strong intuition for natural user interactions and user-facing improvements.

Responsibilities

  • Contribute to the development of Bland's next-generation multimodal LLM stack.
  • Build industry-leading conversational AI models for Bland's agent.
  • Integrate streaming audio, tool execution, and dynamic context into a coherent system.
  • Take research ideas through to production systems serving millions of calls daily.
  • Maintain focus on optimizing latency, correctness, and real-world behavior.

Benefits

  • Meaningful equity.
  • Full healthcare, dental, and vision coverage.
  • High autonomy in a dynamic work environment.
  • Opportunity to impact real-world applications and user experiences.

Full Job Description

Location: San Francisco, CA or Remote

The Role

We are looking for someone to contribute to the development of our next-generation multimodal LLM stack, combining speech, text, tools, and real-time reasoning into a single unified system. You'll be responsible for building industry-leading conversational AI models that power Bland's agent, and taking them all the way from idea to production.

At Bland, we're not just thinking about text modeling. You will define how our agents listen, think, and act in real time, integrating streaming audio, tool execution, and dynamic context into a single coherent system. You will take ideas from research through production systems serving millions of calls per day.

What Makes You a Great Fit

Strong LLM / Multimodal Background
  • Experience with LLMs, multimodal models, or speech-language systems
  • Deep understanding of prompting, fine-tuning, and alignment techniques
  • Familiarity with neural audio codecs and modern multimodal LLM techniques


Fast Experimental Loop
  • You can go from idea → dataset → experiment → conclusion in days
  • You know how to design experiments that actually answer the question


Product Intuition
  • Strong sense of what makes an interaction feel natural rather than robotic
  • Ability to translate abstract modeling ideas into user-facing improvements


Builder Mentality
  • You take ownership from research through deployment
  • You thrive in ambiguous, fast-moving environments
  • You care about impact, not just elegance


How You Show Up
  • You think in systems, not just models
  • You obsess over latency, correctness, and real-world behavior
  • You are comfortable discarding ideas quickly when data disagrees
  • You push toward simple abstractions for complex problems


Bonus Points
  • Experience with real-time voice systems or conversational AI
  • Background in tool-using agents or agent frameworks
  • Experience with multimodal datasets (audio + text + actions)
  • Contributions to LLM or speech-related research or open source


Compensation & Benefits
  • Competitive salary: $180,000 - $260,000
  • Meaningful equity
  • Full healthcare, dental, vision
  • Office in Jackson Square, SF
  • High autonomy, high impact
