Ro

Senior Applied AI Scientist

$182K - $220K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in data science, applied machine learning, or related field, with focus on LLMs or AI evaluation.
  • Strong Python and SQL skills with production data pipeline experience.
  • Experience designing reproducible evaluation frameworks, not just manual checks.
  • Strong statistical intuition and understanding of distributions and confidence intervals.
  • Comfortable collaborating with engineering and product teams on production improvements.
  • Bonus: Familiarity with evaluation platforms, experimentation platforms, causal inference, and healthcare contexts.

Responsibilities

  • Design and own evaluation frameworks for production LLM features.
  • Analyze production behavior to identify quality issues and other bottlenecks.
  • Run experiments involving prompt variations, workflow changes, and model comparisons.
  • Define and build dashboards to make AI performance metrics visible.
  • Collaborate with engineering to implement optimizations and measure success.
  • Mentor teammates in experimental design and evaluation methodologies.

Benefits

  • Flexible work environment with opportunity for remote work.
  • Access to cutting-edge LLMs and AI technologies.
  • Collaborative culture that emphasizes mentorship and professional growth.
  • Participation in the development of impactful healthcare products.
  • Opportunity to shape a new function within the organization.

Full Job Description
The Role

Ro is building a team focused on shipping LLM-powered products across the patient experience, clinical operations, and internal tooling.

We're hiring a Senior Applied AI Scientist to own the evaluation, measurement, and optimization of our AI systems. This role sits at the intersection of data science, applied machine learning, and product engineering. You'll design the frameworks that tell us whether our AI systems are actually working and use those insights to continuously improve them.

This is not a research role. You'll work closely with engineers and product teams to evaluate production systems, run experiments, identify failure modes, and ensure our AI products become more accurate, reliable, and cost-effective over time.

What You'll Do

  • Design and own evaluation frameworks for production LLM features, including LLM-as-a-judge evaluations, regression suites, synthetic datasets, golden datasets, and human review workflows.
  • Analyze production behavior to identify quality issues, hallucinations, latency bottlenecks, cost regressions, and emerging failure modes.
  • Design and run experiments, including prompt variations, workflow changes, retrieval improvements, and model comparisons, and quantify their impact on quality, operational metrics, and user outcomes.
  • Define the metrics that matter and build dashboards that make AI performance visible across the organization.
  • Partner with engineering to determine which optimizations should be productionized and how to measure ongoing success.
  • Mentor teammates on experimental design, statistical rigor, evaluation methodology, and measurement best practices.


Who You Are

  • 5+ years of experience in data science, applied machine learning, experimentation, or a closely related field, with at least the last year focused on applied LLMs or AI evaluation.
  • Strong Python and SQL skills with experience working on production data pipelines and experimentation.
  • You have experience designing reproducible evaluation frameworks rather than relying on manual spot checks or qualitative assessments.
  • You have strong statistical intuition: you think in terms of distributions, confidence intervals, variance, and sample sizes rather than anecdotes.
  • You're comfortable working closely with engineers and product teams to translate experimental findings into production improvements.
  • Bonus: Experience with evaluation platforms (e.g., Braintrust, LangSmith, OpenAI Evals), experimentation platforms, causal inference, healthcare, or operations-heavy environments.


A note on reporting structure

This is a new function at Ro, and we're being deliberate about not over-defining it. Your manager and where you sit on the org chart will depend on the specific shape of the team we end up with. We'd rather find the right people and figure out the lines around them than pre-draw boxes and miss great candidates. If that ambiguity is a deal-breaker, this isn't the right role; if it sounds like an opportunity, we want to talk.

The target base salary for this position ranges from $182,300 to $220,000, in addition to a competitive equity and benefits package (as applicable). When determining compensation, we analyze and carefully consider several factors, including location, job-related knowledge, skills and experience. These considerations may cause your compensation to vary.

About Ro

Ro is an American telehealth company that diagnoses patients, then prescribes and delivers treatments. The company is headquartered in New York City.
Founded: 2017
