Yelp

Site Reliability Engineer, Core Streaming (Remote - United States)

$141K - $216K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of hands-on experience with Kafka event streaming in production environments.
  • Deep understanding of event streaming principles and architecture.
  • Proficient in Java, Python, or similar languages for automation and integration.
  • Familiar with Kafka Client APIs and capacity planning for high-throughput clusters.
  • Experience with real-time data technologies like Apache Flink.
  • Skilled in automating infrastructure and operational tasks via IaC or scripting.
  • Bachelor's degree or relevant work experience.

Responsibilities

  • Design, deploy, and maintain Kafka infrastructure across hybrid and multi-cloud environments.
  • Collaborate with engineers to enhance data pipeline reliability and introduce new features.
  • Automate Kafka cluster upgrades and migrations while minimizing service impact.
  • Enhance self-service capabilities for cluster operations and incident recovery.
  • Troubleshoot and resolve complex data flow and performance issues.
  • Participate in on-call rotations within a distributed SRE team.

Benefits

  • Fully remote opportunity available for US applicants.
  • Access to Yelp’s well-regarded benefits package.
Full Job Description
Summary

Do you want to help build and operate scalable, resilient systems that power Yelp's critical business functions? Our Site Reliability Engineers (SREs) ensure our services remain fast, reliable, and available, even as we grow and requirements evolve. As an SRE specializing in Kafka, you'll play a pivotal role in managing our real-time data streaming infrastructure and supporting event-driven applications at scale.

We work at the intersection of software development and distributed systems, owning the backbone of our organization's streaming architecture. As a Kafka SRE, you'll take on challenges only found at the kind of scale that supports global, always-on applications. Yelp processes massive amounts of user data daily: over 300 million business reviews, 100,000 photo uploads, and countless check-ins. Maintaining sub-minute data freshness at such high volume is an exciting technical problem and an interesting area to work in.

You'll drive best practices in automation and self-service, knowing that deploying or upgrading data streaming infrastructure should be as effortless as a git commit and a code review.

This opportunity is fully remote and does not require you to be located in any particular state within the US. We welcome applicants from throughout the US. We'd love to have you apply, even if you don't feel you meet every single requirement in this posting. At Yelp, we're looking for great people, not just those who simply check off all the boxes.

What you'll do:

  • Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments.
  • Collaborate with engineers to enable new features, ensure data pipeline reliability, and advise on best practices for real-time data processing.
  • Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact to critical services.
  • Build or enhance self-service capabilities and automation for cluster operations, scaling, and incident recovery.
  • Troubleshoot complex issues affecting data flow, performance, or stability, and drive root cause analyses.
  • Participate in on-call rotations. Our geographically distributed SRE teams use a "follow-the-sun" model, so no one needs to be on-call 24 hours a day!


What it takes to succeed:

  • Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production, across hybrid or multi-cloud and Linux environments, including upgrades and migrations between platforms or versions.
  • In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances.
  • Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation.
  • Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters.
  • Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink.
  • Knowledge of automating infrastructure and operational tasks (configuration management, IaC, scripting, or related).
  • Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment.
  • A Bachelor's degree or equivalent work experience is required.


What you'll get:

  • There are a variety of factors that go into determining a compensation range, including but not limited to external market benchmark data and years of experience. Based on the anticipated level of experience that we are seeking, we expect the compensation range for this role to be between $141,000 and $216,000. The actual compensation offered may be influenced by a variety of factors, including the candidate's experience and skill set.
    There may be flexibility with the range included in this posting should a candidate be leveled higher or lower than the posted range.
  • This opportunity has the option to be fully remote in all locations across the US.
  • You can find more information about Yelp's five-star benefits here!


About Yelp

Yelp Inc. is a platform that connects people with local businesses. Yelp was founded in San Francisco in July 2004. Since then, Yelp communities have taken root in major metros across 32 countries. By the end of Q2 2021, Yelpers had written approximately 250 million rich, local reviews, making Yelp the leading local guide for real word-of-mouth on everything from boutiques and mechanics to restaurants and dentists. Approximately 40 million unique devices accessed Yelp via the Yelp app, approximately 83 million unique visitors visited Yelp via mobile web, and approximately 83 million unique visitors visited Yelp via desktop on a monthly average basis during Q2 2021.
Size: 4,400 employees
Market Cap: $1.8 billion
Net Income: -$19.4 million
Founded: 2004
5 Year Trend: +7.6%
Revenue: $872.9 million
NASDAQ
