About the role
This is an opportunity to shape the future of Zillow's real-time data platform and the experiences we deliver to customers. As a Senior Software Engineer, Big Data on the Streaming team, you will design and evolve the streaming infrastructure that hundreds of internal services rely on every day.
You will independently lead the design and evolution of distributed streaming systems, control planes, and developer tooling that serve hundreds of internal services, with a focus on scalability, reliability, and long-term platform sustainability.
You Will Get To
- Design, build, and operate large-scale Kafka and Flink infrastructure supporting tier-0 and tier-1 workloads.
- Lead critical initiatives in our streaming platform modernization, including platform architecture evolution.
- Develop and enhance streaming control planes, APIs, CLIs, and provisioning systems that standardize how teams create and operate streaming resources across Zillow.
- Improve platform reliability through SLO definition, monitoring, alerting, incident response, and automation.
- Enable simplified stream processing patterns for product and engineering teams, reducing the need for bespoke infrastructure or specialized expertise.
- Evaluate and integrate modern streaming ecosystem capabilities, including managed Kafka offerings, serverless stream processing, and real-time AI integration patterns.
- Make high-quality architectural decisions under ambiguity, balancing reliability, cost, performance, and developer experience across competing priorities.
- Mentor engineers and contribute to raising the bar on distributed systems design, operational excellence, and long-term platform strategy.
This role has been categorized as a Remote position. "Remote" employees do not have a permanent corporate office workplace and, instead, work from a physical location of their choice, which must be identified to the Company. U.S. employees may live in any of the 50 United States, with limited exceptions.
In California, Connecticut, Maryland, Massachusetts, New Jersey, New York, Washington state, and Washington DC, the standard base pay range for this role is $160,900.00 - $257,100.00 annually. This base pay range is specific to these locations and may not be applicable to other locations.
In Colorado, Hawaii, Illinois, Minnesota, Nevada, Ohio, Rhode Island, and Vermont, the standard base pay range for this role is $152,900.00 - $244,300.00 annually. This base pay range is specific to these locations and may not be applicable to other locations.
In addition to a competitive base salary, this position is also eligible for equity awards based on factors such as experience, performance, and location. Actual amounts will vary depending on experience, performance, and location. Employees in this role will not be paid below the salary threshold for exempt employees in the state where they reside.
Who you are
- 5+ years of experience building and operating large-scale distributed systems, including independently owning critical production systems end to end.
- Significant production experience with Kafka and/or Flink, including performance tuning, state management, scaling strategies, and operational incident resolution.
- Proficiency in at least one programming language such as Python, Java, or Scala.
- Experience operating services in cloud environments (for example, AWS) and working with container orchestration platforms like Kubernetes.
- Experience designing scalable, multi-tenant systems with reliability, cost efficiency, and observability in mind.
- Experience defining and operating against SLOs, participating in on-call rotations, and leading incident response efforts.
- Familiarity with infrastructure-as-code tooling such as Terraform and CI/CD systems.
- Strong systems design skills, including the ability to reason about consistency, state management, fault tolerance, and throughput.
- Experience collaborating across platform and product teams to define boundaries, contracts, and integration patterns.
- Here at Zillow, we value the experience and perspective of candidates with non-traditional backgrounds. We encourage you to apply if you have transferable skills or related experiences.
Preferred Qualifications
- Experience working with streaming vendors (for example, Confluent, MSK, Redpanda) or modernizing legacy Kafka/Flink infrastructure.
- Demonstrated experience leading system design efforts for complex, multi-team platform initiatives.
- Experience integrating streaming systems with analytics platforms such as Databricks or building real-time context engineering capabilities for AI systems.
- Background in reliability engineering or platform engineering.