Software Engineer, Data Infrastructure

Lyft

San Francisco, CA

Industry: Transportation Services


Experience: Not specified

Posted 428 days ago

This job is no longer available.

You will be part of an early team that builds data transport, collection, and storage, and exposes services that make data a first-class citizen at Lyft. You will help shape the vision and architecture of Lyft’s next-generation data infrastructure, making it easy for developers to build data-driven products and features that connect millions of our drivers and passengers. You will be responsible for developing reliable infrastructure that scales with the company’s rapid growth. Your work will make business and user behavior insights accessible, drawing on huge amounts of Lyft data to support teams such as Analytics, Data Science, Marketplace, Fraud, and many others.
We are a group of engineers constantly striving to create an amazing experience for our customers and ourselves, and we believe data brings everything together. We build and operate the platform the rest of the company uses for stream and batch computation and for the serving mechanisms used to train ML models. You will be part of an experienced engineering team and work with passionate leaders on challenging distributed systems problems. We value culture and trust highly and believe you will add to both in your own way.


  • Experience building and operating large-scale data infrastructure in production (performance, reliability, monitoring)
  • Deep understanding of distributed systems concepts and principles (consistency and availability, liveness and safety, durability, reliability, fault-tolerance, consensus algorithms)
  • Experience bringing open source software to production at scale (Yarn, HDFS, Hive, Spark, Presto, ZooKeeper, Airflow)
  • Experience designing, implementing and debugging distributed systems that run across thousands of nodes
  • Hands-on experience with the Hadoop (or similar) ecosystem - Yarn, Hive, HDFS, Spark, Presto, Parquet, HBase
  • Experience working with real-time compute and streaming infrastructure - Kafka, Kinesis, Flink, Storm, Beam
  • Experience configuring MPP databases, identifying their performance bottlenecks, and tuning them
  • Able to think through the long-term impact of key design decisions and to handle failure scenarios
  • Experience with workflow management (Airflow, Oozie, Azkaban, UC4)


  • Service-oriented mindset
  • MPP database expertise (Redshift, Vertica, Teradata)
  • Experience owning mission-critical service(s)
  • Experience planning capacity for large-scale production systems
  • Experience working with pub-sub messaging systems (Kafka, Kinesis, etc.)
  • Experience contributing to or developing stream compute frameworks (Apache Flink, Storm, Samza, Heron, Beam, etc.)