As a Senior Data Engineer on our India engineering team, you will own the design and delivery of scalable, reliable data pipelines that process real-world supply chain data. You will work closely with Data Science, DevOps, and Customer Success teams to deploy and evolve our core platform, and your hands-on experience with production data will directly shape product decisions.
This is a high-ownership role. You will be expected to solve ambiguous problems, mentor junior engineers, and contribute to the technical direction of the team.
What You'll Do
- Design, build, and maintain production-grade data pipelines handling large-scale supply chain datasets
- Own end-to-end data modeling, warehousing architecture, and pipeline reliability across multiple customer environments
- Collaborate with Data Science, DevOps, Infrastructure, and Customer Success teams to deliver and iterate on product deployments
- Debug complex pipeline failures and performance bottlenecks across distributed systems
- Drive improvements to engineering practices, tooling, and deployment automation
- Mentor and support junior data engineers, fostering technical growth within the team
- Contribute to Daybreak's culture of continuous learning and operational excellence
Must Have
- 5+ years of experience in data engineering, with demonstrated ownership of production pipelines
- Strong proficiency in Python for data engineering workloads
- Advanced SQL skills across multiple databases (PostgreSQL and others)
- Hands-on experience with pipeline orchestration tools such as Apache Airflow or Dagster
- Experience working with distributed data systems such as Spark, Hive, or HDFS
- Solid understanding of containers and Docker in a development/deployment context
- Strong debugging skills and a systematic approach to resolving data quality issues
- Bachelor's or advanced degree in Computer Science, Engineering, or a related field
- Ability to thrive in a fast-paced, evolving environment and pick up new technologies quickly
Helpful to Have
- Cloud experience, preferably AWS (S3, EMR, or equivalent services)
- Experience processing large-scale time series data
- Familiarity with Kubernetes for containerized workloads
- Understanding of the ML model lifecycle and MLOps pipelines
- Experience working in interdisciplinary teams across engineering and data science