We are seeking a driven Data Engineer and strong team player to join our Enterprise Data Systems team. The ideal candidate will have expertise in data modelling, data marts, ETL, performance tuning, data governance, and data security, leveraging big data technologies and columnar or time-series data stores alongside traditional RDBMSs.
- Dimensional modelling and ETL development using Python and Spark Structured Streaming
- Define Data Security protocols and enable Access controls
- Conduct database performance tuning and implement low-latency data systems
- Build Master Data Management (MDM) software across the organization
- Build highly scalable data marts that can be used by DSC globally
- Maintain data integrity across multiple data marts
- Subscribe to published Kafka streams and work with Avro, JSON, and Parquet data formats
- Map data from sources to the data marts and work with peer data engineering teams to pipeline the data
- Design and build highly scalable solutions for data extraction from the data lake and transformation jobs that apply business rules
- Define and parallelize ETL jobs for low-latency, highly scalable systems
- Build data quality frameworks that measure and maintain data completeness, integrity, and validity between interfacing systems
- Document data mappings and maintain a data dictionary across all DSC enterprise data
- Own the KPIs that measure data mart performance and provide visibility to senior management
- Design self-serve BI platforms and drive higher adoption rates
- Master's degree in Computer Science or a related major
- 4-6 years of industry experience overall
- 4+ years of data warehousing and data engineering experience
- 4+ years of data modelling and data processing for large-scale, near-real-time big data platforms such as Redshift, HBase, Druid, or Snowflake
- 4+ years of developing end-to-end self-serve BI platforms using BI tools such as Tableau, Qlik Sense, or Looker
- 2+ years of experience with ETL and parallel-processing technologies such as Spark and Kafka streaming
- 6 years of programming experience in Java, Python, or Scala in a Linux/Unix environment
- Minimum 1 year of working knowledge of cloud-based solutions hosted on AWS