Position Overview
Siemens Digital Industries Software is looking for a talented Data Scientist/Engineer to join the Data and Analytics team. The ideal candidate will have proven experience in, and a passion for, driving data science projects that generate impact and value for the company.
Responsibilities
As a Data Scientist/Engineer on the Data Analytics & Automation (DAA) team, your primary responsibilities will be to:
- Build production-grade models on large-scale datasets to optimize marketing performance, using advanced statistical modeling, machine learning, and data mining techniques together with marketing science research
- Work with the data engineering team to create data pipelines
- Work with the operations team to create data-integration pipelines between marketing systems as part of our data-lake strategy
- Assist and collaborate with other data analysts and scientists to generate value from data
- Build backend data services for internal application teams to consume
- Build tools and frameworks to facilitate efficient and reliable data processing
- Support data science models throughout their lifecycle, including model building, scripting, data preparation, and model management
- Create engineered feature sets for training ML models
- Build classification, regression, and clustering models using Python ML libraries and Automated ML software tools
- Identify and define new AI and ML approaches to drive better business performance
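As one illustration of the feature-engineering work described above, here is a minimal, hypothetical sketch in standard-library Python (the record fields and channel names are invented for illustration, not taken from Siemens systems):

```python
from typing import Dict, List

# Hypothetical raw marketing records; field names are invented for illustration.
RAW: List[Dict] = [
    {"channel": "email", "impressions": 1000, "clicks": 50},
    {"channel": "social", "impressions": 400, "clicks": 10},
]

CHANNELS = ["email", "social", "search"]

def engineer_features(record: Dict) -> Dict:
    """Derive model-ready features from one raw record."""
    feats = {
        # Ratio feature: click-through rate, guarded against divide-by-zero.
        "ctr": record["clicks"] / max(record["impressions"], 1),
    }
    # One-hot encode the marketing channel.
    for ch in CHANNELS:
        feats[f"channel_{ch}"] = 1 if record["channel"] == ch else 0
    return feats

features = [engineer_features(r) for r in RAW]
```

In practice, these engineered feature sets would feed classification, regression, or clustering models built with Python ML libraries or automated-ML tooling, as the responsibilities above describe.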
Required Knowledge/Skills, Education, and Experience
- Bachelor of Science in computer science or related area, or equivalent experience
- Experience working with customer data platforms
- Experience with Java or Scala for data processing
- Working experience with analytical databases, such as Snowflake or Redshift, as well as database and query optimization
- Highly proficient in SQL
- Experience with relational SQL and NoSQL databases
- Familiar with BI visualization software like Tableau or Qlik
- Hands-on experience with versioning, continuous integration, and build/deployment tools and platforms such as GitHub, GitLab, and CircleCI
- Knowledge of data-pipelining and workflow-management tools such as Airflow, AWS Data Pipeline, and Luigi
- Familiar with dimensional data modeling and data normalization
- Familiar with data-lake architecture and data serialization formats such as JSON, Avro, and Parquet
- Practical experience building, validating, and deploying ML-based predictive or clustering models
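To make the dimensional-modeling and SQL items above concrete, here is a small, hypothetical star-schema query; the table and column names are invented, and SQLite is used only so the sketch is self-contained:

```python
import sqlite3

# In-memory database standing in for an analytical store such as Snowflake or Redshift.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: one fact table plus one dimension table.
cur.executescript("""
CREATE TABLE dim_channel (channel_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_clicks (channel_id INTEGER, clicks INTEGER);
INSERT INTO dim_channel VALUES (1, 'email'), (2, 'social');
INSERT INTO fact_clicks VALUES (1, 50), (1, 25), (2, 10);
""")

# Aggregate fact rows by a dimension attribute -- the typical dimensional query shape.
rows = cur.execute("""
    SELECT d.name, SUM(f.clicks) AS total_clicks
    FROM fact_clicks f
    JOIN dim_channel d ON d.channel_id = f.channel_id
    GROUP BY d.name
    ORDER BY total_clicks DESC
""").fetchall()
```

The same join-then-aggregate pattern is what query optimization in an analytical warehouse typically centers on (join order, pruning, and pre-aggregation).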
Bonus Points:
- Experience with AWS cloud services: Kinesis, Redshift, SNS, EC2, EMR (Spark and Hive), RDS, Athena, Redshift Spectrum, DynamoDB, and AWS Glue
- Spark Streaming, Kafka, Kinesis, and Terraform
- Hands-on experience with data stream processing
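As a rough illustration of the stream-processing work mentioned above, here is a toy, library-free sketch of tumbling-window aggregation; in production this logic would live in Spark Streaming or a Kinesis/Kafka consumer, and the event shape here is invented:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_s: int = 60
) -> Dict[int, Dict[str, int]]:
    """Count events per key in fixed (tumbling) time windows.

    `events` is an iterable of (epoch_seconds, key) pairs -- a stand-in for
    records read from a stream such as Kafka or Kinesis.
    """
    windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Bucket each event into the window that starts at the nearest
        # multiple of window_s at or before its timestamp.
        windows[ts - ts % window_s][key] += 1
    return {start: dict(counts) for start, counts in windows.items()}

events = [(0, "click"), (30, "click"), (65, "view")]
counts = tumbling_window_counts(events)
# counts -> {0: {'click': 2}, 60: {'view': 1}}
```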