Job Summary
Seeking a Data Engineer with Spark and streaming skills to build scalable, real-time data pipelines. The role uses tools such as Spark, Kafka, and GCP cloud services to ingest, transform, and deliver data for analytics and ML.
Key Responsibilities
• Design, develop, and maintain ETL/ELT data pipelines for batch and real-time data ingestion, transformation, and loading using Spark (PySpark/Scala) and streaming technologies (Kafka, Flink).
• Build and optimize scalable data architectures, including data lakes, data warehouses (BigQuery), and streaming platforms.
• Optimize Spark jobs, SQL queries, and data processing workflows for speed, efficiency, and cost-effectiveness.
• Implement data quality checks, monitoring, and alerting systems to ensure data accuracy and consistency.
Required Qualifications
• Total IT Experience: Minimum 8 years.
• Scala: Minimum 2 years of experience.
• GCP: 4+ years of recent experience.
• Programming: Strong proficiency in Python and SQL.
• Big Data: Expertise in Apache Spark (Spark SQL, DataFrames, Streaming).
• Streaming: Experience with messaging systems such as Apache Kafka or Pub/Sub.
• Cloud: Familiarity with GCP and Azure data services.
• Databases: Knowledge of data warehousing (Snowflake, Redshift) and NoSQL databases.
Preferred Qualifications
• Programming: Proficiency in Scala/Java.
• Tools: Experience with Airflow, Databricks, Docker, Kubernetes.
Certifications