Job Title: Big Data Engineer
Job Summary: We are seeking a highly skilled Big Data Engineer to design, develop, and maintain scalable data processing systems capable of handling large volumes of structured, semi-structured, and unstructured data. The ideal candidate will build and optimize big data pipelines, data lakes, and distributed data platforms to support analytics, machine learning, and business intelligence initiatives. This role requires expertise in modern big data technologies, cloud platforms, and data engineering best practices.
Key Responsibilities:
- Design, develop, and maintain large-scale data processing and analytics platforms.
- Build and optimize data pipelines for batch and real-time data processing.
- Develop and manage data lakes, data warehouses, and distributed storage systems.
- Integrate data from multiple internal and external data sources.
- Ensure data quality, consistency, security, and governance across data platforms.
- Collaborate with Data Scientists, Data Analysts, ML Engineers, and Business Stakeholders to understand data requirements.
- Optimize data processing performance and scalability.
- Monitor and troubleshoot big data infrastructure and workflows.
- Implement data ingestion, transformation, and aggregation processes.
- Support cloud migration and modernization initiatives.
- Develop technical documentation and data architecture standards.
- Stay current with emerging big data technologies and industry trends.
Required Skills:
- Strong understanding of Big Data ecosystems and distributed computing concepts.
- Experience designing and implementing scalable data pipelines.
- Knowledge of data modeling, ETL/ELT processes, and data warehousing.
- Strong SQL and data analysis skills.
- Experience with real-time and batch processing frameworks.
- Excellent problem-solving and troubleshooting abilities.
Technical Skills:
- Big Data Technologies: Hadoop, Spark, Hive, HBase, Kafka
- Data Processing: Apache Spark, Apache Flink, Apache Beam
- Data Warehousing: Snowflake, Redshift, BigQuery, Azure Synapse
- Databases: PostgreSQL, MySQL, MongoDB, Cassandra
- ETL Tools: Informatica, Talend, AWS Glue, Azure Data Factory
- Cloud Platforms: AWS, Microsoft Azure, Google Cloud Platform (GCP)
- Programming Languages: Python, Java, Scala, SQL
- Workflow Orchestration: Apache Airflow, Luigi
- Containerization: Docker, Kubernetes
- Version Control: Git, GitHub, GitLab, Bitbucket
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, Data Engineering, Software Engineering, or a related field.
- Master's degree in Data Science or a related field is a plus.
- Relevant certifications are preferred:
  - AWS Certified Data Engineer
  - Google Professional Data Engineer
  - Microsoft Azure Data Engineer Associate
  - Databricks Certified Data Engineer
Experience:
- 4-8 years of experience in Data Engineering, Big Data Engineering, or Data Platform Development.
- Hands-on experience with distributed data processing frameworks such as Apache Spark.
- Experience working with cloud-based data platforms and data warehouses.
- Experience handling large-scale data processing and analytics workloads.
Preferred Qualifications:
- Experience with Data Lakehouse architectures using Databricks, Delta Lake, or Apache Iceberg.
- Knowledge of streaming technologies such as Kafka, Kinesis, or Pub/Sub.
- Experience supporting Machine Learning and AI data pipelines.
- Familiarity with Data Governance, Data Quality, and Master Data Management (MDM).
- Experience with DataOps and CI/CD practices for data engineering.
Preferred Qualities:
- Strong analytical and problem-solving mindset.
- Ability to work with large and complex datasets.
- Excellent communication and collaboration skills.
- Strong attention to detail and data accuracy.
- Passion for emerging data technologies and innovation.
Employment Type: Full-Time
Location: Remote / Hybrid / On-site
Nice to Have:
- Experience with Databricks, Snowflake, Delta Lake, Apache Iceberg, or Apache Hudi.
- Knowledge of Generative AI, Machine Learning, and MLOps data infrastructure.
- Experience building real-time analytics and event-driven architectures.
- Familiarity with FinTech, Healthcare, Retail, HR Tech, SaaS, or E-commerce data platforms.
- Experience mentoring junior engineers and contributing to enterprise data architecture and strategy.