Responsibilities:
- Design and architect scalable, secure, and high-performance data platforms on GCP
- Lead the development of large-scale batch and streaming data pipelines using PySpark and Python
- Build and optimize orchestration frameworks using Apache Airflow/Cloud Composer
- Design enterprise-grade data models and transformation frameworks using dbt
- Implement scalable data warehouse and lakehouse solutions using BigQuery
- Define architecture standards, engineering best practices, and governance frameworks
- Drive performance optimization, scalability, and cloud cost optimization initiatives
- Collaborate with Architects, Product Owners, Analysts, and business stakeholders to translate business requirements into technical solutions
- Lead technical design discussions, code reviews, and architecture reviews
- Implement observability, monitoring, lineage, and data quality frameworks
- Mentor junior engineers and provide technical leadership to the team
- Ensure compliance with security, governance, and regulatory standards
- Support CI/CD and Infrastructure as Code implementations
Required Skills:
- 6-9 years of experience in Data Engineering
- Strong expertise in the GCP data ecosystem and cloud-native architectures
- Deep hands-on experience with:
  - Python
  - PySpark
  - SQL
  - Apache Airflow / Cloud Composer
  - dbt
  - BigQuery
- Strong understanding of data architecture patterns, including:
  - Lakehouse Architecture
  - Medallion Architecture
  - Data Mesh concepts
  - Batch and Streaming architectures
- Experience with GCP services such as:
  - BigQuery
  - Dataproc
  - Cloud Storage
  - Pub/Sub
  - Dataflow
  - Composer
  - IAM
- Expertise in data modeling, partitioning, clustering, and query optimization
- Strong understanding of distributed systems and scalable data processing
- Experience with CI/CD, GitOps, Terraform, and containerization technologies
- Excellent stakeholder management and technical leadership skills