Principal Data Engineer
About the Role
We are seeking a Principal Data Engineer to lead the design, development, and optimization of our cloud-native data platform. You will be responsible for architecting scalable ETL pipelines, mentoring engineers, and driving technical decisions that shape our data infrastructure. This is a hands-on leadership role requiring deep expertise in AWS data services, distributed computing, and modern data lakehouse architectures.
Responsibilities
Technical Leadership
- Architect and evolve our medallion-based data lakehouse (Bronze/Silver/Gold tiers) on AWS
- Design and implement data transformation pipelines that scale to petabytes of data
- Establish best practices for data modeling, including dimensional modeling (Fact/Dimension tables) and slowly changing dimensions
- Define and enforce data quality, governance, and security standards across the platform
- Lead technical design reviews and provide guidance on complex engineering challenges
Hands-On Engineering
- Build and maintain production-grade ETL pipelines using AWS Glue, PySpark, and Apache Iceberg
- Develop reusable Python libraries and frameworks for data processing and transformation
- Implement data lineage tracking and query optimization strategies
- Design event-driven data architectures using Step Functions, Lambda, and SQS
- Optimize Spark jobs for performance, cost efficiency, and reliability
Collaboration & Mentorship
- Mentor and coach data engineers, fostering a culture of technical excellence
- Partner with Data Scientists, Analytics Engineers, and Product teams to understand data requirements
- Collaborate with Platform and DevOps teams on CI/CD, observability, and infrastructure automation
- Contribute to architectural decisions and technical roadmap planning
Required Qualifications
Experience
- 8+ years of experience in data engineering, with 3+ years in a senior or lead capacity
- Proven track record of designing and operating large-scale data platforms in production
- Experience leading technical projects and mentoring engineers
AWS Data Engineering Expertise
Proficiency with AWS data engineering services including but not limited to:
- Data Movement & Integration: DMS (Database Migration Service), SQS, Lambda
- Data Processing: AWS Glue, EMR, Step Functions
- Data Storage: S3, DynamoDB, Redshift
- Governance & Observability: DataZone, CloudWatch, CloudTrail
Technical Skills
- Expert-level proficiency in Python and SQL (Spark SQL, T-SQL, or similar)
- Deep experience with Apache Spark (PySpark) for distributed data processing
- Strong knowledge of data lake table formats: Apache Iceberg, Delta Lake, or Apache Hudi
- Proficiency with dimensional modeling and data warehouse design patterns
- Experience with infrastructure as code and CI/CD pipelines (GitHub Actions, Terraform, or CloudFormation)
- Familiarity with data serialization formats (Parquet, Avro, JSON)
Architecture & Design
- Experience designing medallion architectures or similar tiered data processing patterns
- Understanding of CDC (Change Data Capture) patterns and event-driven architectures
- Knowledge of data lineage, cataloging, and metadata management
- Experience implementing row-level security and data access controls
Preferred Qualifications
- Experience with observability frameworks such as OpenTelemetry
- Familiarity with data validation libraries (Pydantic, Great Expectations)
- Experience with async Python (asyncio, aioboto3) for high-throughput applications
- Knowledge of Kubernetes and containerized workloads
- Experience with data mesh or data product architectures
- Background in legal, financial, or enterprise SaaS domains
Technical Environment
You will work with:
- Languages: Python, SQL, Spark SQL
- Compute: AWS Glue 5.0, Lambda, Step Functions
- Storage: S3, DynamoDB, Redshift Serverless
- Formats: Apache Iceberg, Parquet, JSON
- Orchestration: AWS Step Functions, EventBridge
- CI/CD: GitHub Actions, multi-environment deployments
- Observability: CloudWatch, OpenTelemetry, custom metrics pipelines