100% remote
Description:
Principal Data Architect / Engineer (Cloud-Agnostic & Governance Lead)
The Role
We are seeking a Principal Data Architect / Engineer to serve as the highest-level technical authority and strategic influencer within our Enterprise Data Infrastructure team. This is a high-visibility, high-impact role designed to sit above our current senior engineering tier. You will act as the critical bridge between long-term business strategy and tactical technical execution, directly influencing how our enterprise handles petabyte-scale data processing for 2026 and beyond.
In this role, you will lead the architectural evolution of our data ecosystem. Our immediate, interim roadmap involves migrating from a legacy, schedule-driven Redshift environment to a modern, decoupled Lakehouse architecture. A primary focus of this role, however, will be ensuring that both our current platform and our forward-looking plans are well executed and ready for future AI and LLM use cases. You will design vendor-agnostic frameworks (using open table formats such as Apache Iceberg) that keep data portable, meet evolving business requirements, and minimize vendor lock-in.
Beyond technical vision, a major component of this role is governance and mentorship. You will establish automated guardrails and DataOps pipelines to scale engineering quality across a mix of onshore architects and engineers and offshore developers and operations support personnel.
Key Responsibilities
1. Strategic Architecture & Future-Proofing
Cloud-Agnostic Vision: Design and execute a 3-to-5-year enterprise data warehouse modernization roadmap that prioritizes extreme data portability, decoupling storage from compute using open table formats (e.g., Apache Iceberg).
Multi-Cloud Readiness: Evaluate alternative ecosystems and ensure current AWS implementations are built with vendor-neutral patterns, avoiding proprietary lock-in.
AI & Semantic Preparation: Establish strict semantic conventions, data hygiene standards, and metadata graphs today to ensure the underlying data warehouse is optimized for future conversational AI agents and modern BI tools.
2. Mentorship & Leadership
Talent Elevation: Act as a dedicated mentor to the 3 intermediate and 3 senior onshore engineers, pushing them to think in terms of holistic system design and enterprise scalability.
Soft Skills Coaching: Help senior staff develop critical soft skills, including "influence without authority," cross-functional stakeholder communication, and effective offshore vendor governance.
3. Engineering Governance & Quality Automation
Scale Through Guardrails: Stop technical debt at the source by designing reusable framework templates and abstracted wrappers that enforce coding best practices across the offshore development team.
Automated DataOps Gates: Implement automated CI/CD quality gates (e.g., automated SQL linting, schema drift detection, and data validation frameworks) to catch low-quality code before it reaches human review.
Operational Excellence: Redefine the onshore team's workflows, shifting senior engineers from manual code-reviewers to platform product owners.
Code Modernization: Help set policies and training that move the team from working exclusively in SQL to also writing Spark jobs. Help define the handoff mechanisms and frameworks for evaluating which SQL-developed solutions should become Spark jobs.
4. Ingestion & Pipeline Optimization
High-Frequency Ingestion: Own the architectural pattern for high-frequency source replication, ensuring optimal performance without violating API limits or driving unnecessary cloud compute costs.
Modern Orchestration & Storage: Guide the transition from rigid, scheduled jobs to event-driven processing using AWS Glue, EMR Serverless, and optimized presentation layers in Redshift.
Required Qualifications
8+ years of deep experience in enterprise data engineering, data architecture, or data platform roles, with at least 2 years operating at a Staff, Principal, or Lead Architect level within a Fortune 500-scale environment.
Expertise in Cloud-Agnostic Design: Proven track record of building data lakehouses centered around open table formats (e.g., Apache Iceberg, Delta Lake) to ensure cross-cloud compatibility.
Advanced AWS Data Ecosystem Experience: Hands-on architectural experience with AWS Glue, EMR Serverless, S3 data lakes, and Amazon Redshift.
Governance at Scale: Demonstrated success implementing automated testing, CI/CD pipelines, and DataOps frameworks (e.g., dbt, Great Expectations, SQLFluff) to govern large delivery teams.
Complex Ingestion Mastery: Experience managing high-frequency CDC data ingestion from massive enterprise ERP systems (specifically Oracle Fusion Cloud or equivalent) using modern tools like Fivetran.
Expert Coding Skills: Mastery of Python and complex SQL, with a deep understanding of query optimization, data modeling (Kimball, Data Vault 2.0), and technical debt remediation.
Infrastructure as Code (IaC): Familiarity with Terraform, CloudFormation, or containerization (Docker, Kubernetes) to standardize environment promotion.
Preferred Qualifications
Multi-Cloud Fluency: Experience architecting or migrating workloads within Google Cloud Platform (BigQuery) or Snowflake.
AI/Semantic Layer Enablement: Conceptual or practical experience designing data models to be natively consumed by semantic layers (Looker/LookML) and LLM-driven analytics agents.
Strong Communication: Exceptional executive presence; ability to justify technical trade-offs to senior business leaders and technical committees alike.
Success Indicators
Automated Code Quality: Drastic reduction in production data pipeline failures and technical debt through the implementation of automated offshore code guardrails.
Architectural Agility: Successful implementation of an AWS Lakehouse that allows data to be queried natively by outside platforms (like GCP or Snowflake) without requiring a data migration.
Onshore Team Velocity: Transition of the onshore senior engineering team from tactical "firefighting" to strategic platform enablement.
AI Readiness Score: Delivery of clean, well-cataloged semantic data models ready for conversational AI deployment.
Notes:
Hybrid Optional Preferred