About the Role
You'll be GC AI's first dedicated data hire, which means you'll own the entire data stack from day one. Right now, our data lives in multiple systems: product usage, CRM, billing, customer data, user analytics. Your first job is to consolidate all of it into a single, well-modeled warehouse in BigQuery so the company can actually use it.
From there, you'll build the infrastructure that makes data accessible to everyone. Think less "build dashboards for the exec team" and more "build the internal data agent that lets anyone in the company ask a question and get an answer." If you've seen what Vercel did with d0 (their text-to-SQL agent on top of their warehouse), that's the direction. We want someone who reaches for code, GCP primitives, and simple maintainable systems before defaulting to expensive enterprise platforms.
Longer term, you'll build toward a data lake that supports personalization and fine-tuning for the GC AI platform. You'll work closely with Product and Engineering to make sure the data infrastructure serves both internal operations and the product itself.
This is a founding role. You'll set the patterns, choose the tools, and eventually build the team around you.
What You'll Do
- Take ownership of the data warehouse in BigQuery: modeling, pipeline development, data quality, and performance.
- Build pipelines that consolidate product usage data, CRM data, billing, customer contract data, and user analytics into a single source of truth.
- Design and build internal data tools using applied AI, including natural-language query interfaces and automated reporting, so the rest of the company can self-serve without waiting on an analyst.
- Set up the warehouse so business teams can run their own queries and pull their own numbers without filing a ticket.
- Build toward a data lake architecture that supports personalization and model fine-tuning for the GC AI product.
- Keep the stack lean. Use what's available in BigQuery and the broader GCP ecosystem, and make decisions that reduce complexity and cost instead of adding tool sprawl.
- Define data engineering practices, tooling, and standards as the first hire on what will become a team.
What We Value
- Builder instinct. Your default response to a data problem is to write code and build infrastructure, not to evaluate vendors and sign contracts.
- Applied AI fluency. You're comfortable using LLMs and AI tooling to solve data problems: building agents that query warehouses, automating data quality checks, generating reports. You don't need to train models, but you should know how to put them to work.
- Simplicity bias. You'd rather build a clean, maintainable pipeline with standard tools than an over-engineered stack that requires a team of five to operate.
- Ownership. You treat the data warehouse like a product: it should be reliable, well-documented, and useful to the people who depend on it.
What We Require
- 5+ years of experience in data engineering, with hands-on experience building and maintaining data warehouses and pipelines.
- Strong SQL skills and deep experience with BigQuery or comparable analytical databases.
- Proficiency in Python for pipeline development, scripting, and tooling.
- Experience building ETL/ELT pipelines that consolidate data from multiple source systems (SaaS APIs, event streams, databases).
- Experience working within GCP or a comparable cloud ecosystem.
- Ability to design data models that are clean, performant, and usable by non-engineers.
Nice to Have
- Experience building internal data tools or agents using LLMs (text-to-SQL, natural language interfaces, automated reporting). This is a strong differentiator.
- Experience as the first or early data hire at a startup, where you owned the full stack.
- Familiarity with legaltech, legal operations, or SaaS product analytics.
- Experience setting up self-serve analytics layers (semantic layers, BI tool configuration, data documentation).
- Experience with data infrastructure that supports ML workflows (feature stores, training data pipelines, data lakes).
- Experience with infrastructure as code, especially Terraform, for managing GCP data infrastructure.
Location Policy
This is a remote role unless you live within approximately 50 miles of our San Mateo, CA or Provo, UT office. In that case, the position follows a hybrid schedule with in-office days on Tuesdays, Wednesdays, and Thursdays.