Principal Platform Engineer - Kubernetes & Cloud Infrastructure

Ombud

$130K - $160K
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years in platform, infrastructure, SRE, or DevOps roles.
  • 3+ years operating production Kubernetes at scale.
  • Extensive AWS expertise across compute, storage, networking, data services, and IAM.
  • Proficient in Terraform, Docker, and CI/CD systems.
  • Demonstrated impact through architectural decisions with measurable outcomes.
  • Strong written communication skills for technical documentation such as runbooks, architecture decision records, and post-incident reviews.
  • Willing to work in the Denver office three days a week.

Responsibilities

  • Own production Kubernetes (EKS) clusters, focusing on capacity planning and workload isolation.
  • Manage AWS infrastructure end-to-end, including multi-region deployments for EU data residency.
  • Implement Infrastructure-as-Code using Terraform and conduct peer reviews.
  • Develop CI/CD pipelines for fast, reliable, and secure builds.
  • Improve observability with Grafana dashboards, Prometheus metrics, log pipelines, on-call alerting, and SLO definitions.
  • Drive cost optimization initiatives, with a target of reducing AWS spend by 20%.
  • Lead architectural decisions for a self-service infrastructure that can scale to 10x the current tenant count.

Benefits

  • Hybrid work arrangement in downtown Denver.
  • Opportunity to influence architectural direction as a senior individual contributor.
  • Exposure to cutting-edge generative AI workloads.
  • Engage in a dynamic startup environment with entrepreneurial spirit.
  • Professional growth through direct interaction with executive leadership.
Full Job Description
  • Location: Denver, CO (hybrid - Tue/Wed/Thu in office)
  • Reports to: CEO
The role

Ombud's platform runs production AI workloads for enterprise customers, and we're scaling toward a self-service motion where customers onboard, ingest content, and operate the product without manual implementation. That requires an infrastructure foundation that can handle multi-tenant scale, high reliability, and the unique demands of generative AI workloads - without ballooning the AWS bill.

We're hiring a Principal Platform Engineer to own that foundation. This is a senior individual contributor role with broad architectural authority. You will not have direct reports. You will set the technical direction for our cloud infrastructure, partner with engineering on production scaling decisions, and operate the platform with the discipline a SOC 2 / ISO 27001 customer base requires.
What you'll own
  • Production Kubernetes (EKS) clusters: capacity planning, node group strategy, gen-AI workload isolation, blast-radius containment.
  • AWS infrastructure end-to-end: RDS, DMS, Kafka (MSK), ECR, networking, IAM, multi-region deployments (including Ireland for EU data residency).
  • Infrastructure-as-code in Terraform - modules, environments, drift management, peer review.
  • CI/CD pipelines (Jenkins, GitHub Actions, or your recommended replacement) - fast, reliable, secure builds for backend and frontend services.
  • Observability: Grafana dashboards, Prometheus metrics, log pipelines, on-call alerting, SLO definition.
  • Cost optimization: AWS spend is one of our top three variable costs. Reducing it by 20% is a tangible objective for this seat.
  • Security posture: secrets management (Consul/Vault), IAM hygiene, vulnerability patching, support for SOC 2 and ISO 27001 audit cycles.
  • Architecture leadership on the self-service infrastructure roadmap: how we onboard a customer without human intervention and scale to 10x our current tenant count.
  • Documentation and runbooks that let the rest of the engineering team operate the platform when you're unavailable.
Must-haves
  • 8+ years of platform, infrastructure, SRE, or DevOps experience, including at least 3 years operating production Kubernetes at scale.
  • Deep AWS expertise across compute, storage, networking, data services, and IAM.
  • Production fluency with Terraform, Docker, Linux, and CI/CD systems.
  • Track record of architectural decisions that materially improved reliability, cost, or developer velocity - with specific, measurable outcomes you can point to.
  • Comfort operating as a senior IC who sets technical direction across teams without formal authority.
  • Strong written communication - runbooks, architecture decision records, post-incident reviews.
  • Willingness to be in-office Tuesday through Thursday in Denver.
Nice-to-haves
  • Production experience supporting generative AI or ML workloads (GPU node groups, vector databases, model serving).
  • Experience with Qdrant, Pinecone, Weaviate, or other vector stores in production.
  • PostgreSQL operational depth - replication, performance tuning, backup/restore.
  • Experience scaling a multi-tenant SaaS platform from ~100 customers to ~1,000.
  • SOC 2 Type II and ISO 27001 audit experience.
  • Familiarity with event-driven architectures (Kafka, Kinesis, or equivalent).
What success looks like
First 30 days
  • Complete a written audit of our current infrastructure: what we have, where the risks are, what's costing us money.
  • Join the on-call rotation and respond to your first production incident.
  • Identify the top three architectural debt items.
First 60 days
  • Deliver your first architectural recommendation with an implementation plan, typically addressing cost optimization or a scaling bottleneck.
  • Refresh and own the observability stack.
  • Document the production runbook for the rest of the engineering team.
First 90 days
  • Ship a measurable improvement: cost reduction, reliability uplift, deployment velocity, or scale headroom.
  • Deliver the multi-tenant scale roadmap for the self-service motion.
  • Establish a quarterly architecture review cadence with the engineering team.