Principal Platform Engineer - Kubernetes & Cloud Infrastructure

Ombud

$130K - $160K *

Denver, CO 80219

In-Person

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

8+ years in platform, infrastructure, SRE, or DevOps roles.
3+ years operating production Kubernetes at scale.
Extensive AWS expertise across key services.
Proficient in Terraform, Docker, and CI/CD systems.
Demonstrated impact through architectural decisions with measurable outcomes.
Strong written communication skills for technical documentation.
Willing to work in the Denver office three days a week.

Responsibilities

Own production Kubernetes (EKS) clusters, focusing on capacity planning and workload isolation.
Manage AWS infrastructure end-to-end, including multi-region deployments and EU data residency.
Implement Infrastructure-as-Code using Terraform and conduct peer reviews.
Develop CI/CD pipelines for reliable and secure builds.
Enhance observability using Grafana and Prometheus metrics, with on-call alerting.
Drive cost optimization initiatives, with a target of reducing AWS spend by 20%.
Lead architectural decisions for a self-service infrastructure that scales efficiently.

Benefits

Hybrid work arrangement in downtown Denver.
Opportunity to influence architectural direction as a senior individual contributor.
Exposure to cutting-edge generative AI workloads.
Engage in a dynamic startup environment with entrepreneurial spirit.
Professional growth through direct interaction with executive leadership.

Full Job Description

Location: Denver, CO (hybrid - Tue/Wed/Thu in office)
Reports to: CEO

The role

Ombud's platform runs production AI workloads for enterprise customers, and we're scaling toward a self-service motion where customers onboard, ingest content, and operate the product without manual implementation. That requires an infrastructure foundation that can handle multi-tenant scale, high reliability, and the unique demands of generative AI workloads - without ballooning the AWS bill.

We're hiring a Principal Platform Engineer to own that foundation. This is a senior individual contributor role with broad architectural authority. You will not have direct reports. You will set the technical direction for our cloud infrastructure, partner with engineering on production scaling decisions, and operate the platform with the discipline a SOC 2 / ISO 27001 customer base requires.

What you'll own

Production Kubernetes (EKS) clusters: capacity planning, node group strategy, gen-AI workload isolation, blast-radius containment.
AWS infrastructure end-to-end: RDS, DMS, Kafka (MSK), ECR, networking, IAM, multi-region deployments (including Ireland for EU data residency).
Infrastructure-as-code in Terraform - modules, environments, drift management, peer review.
CI/CD pipelines (Jenkins, GitHub Actions, or your recommended replacement) - fast, reliable, secure builds for backend and frontend services.
Observability: Grafana dashboards, Prometheus metrics, log pipelines, on-call alerting, SLO definition.
Cost optimization. AWS spend is one of our top three variable costs. Reducing it by 20% is a tangible objective for this seat.
Security posture: secrets management (Consul/Vault), IAM hygiene, vulnerability patching, support for SOC 2 and ISO 27001 audit cycles.
Architecture leadership on the self-service infrastructure roadmap: how we onboard a customer without human intervention and scale to 10x our current tenant count.
Documentation and runbooks that let the rest of the engineering team operate the platform when you're unavailable.

Must-haves

8+ years of platform, infrastructure, SRE, or DevOps experience, including at least 3 years operating production Kubernetes at scale.
Deep AWS expertise across compute, storage, networking, data services, and IAM.
Production fluency with Terraform, Docker, Linux, and CI/CD systems.
Track record of architectural decisions that materially improved reliability, cost, or developer velocity - with specific, measurable outcomes you can point to.
Comfort operating as a senior IC who sets technical direction across teams without formal authority.
Strong written communication - runbooks, architecture decision records, post-incident reviews.
Willingness to be in-office Tuesday through Thursday in Denver.

Nice-to-haves

Production experience supporting generative AI or ML workloads (GPU node groups, vector databases, model serving).
Experience with Qdrant, Pinecone, Weaviate, or other vector stores in production.
PostgreSQL operational depth - replication, performance tuning, backup/restore.
Experience scaling a multi-tenant SaaS platform from ~100 customers to ~1,000.
SOC 2 Type II and ISO 27001 audit experience.
Familiarity with event-driven architectures (Kafka, Kinesis, or equivalent).

What success looks like

First 30 days

Complete a written audit of our current infrastructure: what we have, where the risks are, what's costing us money.
Join the on-call rotation and respond to your first production incident.
Identify the top three architectural debt items.

First 60 days

Deliver your first architectural recommendation with an implementation plan, typically addressing cost optimization or a scaling bottleneck.
Refresh and own the observability stack.
Document the production runbook for the rest of the engineering team.

First 90 days

Ship a measurable improvement: cost reduction, reliability uplift, deployment velocity, or scale headroom.
Deliver the multi-tenant scale roadmap for the self-service motion.
Establish quarterly architecture review cadence with the engineering team.

* Ladders Estimates
