Engineer, Production Engineering

Guild.ai, Inc

$120K — $150K *
Technical Services
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role
  • Strong hands-on experience with Kubernetes and GCP in production
  • Proficient in Terraform for infrastructure management
  • Strong programming skills in Python, Go, TypeScript, etc.
  • Experience with compliance frameworks like SOC2 and secure system design

Responsibilities

  • Manage and evolve production and staging infrastructure on GCP using Terraform
  • Deploy and operate within customer VPCs across AWS, Azure, and GCP
  • Build and maintain Kubernetes-based sandboxing for agent execution
  • Own observability stack with OpenTelemetry and integrations like New Relic and Splunk
  • Lead work for SOC2 compliance, including audits and control implementations
  • Manage HackerOne engagement for penetration testing and bug bounty
  • Design and maintain automated CI/CD workflows for deployment

Benefits

  • Hybrid/Onsite work model in the San Francisco Bay Area
  • Opportunity to work in an early-stage startup environment
  • Direct contribution to product development
  • High autonomy in decision-making and tool selection
  • Engagement with cutting-edge infrastructure for AI agents

Full Job Description
Engineer - Production Engineering

Location: San Francisco Bay Area (Hybrid/Onsite)
Type: Full-time
Stage: Early-stage startup

About the Role

We are building the control plane for AI agents in teams and companies.

As a Production Engineer, you will own the infrastructure, security, and compliance systems that allow our platform to ship fast and run reliably at scale. This is not a traditional ops role - you will write real code, contribute directly to the product, and own the full security and compliance surface of an early-stage company.

You'll work across Kubernetes infrastructure, cloud delivery, agent sandboxing, SOC2 compliance, IT systems, and production observability - and you'll contribute to the product itself, building security-sensitive features and auditing application code for vulnerabilities.

If you want to own the production backbone for the agent-native era - from a Terraform module to a pentest to an API key implementation - we want to talk.

What You'll Own

1. Cloud & Kubernetes Infrastructure
  • Our Stack: Manage and evolve our production and staging infrastructure on GCP (GKE) using Terraform. Own DNS, networking, and environment configuration end-to-end.
  • Customer Environments: Deploy and operate within customer VPCs across AWS, Azure, and GCP - adapting to varied infrastructure constraints, security requirements, and enterprise networking configurations.
  • Agent Sandboxing: Build and maintain Kubernetes-based sandboxing for agent execution - ensuring agents operate within strict network boundaries and route through our API gateway rather than having unrestricted internet access.
  • Observability: Own our observability stack, including OpenTelemetry instrumentation and integrations with New Relic and Splunk, to give the team deep visibility into system performance and agent runtime behavior.

2. Security, Compliance & IT
  • SOC2 & Audits: Lead infrastructure and operational work to support SOC2 compliance, including audit preparation, evidence collection, and control implementation.
  • Penetration Testing & Bug Bounty: Manage our HackerOne engagement - coordinating pentests, triaging incoming bug bounty reports, and driving remediation.
  • Product Security: Audit application code for security vulnerabilities, contribute security-sensitive product features (e.g., API key management), and ensure product and infrastructure security are coherent end-to-end.
  • IT & Identity: Own our IT stack - Okta, device management, and access controls - keeping the company secure as we scale.

3. CI/CD & Progressive Delivery
  • Deployment Pipelines: Design and maintain safe, automated CI/CD workflows supporting rollout strategies like canary and blue-green deployments.
  • Release Velocity: Make shipping to production a routine, boring, highly automated non-event.

What We're Looking For

Strong Fit
  • Experience: 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role, ideally at a fast-growing startup or SaaS company.
  • Our Stack: Strong hands-on experience with Kubernetes and GCP in production; comfortable with Terraform for managing real infrastructure.
  • Code over Click: Strong programming skills (Python, Go, TypeScript, etc.) with a passion for automating away toil.
  • Security Depth: Hands-on experience with compliance frameworks (SOC2), vulnerability management, and secure system design.

Bonus Points
  • Background with multi-tenant SaaS or enterprise security and procurement requirements.
  • Exposure to AI/ML infrastructure, particularly agent runtimes.
  • Experience building security-sensitive product features alongside infrastructure work.
  • Experience supporting pentests and bug bounties.
  • Experience deploying and operating in customer VPCs or other external cloud environments across AWS, Azure, and/or GCP - navigating enterprise networking, security, and access constraints.

Why This Role is Unique
  • Broad Ownership: You'll own the full security and compliance surface of an early-stage company - from SOC2 to sandboxed agent execution to IT - while also contributing directly to the product.
  • Agent Infrastructure: You'll design infrastructure for autonomous AI agents, not just traditional web services - introducing unique sandboxing, observability, and security challenges.
  • Our Infra and Theirs: You'll operate across both our own production environment and customer cloud environments, requiring you to be fluent across AWS, Azure, and GCP.
  • High Autonomy: As an early hire, you'll have a seat at the table to choose the tools and define the architecture that carries us to scale.

Who Thrives Here
  • Engineers who are as comfortable reading application code for vulnerabilities as they are writing a Terraform module.
  • People who enjoy owning the full security and compliance surface, not just one layer of it.
  • Builders who can navigate the constraints of customer enterprise environments without losing velocity.
  • Those who are energized - not overwhelmed - by the breadth of an early-stage technical operations role.
