Responsibilities
Provision and optimize GPU compute across diverse cloud platforms and specialized providers
Design and maintain IaC foundations for advanced AI systems
Implement policy-as-code guardrails for autonomous agent operations
Design and enforce zero-trust architectures and manage IAM/RBAC
Configure and manage Cloudflare (or equivalent) for DDoS protection, WAF, and Zero Trust access
Manage DNS security and API security controls
Lead vulnerability management and coordinate penetration testing
Build and maintain CI/CD pipelines with integrated security scanning
Deploy and manage secure, GPU-enabled Kubernetes clusters
Implement observability and SIEM integrations
Lead incident response and ensure compliance through audit automation
Benefits
Opportunity to work with cutting-edge AI technology
Exposure to a wide variety of cloud and specialized environments
Strong focus on security practices and compliance
Collaboration with customer success teams for hands-on impact
Continuous learning through hands-on work with advanced tooling and frameworks
Full Job Description
What you'll do
Provision and optimize GPU compute across AWS, Azure, GCP, and specialized providers (CoreWeave, Lambda Labs), including Kubernetes GPU orchestration and hardware evaluation (NVIDIA H100/B200, AMD MI300X, Intel Gaudi)
Design and maintain IaC foundations (Terraform, Pulumi, Helm) for agentic AI systems, including agent orchestration platforms, RAG stacks, vector databases, and model serving endpoints
Implement policy-as-code guardrails (OPA, Sentinel, Kyverno) for autonomous agent workloads
Design and enforce zero-trust architectures with network segmentation, least-privilege IAM/RBAC, and secrets management (Vault, AWS Secrets Manager)
Configure and manage Cloudflare (or equivalent) for DDoS protection, WAF, bot management, SSL/TLS termination, and Zero Trust access
Manage DNS and email-authentication security (DNSSEC, SPF, DKIM, DMARC), certificate lifecycle, and API security controls (mTLS, token management)
Lead vulnerability management, penetration testing coordination, and CIS benchmarking
Partner with customer success teams to assess, secure, and threat-model customer deployment environments
Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI) with integrated security scanning (SAST, DAST, SCA, container scanning)
Deploy and manage Kubernetes clusters across cloud and on-prem with security-hardened, GPU-enabled configurations
Implement observability (Prometheus, Grafana, Splunk, Datadog) and SIEM integrations
Lead incident response and drive compliance (SOC 2, ISO 27001, HIPAA, FedRAMP) through audit automation
Qualifications
Proven expertise with Terraform/Pulumi, IaC, policy-as-code, and scripting (Python, Bash, PowerShell)
Hands-on GPU compute provisioning across major cloud and specialized providers
Experience with Cloudflare or equivalent CDN/WAF/DDoS platforms for perimeter security and Zero Trust
Strong background in AWS, Azure, GCP, and on-prem infrastructure, with a focus on secure architecture
Proficiency in Kubernetes and Docker, including container security, GPU scheduling, and runtime protection
Deep understanding of network security, zero-trust principles, IAM/RBAC, and secrets management
CI/CD experience with integrated security scanning
Ability to conduct security assessments and threat modeling, and to work directly with customers