Req ID: 372653
We are currently seeking a Platform Engineer (AI/LLM Infrastructure) to join our team in Santa Clara, California, United States.
Responsibilities:
- Lead the design, implementation, and operation of scalable infrastructure platforms supporting AI/LLM-based solutions for enterprise clients.
- Act as a hands-on technical lead (player-coach), contributing to development while guiding a team of engineers.
- Own end-to-end infrastructure architecture below the application layer, including compute, container orchestration, CI/CD, observability, and security.
- Partner directly with clients and stakeholders to design, present, and deliver robust AI infrastructure solutions.
- Architect and manage production-grade Kubernetes environments (AKS/EKS), including cluster operations and RBAC.
- Design and operationalize RAG pipelines, including ingestion, chunking, embedding workflows, and vector database management.
- Lead GPU infrastructure provisioning and optimization (NVIDIA A100/H100 or similar).
- Drive Infrastructure-as-Code adoption using Terraform and GitOps practices (ArgoCD/Flux).
- Build and maintain CI/CD pipelines using GitHub Actions and Azure DevOps.
- Establish observability standards using Datadog, OpenTelemetry, and ELK/OpenSearch.
- Lead incident response, on-call processes, and post-mortem analysis.
- Ensure strong security posture and lead InfoSec review processes.
- Coordinate delivery across multiple teams and client engagements.
Qualifications:
- 5+ years of experience in Platform Engineering, SRE, or Infrastructure Engineering.
- 3+ years of experience delivering and leading infrastructure for AI/LLM-based production systems.
- 3+ years of experience with Terraform and GitOps (ArgoCD/Flux).
- 3+ years of experience with Azure (Key Vault, Monitor, DevOps Pipelines).
- 3+ years of experience with CI/CD and container registry management.
The salary range for this position is 130-170K (USD) annually, depending on skills match and suitability.