Project Overview: We're building a large-scale document intelligence platform that processes text files up to 5 TB in size, extracts insights using BERT-class NLP models, and surfaces answers to analysts through a low-latency query interface. The platform runs on Azure Kubernetes Service (AKS) with dedicated GPU node pools, uses KEDA for event-driven autoscaling, and integrates with Azure Data Lake Storage Gen2 and Azure OpenAI.
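A single 5 TB input can be neither read into memory at once nor fed to a BERT-class model in one pass. BERT-base, for example, accepts at most 512 tokens per sequence. One common approach is to stream the file in fixed-size reads and cut it into overlapping windows. The code below is a minimal sketch of that idea under stated assumptions. Whitespace splitting stands in for the model's real subword tokenizer. The function names, the 400-word default window (chosen to leave headroom, since subword tokenizers typically emit more tokens than words), the 300-word stride, and the 1 MiB read size are all hypothetical, not part of the platform's actual design.

```python
import io
from typing import Iterable, Iterator, List, TextIO


def iter_words(stream: TextIO, read_size: int = 1 << 20) -> Iterator[str]:
    """Yield whitespace-delimited words from a text stream, reading read_size chars at a time.

    Hypothetical helper: whitespace splitting is a stand-in for a real subword tokenizer.
    """
    carry = ""  # partial word cut off at the end of the previous read
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        chunk = carry + chunk
        parts = chunk.split()
        # If the chunk does not end in whitespace, its last word may continue in the next read.
        carry = "" if chunk[-1].isspace() else parts.pop()
        yield from parts
    if carry:
        yield carry


def sliding_windows(words: Iterable[str], window: int = 400, stride: int = 300) -> Iterator[List[str]]:
    """Yield overlapping windows of `window` words, advancing by `stride` words each time."""
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    buf: List[str] = []
    pending = 0  # words in buf not yet included in any emitted window
    for word in words:
        buf.append(word)
        pending += 1
        if len(buf) == window:
            yield list(buf)
            del buf[:stride]  # keep (window - stride) words of overlap
            pending = 0
    if pending:
        yield buf  # final, shorter window holding the remaining words


if __name__ == "__main__":
    # Usage: a tiny in-memory stream with small reads to exercise chunk boundaries.
    text = io.StringIO("a b c d e f g h i j k")
    for w in sliding_windows(iter_words(text, read_size=3), window=4, stride=3):
        print(w)
```

Both functions are generators, so memory use is bounded by the read size and one window rather than by the file size. The overlap means an entity or sentence cut at one window boundary still appears whole in the neighboring window.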
This is a hands-on role at the intersection of platform engineering and applied ML. It requires someone who is equally comfortable debugging a CUDA out-of-memory error and designing a Kubernetes autoscaling policy. As the Senior ML Infrastructure Engineer, you will own the end-to-end infrastructure layer, from GPU cluster configuration and CUDA runtime management to Kubernetes job orchestration and model serving.
Skill / Technology (Level): focus areas
- Kubernetes / AKS (Expert): multi-node-pool design, taints/tolerations, autoscaler
- GPU node pools, NC/ND series (Senior): GPU device plugin, driver compatibility, resource limits
- KEDA (Senior): ScaledJob, queue triggers, cooldown tuning
- CUDA / cuDNN (Mid-Senior): runtime configuration via PyTorch; raw kernel development not required
- PyTorch, GPU inference (Senior): batching, FP16, memory management, profiling
- Hugging Face Transformers (Senior): BERT/DistilBERT/BGE model loading, pipeline API, tokenization
- Python, production (Senior): async workers, Azure SDK, queue consumers
- Azure infrastructure (Senior): VNet, private endpoints, Key Vault, ADLS, Azure AD (Microsoft Entra ID)
- Docker / Helm (Senior): multi-stage builds, Helm chart authoring
- IaC, Terraform / Bicep (Preferred): willingness to learn is acceptable
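To illustrate how queue depth turns into GPU jobs with KEDA, the documentation for the ScaledJob `default` scaling strategy describes the arithmetic roughly as follows. KEDA first computes a maximum scale of min(maxReplicaCount, ceil(queueLength / target)) and then creates that many jobs minus the ones already running. The function below is a simplified sketch of only that arithmetic, not KEDA code. The function and parameter names are hypothetical, the `custom` and `accurate` strategies are ignored, and the exact behavior should be checked against the KEDA version actually deployed.

```python
import math


def jobs_to_create(queue_length: int, target_per_job: int, running_jobs: int, max_replica_count: int) -> int:
    """Simplified model of KEDA's `default` ScaledJob scaling arithmetic (an assumption, not KEDA code).

    queue_length: messages currently waiting in the triggering queue
    target_per_job: messages one job is expected to handle (the trigger's target value)
    running_jobs: jobs already running for this ScaledJob
    max_replica_count: upper bound, mirroring the ScaledJob maxReplicaCount field
    """
    if target_per_job <= 0:
        raise ValueError("target_per_job must be positive")
    max_scale = min(max_replica_count, math.ceil(queue_length / target_per_job))
    # Never request a negative number of new jobs.
    return max(0, max_scale - running_jobs)


if __name__ == "__main__":
    # 25 queued documents, 10 per job, nothing running, cap of 5: ceil(2.5) = 3 new jobs.
    print(jobs_to_create(queue_length=25, target_per_job=10, running_jobs=0, max_replica_count=5))
```

On a GPU node pool, one reasonable choice is to keep the replica cap at or below the number of GPUs the pool can provide. That way new jobs are not left waiting for capacity that cannot arrive.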