Join Vultr
Vultr is seeking a highly skilled and experienced Staff AI/ML Infrastructure Engineer to drive the design, performance, and reliability of our AI infrastructure platform. The ideal candidate is a hands-on infrastructure expert with deep GPU systems knowledge, strong automation experience, and a track record of technical leadership in high-performance environments. This is a highly visible role in a high-growth technology company, requiring ownership of complex hardware and software systems, collaboration across engineering and vendor partners, and a relentless focus on operational excellence. This is your opportunity to build the foundation powering next-generation AI workloads and leave a lasting mark on Vultr and the future of cloud infrastructure.
Key Responsibilities
- Design and maintain GPU and bare metal infrastructure in containerized and physical environments
- Build scalable GPU clusters in partnership with networking and provisioning teams
- Ensure reliable, high-performance provisioning of GPU infrastructure
- Develop automated testing systems for GPU-based platforms
- Implement infrastructure solutions for diverse AI/ML workloads
- Benchmark, test, and troubleshoot GPU performance at scale
- Collaborate with hardware vendors on drivers, firmware, and support
- Resolve hardware, software, and performance issues across environments
- Optimize rail and cluster performance across architectures
- Lead technical direction and mentor engineers on infrastructure best practices
Qualifications
- 5+ years of experience working with bare metal infrastructure and hardware automation
- Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
- Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
- Strong Linux systems experience including device drivers and package management
- Experience building infrastructure automation using Python and Bash
- Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
- Experience designing and delivering complex infrastructure products
- Proven ability to lead projects and mentor engineers
- Experience optimizing multi-cluster GPU environments
- Exposure to Machine Learning software stacks and GPU workloads
Compensation
$145,000 - $160,000
This salary can vary based on location, years of experience, background, and skill set.