Staff AI/ML Infrastructure Engineer

Vultr

$145K - $160K *

US-Anywhere (Remote in United States)

Information Technology

5 - 7 years of experience

2 weeks ago


Job Overview by Ladders

Qualifications

5+ years with bare metal infrastructure and hardware automation
Hands-on experience with modern NVIDIA/AMD GPU platforms
Deep knowledge of BIOS, BMC, firmware, NICs, and PCIe systems
Strong Linux systems experience including device drivers
Experience building automation using Python and Bash
Familiarity with GPU drivers and vendor collaboration
Experience designing complex infrastructure products
Proven project leadership and mentoring capabilities
Experience optimizing multi-cluster GPU environments
Exposure to Machine Learning stacks and GPU workloads

Responsibilities

Design and maintain GPU and bare metal infrastructure
Build scalable GPU clusters with networking teams
Ensure reliable provisioning of GPU infrastructure
Develop automated testing systems for GPU platforms
Implement solutions for diverse AI/ML workloads
Benchmark and troubleshoot GPU performance
Collaborate with hardware vendors on drivers and support
Optimize performance across architectures
Lead technical direction and mentor engineers

Benefits

Opportunity to build next-gen AI infrastructure
Be part of a high-growth technology company
Hands-on technical leadership role
Collaboration with diverse teams and vendors
Focus on operational excellence and innovation

Full Job Description

Join Vultr

Vultr is seeking a highly skilled and experienced Staff AI/ML Infrastructure Engineer to drive the design, performance, and reliability of our AI infrastructure platform. The ideal candidate is a hands-on infrastructure expert with deep GPU systems knowledge, strong automation experience, and a track record of technical leadership in high-performance environments. This is a highly visible role in a high-growth technology company, requiring ownership of complex hardware and software systems, collaboration across engineering and vendor partners, and a relentless focus on operational excellence. This is your opportunity to build the foundation powering next-generation AI workloads and leave a lasting mark on Vultr and the future of cloud infrastructure.

Key Responsibilities

Design and maintain GPU and bare metal infrastructure in containerized and physical environments
Build scalable GPU clusters in partnership with networking and provisioning teams
Ensure reliable, high-performance provisioning of GPU infrastructure
Develop automated testing systems for GPU-based platforms
Implement infrastructure solutions for diverse AI/ML workloads
Benchmark, test, and troubleshoot GPU performance at scale
Collaborate with hardware vendors on drivers, firmware, and support
Resolve hardware, software, and performance issues across environments
Optimize rail and cluster performance across architectures
Lead technical direction and mentor engineers on infrastructure best practices

Qualifications

5+ years of experience working with bare metal infrastructure and hardware automation
Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
Strong Linux systems experience including device drivers and package management
Experience building infrastructure automation using Python and Bash
Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
Experience designing and delivering complex infrastructure products
Proven ability to lead projects and mentor engineers
Experience optimizing multi-cluster GPU environments
Exposure to Machine Learning software stacks and GPU workloads

Compensation

$145,000 - $160,000

This salary can vary based on location, years of experience, background and skill set.

* Ladders Estimates




