Infrastructure Engineer

Overland AI

$130K - $225K
Information Technology
5 - 7 years of experience

Full Job Description
Role Summary:

Overland AI is looking for an experienced Infrastructure Engineer to help design, build, and operate the systems that power our AI model training, experiment management, and robotic deployments. This role spans on-premise environments, cloud infrastructure, networking, and automation. You'll work hands-on with servers, storage, firewalls, wireless equipment, and high-performance compute resources, while also developing scalable tooling that improves reliability, observability, and developer velocity.

The ideal candidate has 5+ years of experience in infrastructure engineering, DevOps, SRE, or systems engineering, with deep knowledge of on-prem environments, AWS deployments at scale, and modern infrastructure-as-code and automation practices.

What You'll Do:

  • Build, operate, and evolve on-premise and cloud infrastructure supporting AI/ML development and robotics programs
  • Develop CI/CD pipelines using GitLab or GitHub Actions
  • Deploy and manage AWS environments including IAM, EC2, VPCs, and S3
  • Implement and maintain infrastructure-as-code (Terraform, Ansible, Puppet, Chef, etc.)
  • Install, configure, and troubleshoot physical servers, networking equipment, and storage systems
  • Support Kubernetes clusters (clusteradm, Kops, EKS) and GitOps workflows (ArgoCD, Flux, Spinnaker)
  • Build custom automation and internal infrastructure tooling
  • Manage observability stacks (Prometheus/Grafana, ELK, Datadog, etc.)
  • Partner closely with engineering teams to ensure reliability, security, and efficient scaling
  • Document systems, processes, and runbooks to support local and remote teams

Minimum Qualifications

  • 5+ years in DevOps, SRE, infrastructure engineering, or systems engineering
  • Experience with AWS orchestration and deployments at scale
  • CI/CD experience with GitLab, GitHub Actions, or similar platforms
  • Proficiency with infrastructure-as-code tooling (Terraform, Ansible, Puppet, Chef, etc.)
  • Experience with Kubernetes and GitOps patterns
  • Experience with observability and monitoring stacks
  • Experience with on-prem hardware environments (VMware, Proxmox, or equivalent)
  • Hands-on experience building and troubleshooting physical servers and networks
  • Strong Linux administration skills
  • Deep understanding of networking: firewalls, L3 switching, routing, VPNs, WAN/wireless systems
  • Ability to program in Python, Go, Rust, or a similar language (in addition to shell)
  • Excellent documentation, communication, and collaboration skills

Desired Experience and Qualifications

  • Familiarity with experiment tracking, ML infrastructure, or data visualization tooling
  • Experience integrating hardware or embedded systems
  • Experience deploying or supporting wireless/WAN infrastructure in field, test, or event environments
  • Familiarity with ML/AI infrastructure, high-performance compute clusters, or robotics-focused environments

Other Requirements

  • Ability to travel in-state, including occasional long days during deployments or testing
  • Ability to travel out-of-state for ~1-2 weeks per year
  • Ability to work onsite in our Seattle office at least 3 days per week
  • Ability to participate in 24x7 on-call rotation
  • Ability to obtain and maintain a DoD Security Clearance

Benefits

  • Competitive salary: $130K - $225K annually
  • Equity compensation
  • Best-in-class healthcare, dental, and vision plans
  • Unlimited PTO
  • 401(k) with company match
  • Parental leave

