Datacenter Field Engineer

Sciforium

$100K — $140K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 3+ years in Linux Systems Administration, with expertise in boot processes and disk management.
  • Strong hardware troubleshooting skills in high-density server environments.
  • Experience with network security, including VPNs and firewalls.
  • Familiarity with directory services like LDAP and Active Directory.
  • Proficiency in Bash scripting for automation.

Responsibilities

  • Act as the main contact for physical system outages and network interruptions to minimize downtime.
  • Monitor hardware health proactively, catching anomalies before they affect workloads.
  • Coordinate with data center staff and hardware vendors for repairs and maintenance.
  • Oversee the physical installation and integration of new GPU nodes into the cluster.
  • Install, patch, and maintain Linux operating systems on bare-metal servers.
  • Manage security configurations for networking and access controls.
  • Administer LDAP/Active Directory for user authentication and ensure reliable storage management.

Benefits

  • Medical, dental, and vision insurance
  • 401k plan
  • Daily lunch, snacks, and beverages
  • Flexible time off
  • Competitive salary and equity
Full Job Description
Role Overview

We are looking for a dedicated Hardware Operations & Systems Engineer to own the physical health and foundational infrastructure of our GPU clusters. You will be the primary custodian of our compute hardware, responsible for everything from data center vendor coordination up to the base Linux OS layer. You will ensure our research and product teams have a stable, secure, and fully operational physical environment for their demanding compute workloads.

Key Responsibilities
  • System Health & Hardware Reliability
    • On-Call Response: Serve as the primary point of contact for physical system outages, hardware failures, and network interruptions to minimize downtime.
    • Cluster Monitoring: Proactively monitor hardware health, including GPU thermals, power draw, and physical system loads, catching anomalies before they impact active workloads.
    • Vendor Liaison: Work closely with data center facility staff and third-party hardware vendors to coordinate RMA processes, physical repairs, part replacements, and routine maintenance.
    • Hardware Deployment: Rack, cable, and lead the physical bring-up of new GPU nodes, ensuring power and network connectivity are fully integrated into the existing cluster.
  • Linux & Network Administration
    • OS Management: Install, patch, and maintain Linux operating systems (Ubuntu/CentOS/RHEL) across the cluster's bare-metal servers.
    • Security & Access: Configure and maintain edge and internal networking, including firewalls, VPNs, and strict SSH access controls to secure our infrastructure.
    • Identity & Storage Management: Administer LDAP/Active Directory for centralized user authentication and ensure network storage systems (NFS/GPFS/Lustre) are reliably mounted and properly permissioned.

Qualifications
  • Must-Haves:
    • 3+ years of experience in Linux Systems Administration (deep knowledge of boot processes, systemd, disk management, etc.).
    • Strong background in server hardware troubleshooting, specifically within high-density environments (power, cooling, PCIe topologies).
    • Experience managing network security (VPNs, iptables/firewalld, VLANs) and directory services (LDAP/FreeIPA/Active Directory).
    • Proficiency in Bash scripting for essential system automation.
  • Nice-to-Haves:
    • Experience using configuration management or infrastructure-as-code tools like Ansible, SaltStack, or Terraform for OS provisioning.
    • Familiarity with data center operations, including cooling requirements for high-TDP accelerators (such as NVIDIA H100 or AMD MI300).


