Manager, Data Center Operations

Crusoe

$135K - $175K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of data center operations leadership experience in a production environment.
  • Experience managing and developing technical teams.
  • Hands-on experience troubleshooting enterprise server hardware, including GPU nodes.
  • Strong familiarity with SuperMicro hardware and RMA processes.
  • Working knowledge of data center electrical and mechanical systems.

Responsibilities

  • Own the daily operation and health of the OH5C data center.
  • Lead troubleshooting and repair of GPU compute hardware.
  • Drive rapid triage and repair to meet uptime targets.
  • Coordinate hardware support with OEM vendors, primarily SuperMicro.
  • Track and report site KPIs and improve reliability.

Benefits

  • Competitive compensation and equity
  • Paid time off, holidays, and leave programs
  • Medical, dental, and vision insurance
  • Employer HSA contributions
  • Paid parental leave
  • Professional development and tuition reimbursement
  • Mental health and wellness support
  • 401(k) with company match up to 4%
  • Daily meal allowance
  • Additional location-specific benefits

Full Job Description
About the Role

Crusoe is seeking a Manager of Data Center Operations to lead our OH5C site in Springfield, Ohio.

This is a hands-on leadership role overseeing the day-to-day health of a high-density, GPU-heavy compute environment. You will lead the on-site technician team, drive hardware reliability and break-fix performance, manage colocation relationships, and ensure the site meets fleet-wide operational standards.

The ideal candidate is a technically strong, highly accountable leader who can move comfortably between the data center floor and senior-level operational reviews.

What You'll Be Working On

Site Operations
  • Own the daily operation, health, and availability of the OH5C data center.
  • Lead troubleshooting and repair of GPU compute hardware, including GPU trays, DIMMs, drives, cabling, and server nodes.
  • Drive rapid triage and repair while maintaining MTTR and uptime targets.
  • Coordinate RMAs and hardware support with OEM vendors, primarily SuperMicro.
  • Maintain spare-parts inventory and ensure critical hardware is available when needed.
  • Partner with Fleet Operations, SRE, networking, and infrastructure teams on escalations.

Team Leadership
  • Lead, coach, and develop the on-site data center technician team.
  • Set clear expectations for safety, quality, responsiveness, and accountability.
  • Conduct regular one-on-ones, performance reviews, and development planning.
  • Support technician hiring, onboarding, training, and workforce planning.
  • Build a culture of technical precision, ownership, and continuous improvement.

Performance and Reporting
  • Track and report site KPIs, including uptime, MTTR, SLA compliance, deployment velocity, and ticket aging.
  • Use operational data to identify recurring issues and improve reliability.
  • Maintain accurate break-fix workflows in Jira or a comparable ticketing system.
  • Provide clear operational updates, incident summaries, and corrective-action plans to senior leadership.

Colocation and Facilities
  • Serve as the primary on-site liaison with the colocation provider.
  • Hold facility partners accountable to SLAs related to power, cooling, security, and availability.
  • Maintain working knowledge of UPS systems, PDUs, generators, CRAC and CRAH systems, and supporting infrastructure.
  • Escalate and track facility issues through resolution.
  • Coordinate planned maintenance to minimize risk to production systems.

Process and Documentation
  • Maintain site runbooks, SOPs, emergency procedures, and hardware documentation.
  • Ensure work is completed in accordance with safety, security, and change-management standards.
  • Contribute to fleet-wide operating standards and knowledge sharing.
  • Maintain accurate asset, inventory, and configuration records.

What You'll Bring to the Team
  • 5+ years of data center operations leadership experience in a production environment.
  • Experience managing and developing technical teams.
  • Hands-on experience troubleshooting enterprise server hardware, including GPU nodes, DIMMs, drives, cabling, and rack-level infrastructure.
  • Strong familiarity with SuperMicro hardware, diagnostics, event logs, and RMA processes.
  • Experience working in colocation environments and managing provider SLAs.
  • Working knowledge of data center electrical and mechanical systems.
  • Experience with Jira, ServiceNow, or a similar ticketing platform.
  • Strong understanding of incident management, root-cause analysis, and operational risk.
  • Clear written and verbal communication skills, including the ability to present technical and operational information to senior leaders.
  • Ability to work on-site in Springfield, Ohio, and support critical incidents as needed.


Preferred Qualifications
  • Experience supporting AMD GPU clusters, including MI300X or equivalent platforms.
  • Familiarity with NVIDIA GPU platforms such as H100, H200, or B200.
  • Understanding of RoCE fabric topology and common failure modes.
  • Experience with DCIM or asset-management tools such as NetBox.
  • Multi-site or regional data center operations experience.
  • Experience in rapidly scaling cloud, hyperscale, or AI infrastructure environments.


Location and Travel

This role is based on-site at Crusoe's OH5C facility in Springfield, Ohio. Periodic travel to other Crusoe sites may be required for training, cross-site projects, or operational support.

Benefits
  • Competitive compensation and equity
  • Restricted Stock Units
  • Paid time off, holidays, and leave programs
  • Medical, dental, and vision insurance
  • Employer HSA contributions
  • Paid parental leave
  • Life, short-term disability, and long-term disability insurance
  • Professional development and tuition reimbursement
  • Mental health and wellness support
  • Commuter benefits
  • Cell phone stipend
  • 401(k) with company match up to 4%
  • Volunteer time off
  • Global travel insurance and emergency assistance
  • Daily meal allowance
  • Additional location-specific benefits


Compensation Range

Compensation will be paid within a range of $135,000-$175,000, plus bonus. Restricted Stock Units are included in all offers. Final compensation will be determined based on the applicant's knowledge, education, experience, and abilities, as well as internal equity and alignment with market data.
