Senior Director, Data Center Facilities

DigitalOcean

$233K — $292K *
Technical Services
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 5-10 years leading data center operations in cloud or high-growth environments
  • Proven track record of enhancing operational performance across multiple locations
  • Expertise in building scalable operational processes and governance routines
  • Strong grasp of data center infrastructure, including power, cooling, and maintenance
  • Familiarity with high-density compute environments and advanced cooling systems

Responsibilities

  • Lead and improve on-site data center operations globally
  • Develop scalable operating models and standards for staffing and incident management
  • Drive operational excellence through process standardization and performance monitoring
  • Facilitate the rapid launch of new data center locations with established protocols
  • Manage vendor relationships to ensure service quality and accountability

Benefits

  • Opportunity to lead and shape global data center operations
  • Engagement with cutting-edge AI/GPU technologies
  • Work in a dynamic, high-growth company environment
  • Career development and leadership training programs
  • Cross-functional collaboration with diverse teams

Full Job Description

The Senior Director of Data Center On-Site Operations is a senior operational leader responsible for driving consistent, scalable, and high-performing execution across DigitalOcean's global data center footprint. This role leads the on-site operations strategy for existing regional data centers and supports rapid expansion into new locations by building the operating model, leadership structure, processes, standards, and performance systems required to scale effectively.

This leader is accountable for improving operational excellence across data center sites, including execution quality, service reliability, staffing readiness, vendor performance, incident response, maintenance discipline, change execution, and customer-impact prevention. The role is focused on strengthening day-to-day operational performance while creating repeatable processes that allow the organization to grow quickly across both existing and new markets.

As DigitalOcean continues to invest heavily in physical infrastructure and AI/GPU capacity, this leader will play a critical role in preparing on-site operations for higher-density environments, liquid-cooled infrastructure, accelerated deployments, and increasingly demanding customer SLAs. The Senior Director will ensure that on-site teams can support rapid growth while maintaining safety, reliability, consistency, and accountability.

This executive serves as a key escalation leader for site-level operational issues, vendor performance concerns, and execution risks. The role requires strong cross-functional partnership with Infrastructure, Engineering, Design, Construction, Capacity Planning, Procurement, Networking, Product, and executive leadership to ensure data center operations are aligned with business growth, customer commitments, and long-term scalability.

**Key Strategic Priorities**

- **Operational Excellence and Continuous Improvement:** Build and mature a disciplined operating model across all data center locations, with clear standards, repeatable processes, strong metrics, and continuous improvement mechanisms.
- **Scalable Site Operations Model:** Develop the staffing models, leadership structures, training programs, procedures, and governance required to support rapid growth in both the number of locations and the complexity of operations.
- **Rapid Data Center Expansion Readiness:** Ensure new sites can be operationalized quickly and consistently through standardized launch playbooks, readiness checklists, vendor coordination, staffing plans, and operational acceptance criteria.
- **Performance, Reliability, and SLA Improvement:** Improve site-level performance against availability, service delivery, maintenance, change management, incident response, inventory accuracy, deployment execution, and customer-impact prevention.
- **AI/GPU Operational Enablement:** Prepare on-site operations to support high-density GPU environments, liquid cooling, complex cabling, accelerated hardware deployments, higher-touch customer requirements, and increased operational risk.

**Essential Duties and Responsibilities**

**Operational Leadership**

- Provides senior leadership for data center on-site operations across DigitalOcean's existing and expanding global footprint.
- Leads the development and execution of a scalable operating model for data center sites, including standards for staffing, shift coverage, escalation, maintenance, change management, incident response, vendor oversight, inventory control, deployment support, and customer-impact prevention.
- Drives operational excellence across all data center locations by identifying gaps, standardizing best practices, improving execution discipline, and ensuring consistent adoption of policies, procedures, and operating standards.
- Leads site operations teams responsible for 24x7 data center support, ensuring availability, reliability, safety, quality, and service delivery meet or exceed business and customer expectations.
- Improves on-site execution performance by strengthening accountability, defining clear ownership, and implementing measurable performance standards across locations.

**Scalable Growth and Process Development**

- Builds repeatable operational processes that allow DigitalOcean to scale from a regional data center footprint to a larger, more complex, and more globally consistent operating environment.
- Develops standardized site launch playbooks, operational readiness checklists, escalation models, staffing templates, maintenance routines, training plans, and vendor management practices for new and existing sites.
- Partners with cross-functional teams to ensure that site operations are prepared for data center expansions, new technology deployments, new customer requirements, higher-density racks, GPU infrastructure, and liquid cooling environments.
- Identifies operational constraints that limit growth and develops practical solutions to improve throughput, reduce execution friction, and accelerate site readiness.
- Creates a process-development roadmap that supports organizational growth, improves consistency, and reduces dependency on individual tribal knowledge.

**Performance Management and Metrics**

- Defines and manages key operational KPIs, SLAs, and performance indicators across the data center operations organization.
- Uses data to identify trends, performance gaps, recurring issues, staffing constraints, vendor deficiencies, and opportunities for automation or process improvement.
- Develops operational dashboards and review mechanisms for availability, incident response, maintenance completion, change success, inventory accuracy, deployment throughput, backlog management, staffing readiness, vendor performance, and customer-impact events.
- Leads operational reviews with senior leadership, providing clear visibility into site health, execution risks, improvement plans, and progress against strategic goals.
- Benchmarks operational performance against industry standards, internal targets, and business requirements.

**Site Reliability, Availability, and Risk Management**

- Has responsibility for meeting or exceeding established service levels for data center operations.
- Oversees site-level execution supporting critical physical infrastructure, including power, cooling, cabling, space, racks, network infrastructure, server deployment, hardware maintenance, and customer environment support.
- Ensures that work performed in data center environments is completed safely, correctly, and without impact to internal or external customers.
- Drives stronger change management discipline for work performed in live production environments, including pre-work planning, risk assessment, approvals, execution quality, rollback readiness, and post-change validation.
- Evaluates and mitigates operational risks across new and existing sites, including staffing gaps, vendor performance issues, maintenance deficiencies, capacity constraints, process weaknesses, and readiness gaps for new deployments.
- Improves incident response, root cause analysis, corrective action tracking, and recurrence prevention across data center locations.

**Data Center Expansion and Readiness**

- Supports data center expansion efforts by ensuring on-site operational requirements are identified early and integrated into planning, design, construction, commissioning, and turnover processes.
- Partners with Design, Construction, Capacity Planning, Engineering, Procurement, Networking, and Product teams to translate business demand into operational requirements for space, power, cooling, racks, cabling, staffing, maintenance, security, logistics, and site support.
- Develops operational acceptance criteria for new sites, new data halls, new pods, and major infrastructure deployments.
- Ensures new sites are handed over with complete documentation, diagrams, procedures, training, spares, escalation paths, vendor contacts, and support models.
- Drives readiness for accelerated GPU deployments, high-density rack environments, liquid cooling, and other infrastructure programs tied to business growth.

**Vendor and Partner Management**

- Leads site-level vendor management to ensure service providers, contractors, colocation partners, and maintenance vendors meet operational expectations, contractual commitments, SLAs, and safety requirements.
- Serves as a senior escalation point for vendor performance issues, site delivery concerns, maintenance execution problems, and operational deficiencies.
- Develops consistent vendor governance practices, including performance reviews, issue tracking, corrective action plans, escalation paths, and service-quality metrics.
- Partners with Procurement, Legal, and Finance to identify opportunities to streamline vendor services, optimize costs, and improve accountability.
- Builds strong working relationships with vendor stakeholders while maintaining clear expectations for execution, quality, safety, and delivery.

**Team Leadership and Organizational Development**

- Leads, coaches, and develops data center operations managers, site leaders, and technical operations teams.
- Builds a leadership bench capable of supporting rapid site growth, increased operational complexity, and higher customer expectations.
- Creates training, mentoring, and skill-development programs that improve technical capability, execution discipline, safety awareness, and leadership readiness.
- Develops staffing models that support service-level requirements, site complexity, customer commitments, GPU concentration, and operational risk.
- Fosters a culture of ownership, accountability, collaboration, continuous improvement, safety, customer focus, and operational discipline.
- Removes organizational barriers that slow execution, reduce accountability, or prevent strong cross-functional collaboration.

**Governance, Documentation, and Standards**

- Defines and maintains policies, procedures, standards, and documentation related to data center operations, including performance, capacity, availability, continuity, security, safety, maintenance, and change management.
- Ensures documentation and diagrams accurately capture critical site information, including physical layout, rack elevations, power paths, cooling configurations, cabling standards, escalation paths, maintenance routines, and operational dependencies.
- Maintains operational governance routines to ensure standards are being followed across sites and that deviations are identified, reviewed, and corrected.
- Supports budgeting, forecasting, planning, deployment coordination, incident management, problem management, change management, and operational reporting.
- Creates and delivers executive-level presentations on site operations performance, operational risks, improvement programs, expansion readiness, and organizational needs.
- Performs other duties as assigned.

**Knowledge, Skills, and Abilities**

- Strong experience leading data center operations in a large enterprise, cloud, colocation, or high-growth infrastructure environment.
- Demonstrated ability to improve operational performance across multiple data center locations.
- Experience building scalable operational processes, playbooks, staffing models, governance routines, and performance systems.
- Strong understanding of data center on-site operations, including power, cooling, cabling, hardware support, maintenance, vendor management, change management, incident response, logistics, and customer-impact prevention.
- Experience operating against SLAs and improving performance in environments where availability, reliability, and execution quality are mission critical.
- Ability to lead in a highly complex, fast-moving, matrixed organization with multiple stakeholders and competing priorities.
- Strong understanding of data center growth challenges, including new site launches, capacity expansion, staffing readiness, vendor coordination, process standardization, and operational handoff.
- Experience supporting high-density compute environments, GPU infrastructure, liquid cooling, large-scale server deployments, and complex cabling environments is strongly preferred.
- Strong knowledge of data center critical infrastructure, including UPS systems, generators, transfer switches, power distribution, DC power systems, HVAC, cooling systems, liquid cooling, racks, structured cabling, and physical security.
- Ability to translate business, server, storage, networking, and customer requirements into practical data center operational needs.
- Strong vendor management and negotiation skills, with the ability to hold partners accountable while maintaining effective business relationships.
- Excellent communication skills, including the ability to present complex operational issues clearly to executive, technical, and non-technical audiences.
- Proven ability to develop credibility, influence without authority, and drive change across diverse teams.
- Strong decision-making, problem-solving, prioritization, and risk-management capabilities.
- Ability to develop and communicate relevant department metrics, SLA performance reports, operating reviews, and executive-level summaries.
- Demonstrated ability to develop teams, coach leaders, mentor technical staff, and build organizational capability.
- Familiarity with industry standards, certifications, and operational best practices for data center environments.
- Working knowledge of IT infrastructure, including servers, storage, networking, operating systems, virtualization, and cloud infrastructure
