Fleet Operations Manager, Data Center Infrastructure

Meta

$120K — $150K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • BS, BA, or BEng in a technical field or equivalent experience
  • 5+ years managing technical teams with performance management responsibilities
  • 10+ years of engineering or operational experience, preferably in mature environments
  • Strong understanding of data center infrastructure, including power, cooling, and network systems
  • Proficient in using data and metrics for decision-making and problem-solving
  • Ability to communicate effectively with diverse audiences and work cross-functionally
  • Willingness to travel up to 30% of the time

Responsibilities

  • Build and lead a high-performing data center operations team across multiple locations
  • Manage maintenance and operations of server hardware and supporting infrastructure at scale
  • Become a technical expert on Meta's infrastructure to drive site and fleet operations
  • Drive continuous improvement in engineering and operational performance
  • Use data analytics to identify inefficiencies and enhance problem-solving capabilities
  • Collaborate with cross-functional teams to maintain fleet health and ensure operational efficiency
  • Evolve processes to scale globally while fostering a culture of accountability and innovation

Benefits

  • Comprehensive health insurance options
  • Retirement savings plan with company match
  • Flexible work arrangements
  • Opportunities for professional development and training
  • Culture of innovation and collaboration

Full Job Description
The Fleet Operations Manager is accountable for managing and leading a geographically dispersed team, delivering on SLAs/KPIs related to production server hardware, the resolution of systemic technical issues, and repairs throughout the assigned geographic region of data centers. We are looking for someone who can prioritize effectively and adapt to shifting priorities in a dynamic operational environment. The ideal candidate is an IT professional with strong leadership skills and experience in server hardware, project management, quality management, data analytics, networks, OS repair, Linux, and automation, ideally in a data center environment, along with an extensive understanding of managing servers in a large-scale distributed environment.

Responsibilities

• Build and lead a geographically dispersed, high-performing data center operations team, developing both the technical capabilities and leadership qualities of engineers
• Establish and manage a Data Center Operations Team accountable for the maintenance and operation of server hardware and supporting infrastructure at scale
• Become a technical expert in Meta's infrastructure, including platforms, tools, systems, architecture, workflows, and performance
• Provide strategic direction, guidance, and support for site and fleet-level operations
• Analyze and drive continuous improvement in the engineering and operational performance of our data centers
• Employ data analytics to identify inefficiencies, opportunities, exceptions, and correlations in a complex, highly interconnected, technical environment. Enable rapid and effective problem solving, along with proactive identification and mitigation of risks and issues
• Collaborate with cross-functional partner teams to ensure fleet health and maintain targeted capacity levels, resulting in optimized operations, minimized downtime, and seamless scalability
• Evolve and optimize processes in a globally consistent way to allow Meta to scale and grow effectively
• Support and mentor engineers in their day-to-day work, as well as in finding opportunities to develop and grow based on their areas of strength and interest
• Create and drive a culture of ownership, innovation, collaboration, accountability, continuous improvement, and safety
• Conduct performance management for a technical engineering team, providing clear expectations and goals
• Assume the role of incident manager during large-scale, site-wide, and region-wide production-impacting events, serving as the primary point of contact for your site. This requires working cross-functionally to scope problems, mitigate risks, effect fixes, and communicate the nature, status, and resolution plan for incidents
• Support and contribute thought leadership to the development and implementation of business practices, processes and automated tooling
• Develop deep knowledge and ownership of a hyper-scale computing fleet, using data analysis to identify trends, systemic issues, and opportunities, and reporting findings globally and sharing them with peers as appropriate

Minimum Qualifications
• BS, BA, or BEng in a technical field or commensurate experience
• Ability to travel up to 30% is required
• Experience participating in or leading technical projects related to areas such as process improvement, technology, and/or automation, including bringing in additional expertise as needed
• 5+ years of experience managing teams of technical resources, including people and performance management responsibilities
• Understanding of data center infrastructure and/or operations, including power, cooling, and/or network systems; structured cabling; and management of projects, incidents, and vendors
• Experience using data and metrics to drive decision-making
• Ability to influence effectively, working on cross-functional teams to advance the needs of the company and adapting teams to meet these needs
• 10+ years of engineering or operations experience, preferably in a mature engineering or operations environment, working with cross-functional teams
• Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience

Preferred Qualifications
• Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
• Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy reviews)
• Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
• Six Sigma knowledge/certification
• Experience leading technical resources using Linux or an equivalent OS to support hardware systems in a complex IT environment
• Experience with large-scale AI implementations and the use of AI to drive automation
• Experience in large-scale data center hardware deployments and building scalable infrastructure
• Knowledge of the interdependencies of data center functions and technologies, including electrical, cooling, structured cabling, security, and network
