Florida State University

HPC Operations Manager

Florida State University, $85K to $110K *
Technical Services
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in IT, Computer Science, or related field, or equivalent experience
  • Four years of experience in a relevant field, or eight years with a high school diploma
  • Proven expertise in the Linux operating system
  • Experience managing High Performance Computing (HPC) systems
  • Familiarity with programming, particularly in bash or Python
  • Prior management experience

Responsibilities

  • Lead and manage RCC operations and team
  • Supervise staff and student workers in systems administration
  • Collaborate with faculty on systems planning and technical solutions
  • Conduct preventative maintenance and vendor coordination for server-room infrastructure
  • Oversee incident handling and critical alarm response
  • Assist in developing research facilities and capabilities for faculty and students
  • Provide continuous 24/7 support for data center operations

Benefits

  • Possibility for professional development through training and consulting
  • Engagement with cutting-edge research technologies and methodologies
  • Opportunity to make a significant impact on research computing
  • Collaborative work environment with faculty and staff
  • Access to state-of-the-art computing facilities

Full Job Description

Department

This position is within FSU's Department of Information Technology Services (ITS).

The Research Computing Center (RCC) at Florida State University enables research and education by maintaining campus cyberinfrastructure and by providing training opportunities and dedicated consulting for all faculty, staff, and students.

Responsibilities

The Operations Manager leads and manages the operation of all computing facilities within the Research Computing Center (RCC), overseeing systems administration, staff supervision, server-room management, and technical support for diverse research computing needs.

Responsibilities include supervising and mentoring staff and student workers in systems administration, collaborating with faculty on systems planning, and implementing technical solutions such as software installation, source code modification, scripting, and database development to meet evolving research needs. Stay abreast of developments in AI and agentic systems, including deploying large language models (LLMs) on local or cloud resources.

Lead efforts related to preventative maintenance, vendor coordination, and system monitoring for critical server-room infrastructure (cooling, power, fire suppression) in the Sliger building. Oversee critical alarm response and incident handling related to server-room systems. Coordinate colocation operations, including rack placement strategy and power provisioning. Assist the director with the development of new or improved research facilities for faculty; develop new or improved capabilities for students; and develop and implement policies. Provide as-needed, around-the-clock (24/7) support for the Sliger data center. Work with other ITS and facility specialists as necessary to ensure timely off-hours problem response and reporting.

Performs duties in compliance with ITS policies, guidelines, and processes pertaining to support requests, work orders, project management, change management, and incident management. Appropriately utilizes associated tools in accordance with ITS standards. Participate in the ITS Change Management process in the role of Change Manager for RCC. Assist with project management for the computer facilities within RCC.

Qualifications

Bachelor's degree in Information Technology, Computer Science, MIS, or another appropriate field and four years of experience; or a high school diploma or equivalent and eight years of experience. (Note: a combination of appropriate post-high-school education and experience equal to eight years also qualifies.)

Preferred Qualifications

  • Excellent knowledge of the Linux operating system
  • Experience in managing HPC systems
  • Programming experience, for example, using bash scripts or Python applications
  • Management experience

Helpful

Who is the ideal candidate for this position?

  • Curious and eager to learn
  • Not afraid of tackling complex problems
  • Good communicator

What is a typical day in this position?

This position will manage day-to-day operations, work on long-term projects, and collaborate with the RCC application team. A typical day is a mixture of all of these tasks.

What can I expect in the first 60-90 days?

You will be drinking from a firehose. The RCC manages a complex system of servers, network infrastructure, and storage systems that serve 1000+ users.

How To Apply

If qualified and interested in a specific job opening as advertised, apply to Florida State University at https://jobs.fsu.edu. If you are a current FSU employee, apply via myFSU > Self Service.

Applicants are required to complete the online application with all applicable information. Applications must include all work history going back up to ten years, as well as education details, even if a resume is attached.

Considerations

This is an A&P position.

This position requires successful completion of a criminal history background check.
