Leads teams completing infrastructure projects, which includes instructing, directing, and checking the work of other team members. Oversees the design of the work and partners with users to develop specifications. Conducts quality assurance reviews of projects and evaluates and documents team procedures. Oversees the installation, maintenance, configuration, and integrity of University infrastructure. Implements enhancements that improve the infrastructure’s reliability and performance.
- Recommends purchases and capital equipment.
- Partners with new and existing clients to ensure that systems meet client needs.
- Advises client management on direction of systems, operations, staff allocation, budgets, and scalability.
Human Resources / Supervision
- Provides work direction and/or supervises staff such as team members, subordinates, contractors, vendors, students, etc.
- Recommends staff hires/terminations.
- Coaches and mentors staff.
- Manages projects, ensuring that timelines are met and deliverables meet expectations.
- Organizes and performs technical training within or outside the team.
Design & Implementation
- Evaluation, planning, and implementation of hardware, operating systems (Linux), networking utilities, analytical software tools, and research computing scheduler tools. Participation in the evaluation, planning, and implementation of cloud-based research computing services (primarily in AWS).
- Maintenance of HPC hardware (Compute, network, storage, backup).
- Investigates operational issues and resolves the root cause of hardware, software, and process problems.
- Facilitate HPC and research compute “scheduler” activities.
- User account and resource creation and management.
- Assist Research Computing Consulting team in selecting appropriate computing platforms per workload type.
- Develop short and long-term strategies for existing and new technologies and services.
- Fosters relationships with other NUIT groups, peer institutions, national research networks, service providers, and vendors.
- Craft and maintain policies and procedures for HPC Infrastructure administration.
- Leads planning and implementation efforts.
- Consults with Service Operations, Research Computing, and the research community to diagnose and resolve problems in a timely fashion while stressing user service.
- Analyzes and isolates root cause.
- Provides solutions to user questions and other queries such as connectivity problems, performance issues, functionality, outages, etc.
- Implements new tools and procedures to monitor and measure system performance, build and use forecast models, and document and track system and operational metrics.
- Positive collaborative nature when working with others on our team, colleagues from schools, and vendors.
- Participate in 24×7 On-Call Rotation Schedule.
- Bachelor’s Degree or appropriate combination of education and experience.
- 6 or more years of relevant experience.
- Amazon Web Services (AWS), GlobusOnline, Linux Operating System, OpenStack/CloudStack, Server hardware, Storage hardware
- Action oriented - Willing or likely to take practical action to deal with a problem or situation.
- Adaptability - Works effectively in a changing environment; adjusts behavior to meet the needs of different people and situations.
- Collaboration - Facilitates open and effective communication, cooperation and teamwork within and outside of one's own team.
- Communication - Communicates strategically to achieve specific objectives using varied vehicles and opportunities.
- Conflict management - Exhibits understanding of natural sources of conflict and acts to prevent or soften them.
- Critical and analytical thinking - Uses logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions or approaches to problems.
- Customer & personal service - Assesses customer needs; meets quality standards for services; evaluates customer satisfaction.
- Dependability - Completes duties and responsibilities appropriately and on time.
- Emergency response - Assesses emergency situations; follows SOPs; exercises calm, quick judgment in stressful conditions.
- Ethics and integrity - Incorporates honesty, respect and fairness in daily actions.
- Initiative - Exhibits energy and desire to achieve; sets ambitious goals and acts decisively; takes action that no one has requested to improve or enhance job results and avoid problems.
- Judgment; decision making; discretion - Considers the relative costs and benefits of potential actions to choose the most appropriate one.
- Meets deadlines - Displays consistency and success in adhering to deadlines.
- Multi-tasking - Demonstrates ability to work on multiple projects simultaneously.
- Negotiations - Finds common ground to accommodate the conflicting needs and wants of different stakeholders.
- Planning - Devises and implements clearly defined strategies to achieve business objectives.
- Problem solving - Formulates realistic plans and contingencies and establishes appropriate measurements of anticipated results.
- Results driven - Sets clear, challenging objectives and regularly monitors progress.
- Strategic thinking - Works on initiatives that have the greatest strategic impact for the organization; anticipates changes that may impact department or school.
- Stress management - Recognizes stress triggers and works to mitigate them, maintains work/life balance, responds appropriately to stressors outside of one's control.
- Team player - Seeks to build collaboration by encouraging trust, mutual respect and shared purpose among various participants in an engagement.
- Master’s Degree or appropriate combination of education and experience.
- Advanced Linux Operating System skills (security, networking, and file systems).
- Scheduling software