The Charles Stark Draper Laboratory

AI Systems Administrator

Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science or related field.
  • 3 years of Linux system administration experience in production environments.
  • Strong production Linux experience with RHEL/Oracle systems.
  • Knowledge of automation using Bash, Python, and Ansible.
  • Familiarity with security operations in regulated environments.

Responsibilities

  • Build, operate, and troubleshoot RHEL/Oracle systems for AI workloads.
  • Manage GPU driver/toolkit lifecycle and system performance.
  • Implement observability for system and GPU health monitoring.
  • Maintain high uptime and performance of LLM servers.
  • Collaborate with a team to enable software upgrades securely.
  • Automate and streamline platform administration using Git-based practices.
  • Lead projects on platform redesign and large-scale migrations.

Benefits

  • Hybrid work model (3 days/week) in Cambridge, MA.
  • Opportunities for workplace flexibility and social engagement.
  • Health and finance workshops available to employees.
  • Access to discounts at local museums and cultural activities.

Full Job Description

Job Description Summary:

The AI Systems Administrator is instrumental in bringing AI to Draper. The incumbent implements a closed GPT environment at Draper in which several different LLMs are maintained and used throughout the organization. This role works with engineering to ensure that multiple LLMs are accessible through a chat interface, API, and assistive tools for general use across the organization. In addition, the incumbent ensures the system health of the DraperGPT server so that additional AI infrastructure requiring large amounts of compute can be used without degrading the performance of other LLM resources. The work also includes API interfaces with various software platforms across Draper (e.g., engineering, accounting, legal). This role helps Draper implement automation, streamline processes, and support mission-critical AI/ML workloads. Resource allocation is critical.

The role also involves traditional Linux admin duties (installing, configuring, and securing servers; scripting; monitoring) but with a strong focus on supporting and managing AI/ML infrastructure (e.g., GPU servers, Kubernetes, data pipelines). The incumbent supports AI engineers, drawing on system knowledge to offer solutions and recommendations. The role is part of a team of Linux system administrators responsible for the functionality and efficiency of approximately 750 computers, running primarily Oracle Linux. Knowledge of additional operating systems, e.g., Ubuntu and RHEL, may be necessary. The Systems Administrator maintains the integrity and security of servers and systems, serves as a front-line interface to end users and other IS teams, and makes recommendations for hardware and software purchases. The role interacts directly with vendors and VARs on proactive projects and in response to support issues. Duties may include installing, configuring, and maintaining new hardware/software; troubleshooting; managing permissions; and training other administrators. Requires a solid understanding of UNIX-based operating systems.

This role will be hybrid (3 days/week) in Cambridge, MA and will require an active Secret clearance.

Job Description:

Duties/Responsibilities

  • Build, operate, and troubleshoot RHEL/Oracle systems supporting GPU workloads (OS lifecycle, patching, performance, reliability).
  • Manage the GPU enablement layer: driver/toolkit lifecycle, kernel/driver compatibility, coordinated upgrades and rollback plans, and ongoing health monitoring.
  • Implement and maintain observability (metrics, logs, alerting) for system, GPU, and storage performance/health (e.g., Prometheus/Grafana and GPU telemetry such as DCGM/NVML or equivalent). 
  • Couple the above observability with LLM performance and usage data, and identify and warn users who are over-allocating resources.
  • Maintain (i.e., reset or rebuild) LLM servers to ensure high uptime and usability across the organization.
  • Work with a team of engineers to enable software upgrades (e.g., new models or additional AI software) on the server while meeting security requirements.
  • Partner with storage/network peers to baseline throughput/latency, identify bottlenecks, and tune the platform for predictable performance.
  • Automation & scripting: create and maintain automation (Python/Ansible) for platform administration and broader Linux team workflows (provisioning/config enforcement, patch orchestration, reporting, routine maintenance), using Git-based practices.
  • Support various Linux and cloud (AWS/Azure) projects.
  • Lead projects including large-scale migrations as well as platform redesign and implementation. Utilize resources within the Linux team as well as across the IS department to reach goals.

Skills/Abilities

  • Strong production Linux administration experience (RHEL/Oracle preferred): systemd, networking, troubleshooting, performance analysis, patching, package management.
  • Strong automation skills: Bash and/or Python, plus Ansible (preferred) or equivalent configuration management; comfortable with CI/Git workflows.
  • Experience supporting enterprise platforms (incident response, root-cause analysis, postmortems, runbooks/documentation).
  • Security-minded operations in regulated environments; familiarity with CUI handling concepts and control expectations (audit logging, vulnerability remediation, change control). 

Education

  • Bachelor's degree in Computer Science or a related field.

Experience

  • 3 years’ experience in Linux system administration, supporting production systems and core utility services in a complex enterprise environment.

Additional Job Description:

Applicants selected for this position will be required to obtain and maintain a government security clearance.

Active Secret Clearance required.

Job Location - City:

Cambridge

Job Location - State:

Massachusetts

Job Location - Postal Code:

The US base salary range for this full-time position is

$82,300.00 - $220,000.00

Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Union ranges will be in compliance with the collective bargaining agreement's approved rates by location and role. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in US role postings reflect base salary only and do not include bonuses or benefits.

Our work is very important to us, but so is our life outside of work. Draper supports many programs to improve work-life balance, including workplace flexibility, employee clubs ranging from photography to yoga, health and finance workshops, off-site social events, and discounts to local museums and cultural activities. If this specific job opportunity and the chance to work at a nationally renowned R&D innovation company appeals to you, apply now at www.draper.com/careers.

About The Charles Stark Draper Laboratory

Draper Laboratory is an American non-profit research and development organization, headquartered in Cambridge, Massachusetts; its official name is The Charles Stark Draper Laboratory, Inc. The laboratory specializes in the design, development, and deployment of advanced technology solutions to problems in national security, space exploration, health care, and energy. The laboratory was founded in 1932 by Charles Stark Draper at the Massachusetts Institute of Technology to develop aeronautical instrumentation, and came to be called the MIT Instrumentation Laboratory. During this period the laboratory was best known for developing the Apollo Guidance Computer, the first computer based on silicon integrated circuits. It was renamed for its founder in 1970, and separated from MIT in 1973 to become an independent, non-profit organization. The expertise of the laboratory staff includes the areas of guidance, navigation, and control technologies and systems; fault-tolerant computing; advanced algorithms and software systems; modeling and simulation; and microelectromechanical systems and multichip module technology.