Job Description
We have an opening for a Senior Storage System Software Developer on a team that researches, designs, develops, maintains, and integrates software and hardware solutions that underpin scalable storage services within the Livermore Computing high-performance computing center. In this role, you will apply software development experience and broad systems-level mastery to support production parallel file systems and archival storage systems. This includes troubleshooting, debugging, assisting system administration staff in isolating software defects on production systems, and independently developing software to address challenging issues on large-scale systems. Additionally, this position includes opportunities to implement new software features in archival storage and file systems such as High-Performance Storage System (HPSS), Lustre, and ZFS. This position is in the Livermore Computing Division within the Computing Principal Associate Directorate.
This position offers a hybrid schedule, blending in-person and virtual presence. You will have the flexibility to work from home up to two days per week.
This position will be filled at either the SES.3 or SES.4 level, based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
You will
- Provide software engineering support for production file systems and/or long-term archival storage systems running at petabyte and billion-object scale.
- Troubleshoot and debug highly scalable software-defined storage systems such as Lustre and/or HPSS.
- Contribute to long-term maintenance of HPSS and/or Lustre codebases along with related projects (ZFS, Lustre Monitoring Tools, storage quota systems, etc.).
- Design, implement, and maintain new features and performance improvements for HPSS and/or Lustre (and related projects).
- Review colleagues' code changes and integrate upstream patches into local versions of Lustre and/or HPSS codebases.
- Develop and refine storage system monitoring applications.
- Collaborate with cross-functional teams and across organizations to implement innovative solutions and/or resolve system-wide performance degradations and functionality defects in production storage systems.
- Perform other duties as assigned.
Additionally, at the SES.4 level, you will
- Serve as a technical subject matter expert and provide technical leadership for complex storage software and systems efforts.
- Mentor and develop technical staff across the organization and share expertise broadly with the next generation of storage professionals.
- Lead cross-functional efforts to diagnose and resolve critical system-wide performance, scalability, and reliability issues.
- Identify and integrate innovative approaches using new technologies, articulating alternative solutions and their impacts.
- Provide strategic technical guidance to project stakeholders, management, and partner organizations.
Qualifications
- Ability to maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
- Bachelor's degree in computer science or related field or the equivalent combination of education and related experience.
- Significant experience with file system internals and/or with hierarchical storage system concepts and systems - including tiered storage systems that integrate flash, HDD, and tape - used to implement long-term archival storage systems.
- Significant experience in a production high performance computing environment. Experience operating storage systems in a production high performance computing (HPC) environment where unplanned downtime has significant operational consequences and end-user impact.
- Advanced proficiency developing software in a team environment with two or more of the following programming languages: C, C++, Rust, or Python.
- Proficiency in Linux command line environments.
- Proficiency with distributed version control software (example: git).
- Advanced verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information and provide advice to management.
- Proficiency with Linux debugging and inspection tools (examples: strace, perf, gdb, the /proc file system, and/or eBPF).
- Availability to work off-hours to resolve production problems, on an as-needed basis, and on a 24×7 on-call rotational schedule.
In Addition, at the SES.4 Level
- Highly advanced knowledge of and significant technical experience with Linux operating systems and/or Linux kernel interfaces, as well as experience with configuration, networking, and system security.
- Advanced problem-solving and debugging skills to diagnose multi-component problems in highly parallel, multi-threaded systems and identify the root cause of complex storage system issues.
- Proficient verbal, written, and interpersonal communication skills necessary to interact with all levels of personnel, collaborate effectively in a multi-disciplinary team environment, and present and explain technical information under limited direction.
- Ability to set priorities, independently resolve complex problems, and apply new technologies to broadly defined tasks and projects in a fast-paced environment.
Desired Qualifications
- Master's degree in computer science or related field or the equivalent combination of education and related experience.
- Familiarity with HPSS codebase and its implementation.
- Familiarity with Lustre codebase and its implementation.
- Familiarity with open-source storage community contributions - cherry-picking patches, umbrella organizations (such as OpenZFS or OpenSFS), upstreaming contributions (such as to the Lustre mainline), etc.
- Experience with integration of traditional storage systems and cloud-first technologies such as S3 data transfer protocol, object storage, OIDC/OAuth, and/or metadata extraction and cataloging systems.
Pay Range
$175,530 - $222,564 Annually at the SES.3 level
$210,630 - $267,060 Annually at the SES.4 level
This is the lowest to highest salary in good faith we would pay for this role at the time of this posting. Pay will not be below any applicable local minimum wage. An employee's position within the salary range will be based on several factors including, but not limited to, specific competencies, relevant education, qualifications, certifications, experience, skills, seniority, geographic location, performance, and business or organizational needs.
Position Information
This is a Career Indefinite position, open to Lab employees and external candidates.