R&D, Computer Science

Sandia National Laboratories

Albuquerque, NM

Industry: Aerospace & Defense


5 - 7 years

Posted 65 days ago


Are you passionate about your work and dream of utilizing state-of-the-art facilities to explore solutions? Do you want to join a dynamic team that seeks to revolutionize the field of High Performance Computing (HPC) analysis and operations?

We are seeking a computer science R&D professional to join a team developing new software and operational analytics for HPC architectures.

You will innovate and collaborate with a team that researches and develops HPC monitoring, performance analysis, and response solutions to enable advanced, data-driven operations and efficient system utilization. The team authors the open-source, R&D 100 Award-winning Lightweight Distributed Metric Service (LDMS), which is used to monitor several of the largest HPC systems in the world.

On any given day, you may be called upon to:

  • Design and develop software for extreme-scale data collection and analysis to assess system and application performance
  • Develop and deploy analysis techniques to detect and classify operational conditions that bottleneck user application performance
  • Develop data presentations and automated response techniques that use analysis outcomes to enable more efficient computing
  • Work with internal and external organizations that operate large-scale HPC systems to deploy monitoring solutions and use them to understand performance
  • Publish and present research results at peer-reviewed conferences

Qualifications We Require

  • MS plus 2 years of experience, or a PhD, in a relevant STEM discipline
  • 5 years of experience programming in C, C++, and/or Python
  • Experience programming in Unix/Linux environments
  • A record of peer-reviewed publications and/or external presentations at scientific conferences
  • Ability to obtain and maintain a DOE Q clearance

Qualifications We Desire

  • Experience developing with Jupyter Notebooks and NumPy
  • Experience using and/or developing statistical data analysis and/or machine learning techniques (e.g., PCA, scikit-learn, TensorFlow) for large datasets
  • Experience developing large-scale codes in a multi-developer, open-source software environment
  • Experience developing middleware for HPC systems, with attention to resilience, scalability, and memory and CPU footprint
  • Experience doing performance analysis studies of software and applications on HPC system architectures, particularly for advanced processors and/or networks
  • Familiarity with building and running applications in HPC system environments
  • Experience as a system administrator in Unix/Linux environments
  • Experience with HPC monitoring technologies such as LDMS, Elasticsearch, Kafka, and Logstash
  • Experience developing unit and regression tests and running them within frameworks such as Jenkins
  • Current DOE Q security clearance