NVIDIA is hiring senior software engineers for its AI infrastructure RAPIDS team. RAPIDS is a suite of open source software libraries that enables executing end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
You will need strong C++ programming skills (CUDA C++ or other GPU programming experience is desirable), experience with software building and packaging, and excellent communication and planning skills. You will have the opportunity to work with world-class engineers at NVIDIA and from the RAPIDS open source community to advance accelerated data science.
What you'll be doing:
- Analyzing, designing, and implementing optimized GPU algorithms for large-scale data analytics and machine learning.
- Analyzing performance, benchmarking, and troubleshooting GPU-accelerated libraries.
- Collaborating with a multi-functional team to understand requirements and implement or improve solutions.
What we need to see:
- MS degree or higher in Computer Science, Computer Engineering, or a related field, or equivalent work experience.
- 7+ years of experience in Computer Science, Artificial Intelligence, Applied Math, or a related field.
- Strong analytical and problem-solving skills, with solid fundamentals in algorithms and mathematics.
- Excellent C/C++ programming, debugging, performance analysis, and test design skills.
- Good communication and documentation habits.
- Ability to work independently and manage your own development efforts.
Ways to stand out from the crowd:
- Experience developing algorithms with CUDA C++ or other parallel programming technologies.
- Experience developing distributed systems and algorithms using MPI, OpenMP, NCCL, or similar technologies.
- Experience with “modern” C++ standards: C++11, C++14, and C++17.
- Experience with data analytics, machine learning, and related technologies.
- Experience with one or more of: Python, git, CMake, Google Test & Benchmark, assembly / low level programming, performance tuning.