Simulation Runtime
Mapping a computational problem to a runtime environment is the engine of our product. Making our simulations run fast across heterogeneous compute platforms while retaining accuracy is at the center of our value proposition to customers. Delivering on this involves overcoming many challenges, including efficient data sharing and customizing inference and math for bespoke runtime hardware.
The value: making Vinci Simulations run effectively regardless of the hardware, from a single desktop to multi-GPU clusters that span global data center sites. This kind of compute platform flexibility means simulations are easier to run and complete faster when hardware permits. The ability to losslessly divide simulations across large-scale compute resources will unlock new utility for our customers and power larger, more useful applications.
What You Will Do
Your north star will be parallelism and correctness.
In this role you will design and implement low-latency, scalable solutions to decompose and distribute our production simulations across the most challenging computational boundaries:
- Multi-GPU machines
- Multi-Node clusters
- Networked nodes
What We're Looking For
Being successful in this role requires a deep understanding of scientific computing methods, boundary decomposition problems, and parallel computing.
Qualifications:
- Experience working on High Performance Computing runtime applications
- Experience with any highly parallel computing framework
- Experience with GPU programming: CUDA, ROCm, Triton
- Contributions to a production data processing system
- Familiarity with statistical validation methods: outlier detection, Bayesian methods, convergence criteria for nonlinear solvers
- Familiarity with ML basics: backpropagation, loss functions, generators, embeddings, transformer models
We are very excited to talk with you if you have:
- Worked on highly performant deployed inference environments
- Shipped HPC library components
- Taken an early-stage prototype to a production environment, at a startup or national lab
- Worked with highly parallel ML training frameworks such as Ray
Engineering Expectations
- Software engineering fundamentals
- Comfortable meeting software design standards to get code into a production environment.
- A practical approach to prototyping necessary components that are currently missing.
- Strong CI, regression testing, and validation discipline
- Comfort learning and evolving model deployment & runtime infrastructure