Machine Learning Systems Engineer

Motional

$144K — $192K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a related field.
  • Strong proficiency in Python programming language.
  • Extensive hands-on experience with the PyTorch machine learning framework.
  • Knowledge of optimizing ML model execution during training and inference.
  • Exceptional analytical and problem-solving skills.

Responsibilities

  • Utilize performance profiling tools to identify system bottlenecks.
  • Implement optimizations to improve data loading and computational processes.
  • Optimize distributed training pipelines using PyTorch Distributed.
  • Design and maintain high-performance GPU kernels for ML workloads.
  • Enhance data loading pipelines for maximum training throughput.

Benefits

  • Medical, dental, and vision insurance options.
  • 401k plan with company matching contributions.
  • Health savings accounts available.
  • Life insurance offerings and pet insurance options.
  • Flexible hybrid or fully remote work arrangements.

Full Job Description

Mission Summary:
We are looking for a Machine Learning Systems Engineer to join our ML Acceleration team. In this role, you will be responsible for the core systems that enable our researchers to train frontier models at scale, focusing obsessively on speed, cost, reliability, and throughput. You will work at the intersection of machine learning research and high-performance systems engineering. Your work will directly impact our ability to scale distributed model training and reduce the time-to-convergence for our next generation of models.

What you'll be doing:
  • Performance Profiling & Optimization: Utilize profiling tools (e.g., Nsight, PyTorch Profiler) to identify bottlenecks in data loading, gradient computation, and communication. Implement optimizations like kernel fusion, sharding, and tiling to improve step time.
  • Distributed Training: Optimize distributed training pipelines using frameworks such as PyTorch Distributed.
  • Kernel Development: Design and maintain high-performance GPU kernels in Triton or CUDA for state-of-the-art ML workloads.
  • Data Pipeline Engineering: Optimize robust data loading pipelines that maximize training throughput.

What we're looking for:
  • Education: Bachelor's degree, Master's degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Software Engineering: Strong proficiency in Python.
  • ML Frameworks: Extensive hands-on experience with PyTorch.
  • ML Knowledge: Experience optimizing machine learning model execution during training and inference, alongside a strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Problem Solving: Exceptional analytical and problem-solving skills, with a bias for action and a data-driven approach to technical challenges.

We encourage a hybrid schedule with in-office time at one of our locations in Boston, Pittsburgh, or Las Vegas to support collaboration, or this role can be fully remote.

The salary range for this role is an estimate based on a wide range of compensation factors including but not limited to specific skills, experience and expertise, role location, certifications, licenses, and business needs. The estimated compensation range listed in this job posting reflects base salary only. This role may include additional forms of compensation such as a bonus or company equity. The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.

Candidates for certain positions are eligible to participate in Motional's benefits program. Motional's benefits include but are not limited to medical, dental, vision, 401k with a company match, health savings accounts, life insurance, pet insurance, and more.

Salary Range

$144,000-$192,000 USD
