Ampere Computing

AI Accelerator Software Principal Engineer - Runtime Library

Ampere Computing, $182K to $273K *
Enterprise Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • BS in Computer Science, Computer Engineering, Electrical Engineering, or Software Engineering with 8 years of experience; or MS with 6 years; or PhD with 3 years.
  • Experience in developing user-mode drivers and/or runtime libraries for GPUs or deep learning accelerators in Linux or RTOS.
  • Strong proficiency in C/C++ and systems-level programming.
  • Background in AI framework enablement, particularly with PyTorch, llama.cpp, and ONNX.
  • Experience with performance engineering, profiling, diagnostics, and execution pipeline optimization.

Responsibilities

  • Lead the design and evolution of an AI Runtime Library for Ampere accelerators.
  • Own end-to-end acceleration paths across the full software/hardware stack.
  • Drive hardware/software co-design and optimization for throughput, latency, and memory efficiency.
  • Contribute to AI co-processor software enablement by collaborating with hardware and systems teams.
  • Integrate runtime components into Ampere platform stacks for robust deployments and consistent performance.

Benefits

  • Premium medical, dental, and vision insurance along with income protection and a 401K.
  • Unlimited Flextime and over 10 paid holidays for a healthy work-life balance.
  • Access to healthy snacks, energizing espresso, and refreshing drinks throughout the day.
Full Job Description
Description

About the Role

As an AI Accelerator Software Principal Engineer - Runtime Library, you will lead the design, development, and optimization of AI runtime software that enables multiple state-of-the-art deep learning models to run efficiently on Ampere's deep learning accelerators. You will work at the intersection of systems software, performance engineering, and AI enablement, helping deliver high-throughput, low-latency inference and a strong foundation for future model and framework support.

What You'll Achieve:

  • Build and evolve an AI Runtime Library for Ampere accelerators that supports execution, scheduling, and lifecycle management of deep learning workloads across multiple model types and popular frameworks.
  • Own end-to-end acceleration paths, going deep into the full SW/HW stack, including:
    • Inference serving and integration layers
    • Compiler/runtime interfaces and graph/IR execution flows
    • Runtime library architecture (APIs, memory management, operators, execution engines)
    • Communication mechanisms and device/host orchestration
  • Drive HW/SW co-design and optimization to improve:
    • Throughput (tokens/requests per second)
    • Latency (kernel execution and scheduling efficiency)
    • Memory efficiency (buffering, paging, reuse, caching)
    • Overall compute utilization and scaling behavior
  • Contribute to AI co-processor/accelerator software enablement, partnering closely with hardware and systems teams to ensure runtime and kernel strategies match accelerator capabilities and constraints.
  • Collaborate cross-functionally to integrate runtime components into Ampere platform stacks, ensuring robust deployment on target environments and consistent performance in production-like workloads.


About You:

  • BS in Computer Science, Computer Engineering, Electrical Engineering, Software Engineering, or a related technical field and 8 years of related experience; or an MS degree and 6 years; or a PhD and 3 years.
  • Proven experience developing user-mode drivers and/or runtime libraries for GPUs or deep learning accelerators in Linux or RTOS environments.
  • Strong expertise in C/C++ and systems-level programming (memory, threading, synchronization, performance profiling).
  • Demonstrated background in AI framework enablement, with hands-on experience in one or more of:
    • PyTorch (operator/runtime integration, graph execution, correctness/performance work)
    • llama.cpp (inference/runtime execution patterns)
    • ONNX (graph handling, interoperability, execution engines)
  • Strong performance engineering skills, including profiling/diagnostics and optimization of execution pipelines, data movement, and compute kernels.
  • Ability to operate effectively in a collaborative environment, owning complex components while partnering with compiler, hardware, and platform teams.

What We'll Offer:

At Ampere we believe in taking care of our employees and providing a competitive total rewards package that includes base pay, cash long-term incentive, and comprehensive benefits. The full base pay range for this role is between $182,000 and $273,000, except in the San Francisco Bay Area where the range is between $195,000 and $292,000.

Our benefits include health, wellness, and financial programs that support employees through every stage of life.

Benefit highlights include:
  • Premium medical insurance, dental insurance, vision insurance, as well as income protection and a 401K retirement plan, so that you can feel secure in your health and financial future.
  • Unlimited Flextime and 10+ paid holidays so that you can embrace a healthy work-life balance.
  • A variety of healthy snacks, energizing espresso, and refreshing drinks to keep you fueled and focused throughout the day.

And there is much more than compensation and benefits. At Ampere, we foster an inclusive culture that empowers our employees to do more and grow more. We are excited to share more about our career opportunities with you through the interview process.

#LI-Hybrid #LI-DR

About Ampere Computing

Ampere Computing is a semiconductor company that designs and manufactures high-performance processors for cloud and edge computing. The company's processors are based on the Arm architecture and are optimized for power efficiency and performance. Ampere Computing was founded in 2017 by former Intel president Renee James and is headquartered in Santa Clara, California.
Size: 200 employees
Founded: 2017
