Sr. Principal Software Engineer

Cerence Inc. • $141K — $226K *

US-Anywhere • Remote in United States

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in ML inference performance optimization
In-depth knowledge of GPU architecture and memory hierarchies
Proficiency in CUDA and low-level performance tuning
Experience deploying models in production environments
Familiarity with inference engines like vLLM, TensorRT-LLM, llama.cpp, QAIRT
Expertise in quantization techniques including INT8, INT4, FP4, FP8, AWQ, GPTQ
A solid understanding of latency optimization strategies

Responsibilities

Optimize and deploy LLM inference pipelines
Manage inference runtimes across various platforms
Enhance model performance using techniques like quantization and kernel fusion
Drive improvements in latency and throughput for production products
Enable efficient deployment independent of external vendors
Build expertise in key inference engines
Adapt runtimes for constrained environments

Benefits

Annual bonus opportunity
Comprehensive insurance coverage including medical, dental, vision, life, and disability
Paid time off and holidays
Company contributions to RRSP
Equity awards available for select roles
Remote or hybrid work options based on position

Full Job Description

What You Will Work On

Optimize and deploy high-performance LLM inference pipelines

Own inference runtimes across data center, edge, and embedded platforms

Push model performance through quantization, kernel fusion, and cache optimization

Drive latency and throughput improvements that directly impact production products

Enable efficient, reliable deployment without external vendor dependency

Core Responsibilities

Inference Engines & Runtime

Build deep expertise and ownership of:

vLLM

TensorRT-LLM

llama.cpp

QAIRT

Extend and tune inference engines using custom CUDA kernels

Adapt runtimes for constrained and embedded deployment environments

Quantization & Numerical Optimization

Implement and evaluate quantization strategies:

INT8, INT4, FP4, FP8, mixed precision

GPTQ

Balance accuracy, latency, memory footprint, and throughput

KV Cache Optimization

Optimize key-value cache performance through:

Paging

Prefix caching

Cache-aware memory layout design

Reduce memory pressure while sustaining high throughput

Latency & Throughput Optimization

Design and tune:

Batching strategies

Continuous batching

Speculative decoding

Optimize tail latency and tokens/sec under real production traffic patterns

What Success Looks Like

Models deploy efficiently on edge and embedded devices, not just servers

Tokens/sec significantly outperform baseline implementations

End-to-end latency is minimized and predictable

Inference cost per request is materially reduced

The company is no longer dependent on partners for inference optimization

Required Experience & Skills

Strongly Required

Proven experience optimizing ML inference performance in production

Deep understanding of GPU architecture and memory hierarchies

Hands-on experience with CUDA and low-level performance tuning

Experience deploying models beyond research environments

Critical Technical Skills

Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT

CUDA kernel development and profiling

Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ

KV cache optimization and memory layout design

Latency optimization: batching, speculative decoding, continuous batching

Common Problems You'll Be Solving

Deploy efficiently on edge or embedded targets

Achieve competitive tokens/sec

Reduce and stabilize inference latency

You will be responsible for closing these gaps, creating a major competitive advantage.

What we offer

We offer a generous compensation and benefits package (in addition to the base salary), including:

Salary range: $141,400 USD - $226,300 USD. It is not typical for offers to be made at or near the top of the range. The actual salary will be determined based on experience and other job-related factors.

Annual bonus opportunity

Insurance coverage (medical, dental, vision, life, and disability)

Paid time off

Paid holidays

Company contribution to the RRSP (Registered Retirement Savings Plan)

Equity awards for certain positions and levels

Remote and/or hybrid work available depending on the position

All compensation and benefits are subject to the terms and conditions of the underlying plans or programs, as applicable, and may be amended, terminated, or replaced from time to time.

About Cerence Inc.

Cerence Inc. is a software company that specializes in voice recognition and natural language understanding technology. The company was spun off from Nuance Communications in 2019 and is headquartered in Newton, Massachusetts. Cerence's software is used in a variety of applications, including automotive infotainment systems, smart speakers, and virtual assistants. The company's clients include many of the world's leading automakers, as well as companies in the consumer electronics and mobile device industries. Cerence has received several awards for its technology, including the 2020 CES Innovation Award for its Cerence Drive platform.

Size

1,200 employees

Market Cap

$726.1 million

Industry

Enterprise Technology

Net Income

$12.7 million

5 Year Trend

+6%

Revenue

$347.1 million

* Ladders Estimates
