Kernel Engineer (Compute / Accelerator)

DensityAI

• $260K — $320K *

Mountain View, CA 94040, In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in high-performance C/C++ programming
Strong understanding of GPU programming using CUDA or equivalent
Knowledge of computer architecture, including memory hierarchies and data movement
Experience with performance profiling and optimization techniques
Practical knowledge of tensor operations like GEMM and convolution
Familiarity with Python for scripting and automation
(Optional) Experience with RISC-V, x86, or ARM64 ISAs, or an HPC background

Responsibilities

Write and optimize compute kernels for a custom AI accelerator
Develop and maintain performance profiling infrastructure
Define shuffle patterns for ML kernel primitives across different computational architectures
Drive design decisions for kernel DSL, including thread management and memory strategies
Enable end-to-end kernel execution on the architectural simulator
Collaborate with compiler teams for MLIR dialect validation
Create onboarding documentation and kernel writing guides for the broader team

Benefits

Equity grant per company guidelines
Medical, dental, and vision insurance
401(k) plan
Standard paid time off (PTO)
Immigration support for work authorization and visa sponsorship

Full Job Description

ITAR Notice: This role involves access to ITAR-controlled information. Applicants must be U.S. persons (U.S. citizens, U.S. permanent residents, asylees, or refugees) per 22 CFR 120.62.

About the role

You will write, evaluate, and profile specialized compute kernels that run on a custom AI accelerator. This is the critical interface between high-level ML workloads and silicon - your code directly determines how effectively the hardware performs. You'll work closely with the architecture and compiler teams to define the kernel programming model, implement core tensor operations, and drive the performance profiling workflow that validates silicon design decisions.

What you'll do

Write and optimize compute kernels for a custom AI accelerator - tensor operations, data movement patterns, memory hierarchy exploitation
Develop and maintain profiling infrastructure to measure kernel performance against architectural targets
Define and document shuffle patterns for ML kernel primitives across CPU-like control, tensor cores, and CUTLASS-style operations
Drive kernel DSL design decisions - thread spawn mechanisms, register passing conventions, and memory management strategies
Enable end-to-end kernel execution on the architectural simulator
Collaborate with the compiler team on the MLIR dialect - your kernels are the primary validation target
Create onboarding documentation and kernel writing guides for the broader team

What we're looking for

C/C++ - production-grade systems code, not scripted glue. You'll write performance-critical kernels
CUDA or equivalent accelerator programming - deep experience writing GPU kernels, understanding warp/wavefront execution, memory coalescing, shared memory optimization. The mental model transfers directly
Computer architecture - you need to reason about pipelines, memory hierarchies, data movement costs, and how software maps to hardware
Performance profiling and optimization - you live in profilers. Identifying bottlenecks, measuring throughput, and iterating until kernels meet targets is the core loop
Tensor operations - practical understanding of GEMM, convolution, attention, reduction, and scatter/gather as they map to hardware
Python - for scripting, DSL integration, and profiling automation
(Optional) RISC-V, x86, or ARM64 ISA experience
(Optional) MLIR or LLVM compiler infrastructure
(Optional) HPC or scientific computing background (large-scale parallel compute intuition)
(Optional) FPGA or Verilog/SystemVerilog (ability to read RTL and reason about the hardware you're targeting)
(Optional) Familiarity with CUTLASS, Triton, or similar kernel libraries

Compensation

Final offers depend on level, location, and skills relevant to the role. Additional compensation: equity grant per company guidelines; medical / dental / vision; 401(k); standard PTO.

Visa Sponsorship

DensityAI sponsors qualified candidates for H-1B, O-1, TN, E-3, and other employment-based visas, and we welcome applicants on F-1 OPT and STEM-OPT. Work authorization is required at start; we provide immigration support to secure or transfer status.

Full compensation packages are based on candidate experience and relevant certifications.

California pay range

$260,000-$320,000 USD

* Ladders Estimates
