What You'll Do Here
- Build the host-side interface library - device memory management, DMA, streams and events, sync primitives - that every compiler-emitted program runs on top of
- Own and extend the executable format: the compiler-runtime contract, its versioning, and the weight and quantization layouts that let the compiler and runtime evolve independently
- Design the custom-kernel ABI - calling convention, sync semantics, lifecycle - and the host-side marshaling layer (DLPack, the buffer protocol, numpy) that gets Python tensors to the device
- Build Python bindings via PyO3, with a C-ABI shim as the alternative integration path for downstream consumers
- Build the LLM inference serving stack - paged KV cache, continuous batching, request scheduling, token streaming - and the cluster orchestration primitives underneath it
- Bring up interconnect topology from the host and own the failure-detection and clean-teardown path for stop-restructure-resume recovery across racks
- Design what the chip exposes to host-side profilers and debuggers - perf counters, traces, and the Python surfaces ML engineers actually use - and hit measurable performance targets on runtime overhead and serving throughput
Who You Are
- Strong experience in a systems programming language - Rust, C, C++, or Go - including memory management, allocator design, and FFI/ABI work
- Have built Python interop layers in production (PyO3, ctypes, pybind11, or equivalent C-ABI bridging)
- Have designed and maintained API or ABI contracts between teams - versioning, evolution, breaking-change discipline - not just consumed someone else's
- Hands-on with at least one accelerator programming model (CUDA, ROCm, oneAPI Level Zero, TPU, or comparable) - enough to reason about device memory, async execution, and kernel launch
- ML-systems literate - comfortable with the training and inference loop, what collectives do, what a tensor layout is. Research depth not required.
Bonus Points If You Have
- LLM inference internals - vLLM, TensorRT-LLM, or SGLang (paged attention, scheduler design)
- Rust at depth, including proc macros, unsafe with soundness reasoning, and complex lifetime/trait work
- Custom allocator design (slab, paged, arena) or other low-level memory work
- ML framework integration experience (PyTorch custom backends, JAX/XLA, ONNX Runtime)
- Profiler or tracing infrastructure work (Perfetto, Nsight, or a custom stack)
- Driver-adjacent or kernel-bypass work, or prior new-silicon bring-up
Compensation
The US base salary for this full-time position is determined by a variety of factors, including role, experience, location, job-related skills, and relevant education and training. Career length is only a guideline for compensation.
- Early Career - $120,000 - $250,000 + equity
- Mid Career - $175,000 - $362,500 + equity
- Senior Career - $250,000 - $475,000 + equity
What We Offer
- A Stake in Our Success - Generous equity, with an optional cash/equity swap at offer and the option to early exercise
- Health & Wellness - Company-subsidized health, dental, vision, and life insurance; pre-tax Health Savings Accounts with a generous company contribution (even if you don't contribute)
- Time to Recharge - 4 weeks of paid time off (accrued), 12 company holidays, and 3 weeks of remote/flexible work per year
- Support for Parents - Up to 12 weeks of paid parental leave, regardless of your path to parenthood
- Learning & Development - $1,500 per year toward your professional development, e.g., conferences, courses, and other learning opportunities
- Team Connection - Team lunches, quarterly off-sites, and regular town halls
- Financial Wellbeing - 401(k) and/or Roth IRA with a 5% company contribution, even if you don't contribute!
- Flexible Spending Accounts - Pre-tax spending accounts for medical, dental/vision, dependent care, parking, and transit expenses
- Commute On Us - If your commute is up to 1 hour, put your rideshare cost on our company card and reclaim the drive time to get work done!
- MatX E[x]tras - $50 per month to spend on the perks you care about most
- Remote Perks - We work remotely on Mondays and Fridays, with support for your home-tech setup and reimbursement for remote Wi-Fi expenses