Full Job Description
We are looking for a Staff AI Inference & Acceleration Engineer to join the Platform Software team and own the on-board inference architecture for Figure's humanoid robots. You will be the technical authority on how AI workloads are mapped, optimized, and executed across the robot's compute hardware - driving down power consumption and cost while meeting the strict latency and reliability demands of a real-time autonomous system.
Responsibilities:
• Own the on-board inference architecture - mapping models to available accelerators (NPU, GPU, DSP, CPU) based on latency, power, and memory budgets.
• Partition inference workloads across heterogeneous compute resources, balancing real-time performance with power and thermal constraints.
• Define and maintain a system-level compute budget across all inference tasks running on the robot.
• Evaluate next-generation acceleration hardware and contribute to the definition of future compute platform requirements.
• Optimize inference toolchains end-to-end - from model export through runtime execution - for target hardware.
• Apply quantization (INT8, INT4, mixed-precision), pruning, operator fusion, and other compression techniques to reduce compute, memory, and power footprint.
• Profile inference pipelines to identify and eliminate bottlenecks in latency, memory bandwidth, and power consumption.
• Optimize kernel scheduling, memory layout, and data movement across the compute hierarchy.
• Partner closely with the AI/ML team to define model architecture constraints that are hardware-friendly from the outset.
• Work with the Platform Software team on runtime integration, scheduling, and power management.
• Engage with silicon vendors and research teams to track the accelerator landscape and influence hardware roadmaps.
Requirements:
• M.S. or Ph.D. in Computer Engineering, Electrical Engineering, Computer Science, or a related field - or equivalent industry experience.
• At least 8 years of industry experience in hardware acceleration, ML systems, or compute architecture.
• Deep understanding of AI/ML inference - model formats (ONNX, TFLite, etc.), inference runtimes, and deployment pipelines.
• Hands-on experience optimizing models for edge or embedded hardware using quantization, pruning, and operator-level tuning.
• Strong understanding of computer architecture - memory hierarchies, data movement, and heterogeneous compute.
• Experience profiling and benchmarking inference workloads across CPU, GPU, NPU, and DSP.
• Familiarity with low-level toolchains and compilation frameworks (e.g., TVM, MLIR, TensorRT, Torch, SNPE/QNN, JAX, CUDA, ROCm).
• Solid software engineering skills in C++ and Python.
• Strong cross-functional communication skills - able to work effectively across hardware, software, and AI/ML teams.
Bonus Qualifications:
• Knowledge of real-time operating constraints and their impact on inference scheduling.
• Track record of co-designing model architectures with ML teams to meet hardware constraints.
The US base salary range for this full-time position is $180,000 to $275,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.