About the role:
We are looking for an experienced and highly motivated Lead Vector Compute Architect to lead the architecture definition and technical direction for Bolt's next-generation GPUs. The ideal candidate will have strong expertise in data parallel compute unit architecture development, performance modeling, data path integration, and cross-functional collaboration across hardware, software, and systems teams.
This role involves defining scalable, high-performance architectures for advanced compute workloads including graphics, HPC, and system management. The position is on-site, and candidates must be local to the Bay Area.
What you'll do:
- Define data parallel microarchitecture satisfying ISA constraints.
- Drive architecture tradeoff analysis for performance, power, area, bandwidth, latency, and scalability.
- Develop and review system architecture specifications, interface definitions, and microarchitecture requirements.
- Collaborate with RTL, verification, physical design, firmware, software, and system teams throughout the development cycle.
- Lead performance modeling, workload analysis, and bottleneck identification using C/C++/SystemC or similar modeling environments.
- Define memory hierarchy, coherency architecture, and cache structures.
- Work closely with verification teams to define architectural test plans and validation strategies.
- Support silicon bring-up, debug, performance tuning, and post-silicon optimization.
- Contribute to long-term technology and product roadmap planning.
Qualifications:
- Strong understanding of modern data parallel microarchitectures and subsystem integration.
- Bachelor's or Master's degree in Electrical Engineering, Computer Engineering, Computer Science, or related field.
- 6+ years of experience in modern data parallel microarchitecture including:
  - Workload characterization and profiling
  - Out-of-order data dependency and control
  - Utilization / occupancy optimization
  - High-performance architecture design techniques
- Experience with one or more of the following:
  - CPU/GPU/NPU architectures
  - NoC/interconnect architectures
  - Cache coherency protocols (CHI/ACE/CXL)
  - High-speed interfaces (PCIe, UCIe, Ethernet)
  - Memory systems (DDR, LPDDR, HBM, GDDR)
  - Power, performance, and area optimization
- Strong knowledge of RTL development and verification methodologies.
- Experience with architecture modeling and performance analysis tools.
- Familiarity with firmware/software interaction in complex SoC systems.
- Excellent problem-solving, communication, and leadership skills.
Compensation Range: $180,000-$220,000 per year (California). This range represents the anticipated base pay for this role; the final offer may vary based on qualifications, experience, and location.
Benefits:
- Medical, Dental, & Vision - 100% covered premiums
- Equity - Stock Options
- 401(k) match
- WFH Hardware