Reiner Pope is the co-founder and CEO of MatX, the startup building chips designed from first principles for LLMs. Before MatX, Reiner was on the Google Brain team training LLMs, and his co-founder Mike Gunter was on the TPU team. They left Google one week before ChatGPT was released.
A counterintuitive throughput insight from the conversation:
“Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.”
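A rough way to see the arithmetic behind that quote, as a minimal sketch and not MatX's own model: Little's law says the average number of requests in flight equals throughput times latency. If HBM occupancy scales with that in-flight count (the batch), then at a fixed throughput a lower latency leaves more HBM per request, which allows longer contexts. The HBM budget, the per-token KV cache size, and the request rate below are made-up illustrative numbers, and the sketch ignores the HBM taken up by model weights.

```python
# Illustrative sketch only: every constant below is an assumption, not a MatX spec.

def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's law: average requests in flight = arrival rate * time in system."""
    return throughput_rps * latency_s


def max_context_tokens(hbm_bytes: float, batch: float, kv_bytes_per_token: float) -> float:
    """Longest context per request if the KV cache for the whole batch must fit in HBM.

    Assumes HBM occupancy is proportional to batch size and ignores model weights.
    """
    return hbm_bytes / (batch * kv_bytes_per_token)


HBM_BYTES = 64e9            # hypothetical HBM budget available for KV cache
KV_BYTES_PER_TOKEN = 100e3  # hypothetical KV cache footprint per token
THROUGHPUT_RPS = 10.0       # hypothetical request rate, held fixed in both cases

# Same throughput, two different end-to-end latencies.
batch_slow = in_flight_requests(THROUGHPUT_RPS, latency_s=8.0)
batch_fast = in_flight_requests(THROUGHPUT_RPS, latency_s=2.0)

ctx_slow = max_context_tokens(HBM_BYTES, batch_slow, KV_BYTES_PER_TOKEN)
ctx_fast = max_context_tokens(HBM_BYTES, batch_fast, KV_BYTES_PER_TOKEN)

print(f"slow: batch={batch_slow:.0f}, max context={ctx_slow:.0f} tokens")
print(f"fast: batch={batch_fast:.0f}, max context={ctx_fast:.0f} tokens")
```

With these assumed numbers, cutting latency by 4x cuts the in-flight batch by 4x and raises the context length that fits by 4x, at the same request rate.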
We get into:
• The hybrid SRAM + HBM bet, and why pipeline parallelism finally works
• Overcoming the CUDA moat
• Why frontier labs are willing to bet on an AI ASIC startup
• Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not)
• Why 95% of model-side news is noise for chip design
• Why sparse MoE drives MatX to “the most interconnect of any announced product”
• How MatX uses AI for its own chip design
• The biggest challenges ahead
Chapters:
00:00 “We left Google one week before ChatGPT”
00:24 Intro: who is MatX
01:17 Origin story: leaving Google for LLM chips
02:21 GPT-3 and the “too expensive” problem
04:25 Why buy hardware that is not a GPU
05:52 Overcoming the CUDA moat
08:46 Early investors
09:35 The name MatX
09:59 The chip: matrix multiply + hybrid SRAM/HBM
12:11 Why pipeline parallelism finally works
14:22 Reading papers and Google going dark
15:20 Research agenda: attention and numerics
17:06 Five specs and meeting customers where they are
19:24 Why frontier labs are the natural first customer
20:32 Workloads: training, prefill, decode
22:18 Little’s law and the throughput case for low latency
24:29 Interconnect and MoE topology
26:35 Inside the team: 100 people, full stack
28:32 Agentic AI: 95% noise for hardware
30:35 KV cache sizing in an agentic world
32:11 How MatX uses AI for chip design (Verilog + Bluespec)
34:23 Go to market: proving credibility under NDA
35:12 Porting effort for frontier labs
36:34 Biggest skepticism: manufacturing at gigawatt scale
37:32 Hiring plug
Austin Lyons @ Chipstrat: https://www.chipstrat.com
Vik Sekar @ Vik's Newsletter: https://www.viksnewsletter.com/