Reiner Pope (MatX): Designing AI Chips From First Principles for LLMs

Summary

Reiner Pope is the co-founder and CEO of MatX, the startup building chips designed from first principles for LLMs. Before MatX, Reiner was on the Google Brain team training LLMs, and his co-founder Mike Gunter was on the TPU team. They left Google one week before ChatGPT was released.

A counterintuitive throughput insight from the conversation:

“Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.”
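
A back-of-the-envelope sketch of that argument (ours, not from the episode; every number below is hypothetical): Little's law says the in-flight batch size equals arrival rate times latency, and KV-cache occupancy in HBM scales with batch size times context length, so for a fixed HBM budget, lower latency shrinks the batch and frees room for longer contexts.

# A minimal Python sketch, assuming Little's law (batch = rate * latency)
# and KV-cache memory proportional to batch_size * context_length.
# hbm_budget_gib, kv_gib_per_token, and arrival_rate are all hypothetical.

hbm_budget_gib = 64.0           # HBM reserved for KV cache (hypothetical)
kv_gib_per_token = 0.5 / 1024   # KV-cache cost per token per sequence (hypothetical)
arrival_rate = 100.0            # requests per second (hypothetical)

for latency_s in (2.0, 0.5):
    batch = arrival_rate * latency_s                           # Little's law
    max_context = hbm_budget_gib / (batch * kv_gib_per_token)  # fixed HBM budget
    print(f"latency {latency_s}s -> batch {batch:.0f}, max context ~{max_context:.0f} tokens")

# At 2.0 s latency the batch is 200 and ~655-token contexts fit; at 0.5 s
# the batch is 50 and ~2621-token contexts fit in the same HBM budget.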

We get into:

• The hybrid SRAM + HBM bet, and why pipeline parallelism finally works

• Overcoming the CUDA moat

• Why frontier labs are willing to bet on an AI ASIC startup

• Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not)

• Why 95% of model-side news is noise for chip design

• Why sparse MoE drives MatX to “the most interconnect of any announced product”

• How MatX uses AI for its own chip design

• The biggest challenges ahead

Chapters:

00:00 “We left Google one week before ChatGPT”

00:24 Intro: who is MatX

01:17 Origin story: leaving Google for LLM chips

02:21 GPT-3 and the “too expensive” problem

04:25 Why buy hardware that is not a GPU

05:52 Overcoming the CUDA moat

08:46 Early investors

09:35 The name MatX

09:59 The chip: matrix multiply + hybrid SRAM/HBM

12:11 Why pipeline parallelism finally works

14:22 Reading papers and Google going dark

15:20 Research agenda: attention and numerics

17:06 Five specs and meeting customers where they are

19:24 Why frontier labs are the natural first customer

20:32 Workloads: training, prefill, decode

22:18 Little’s law and the throughput case for low latency

24:29 Interconnect and MoE topology

26:35 Inside the team: 100 people, full stack

28:32 Agentic AI: 95% noise for hardware

30:35 KV cache sizing in an agentic world

32:11 How MatX uses AI for chip design (Verilog + BlueSpec)

34:23 Go-to-market: proving credibility under NDA

35:12 Porting effort for frontier labs

36:34 Biggest skepticism: manufacturing at gigawatt scale

37:32 Hiring plug


Hosts:

Austin Lyons @ Chipstrat: https://www.chipstrat.com

Vik Sekar @ Vik's Newsletter: https://www.viksnewsletter.com/
