An Interview with Microsoft's Saurabh Dighe About Maia 200

Summary

Maia 100 was a pre-GPT accelerator.
Maia 200 is explicitly post-GPT for large multimodal inference.

Saurabh Dighe says that if Microsoft were chasing peak performance or trying to span both training and inference, Maia would look very different, with higher TDPs and different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of roughly 30% better performance per dollar versus the latest hardware in Microsoft's fleet.

Interesting topics include:
• What “30% better price-performance” actually means
• Who Maia 200 is built for
• Why Microsoft bet on inference when designing Maia back in 2022/2023
• Large SRAM + high-capacity HBM
• Massive scale-up, no scale-out
• On-die NIC integration

Maia is a portfolio platform: many internal customers, varied inference profiles, one goal. Lower inference cost at planetary scale.

Chapters:
(00:00) Introduction
(01:00) What Maia 200 is and who it’s for
(02:45) Why custom silicon isn’t just a margin play
(04:45) Inference as an efficient frontier
(06:15) Portfolio thinking and heterogeneous infrastructure
(09:00) Designing for LLMs and reasoning models
(10:45) Why Maia avoids training workloads
(12:00) Betting on inference in 2022–2023, before reasoning models
(14:40) Hyperscaler advantage in custom silicon
(16:00) Capacity allocation and internal customers
(17:45) How third-party customers access Maia
(18:30) Software, compilers, and time-to-value
(22:30) Measuring success and the Maia 300 roadmap
(28:30) What “30% better price-performance” actually means
(32:00) Scale-up vs scale-out architecture
(35:00) Ethernet and custom transport choices
(37:30) On-die NIC integration
(40:30) Memory hierarchy: SRAM, HBM, and locality
(49:00) Long context and KV cache strategy
(51:30) Wrap-up
