Decoding the Transformer: From Attention to Inference

Overview

In this episode, Herman and Corn break down the "black box" of the transformer architecture, moving beyond the 2017 "Attention Is All You Need" paper to explore how modern LLMs actually process data during inference. They discuss the critical shift from encoder-decoder models to decoder-only giants, the memory-saving brilliance of KV caching, and the hardware-aware speed of FlashAttention-3. From speculative decoding to Rotary Positional Embeddings, learn how these technical plumbing upgrades have transformed simple translation tools into sophisticated world models capable of reasoning. This deep dive covers the journey of a token from a numerical vector to a human-readable response, revealing the complex engineering that powers today's most advanced AI systems.
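
To make the KV-caching idea mentioned above concrete, here is a minimal sketch (not from the episode) of single-head autoregressive attention with a key/value cache: the keys and values of past tokens are stored and reused at each decoding step instead of being recomputed from scratch. All names and shapes (`decode_step`, `kv_cache`, `d_model`, and so on) are illustrative assumptions; real LLMs use multi-head attention, batching, and fused GPU kernels such as FlashAttention.

```python
import numpy as np

# Illustrative single-head attention with a KV cache.
# Hypothetical shapes/names for clarity, not a production implementation.

d_model = 64
rng = np.random.default_rng(0)

# Fixed projection matrices for queries, keys, and values.
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

def decode_step(x_t, kv_cache):
    """Attend from the newest token to all cached positions.

    x_t:      (d_model,) embedding of the current token
    kv_cache: dict with growing "k" and "v" arrays of shape (t, d_model)
    """
    q = x_t @ W_q
    k = x_t @ W_k
    v = x_t @ W_v
    # Append this step's key/value; earlier entries are never recomputed,
    # which is where the memory-for-compute trade of KV caching comes from.
    kv_cache["k"] = np.vstack([kv_cache["k"], k])
    kv_cache["v"] = np.vstack([kv_cache["v"], v])
    scores = kv_cache["k"] @ q / np.sqrt(d_model)   # (t,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over past tokens
    return weights @ kv_cache["v"]                  # (d_model,) context vector

cache = {"k": np.empty((0, d_model)), "v": np.empty((0, d_model))}
for step in range(5):                               # pretend 5 decoding steps
    token_embedding = rng.standard_normal(d_model)  # stand-in for a real token
    out = decode_step(token_embedding, cache)
print(cache["k"].shape)  # (5, 64): one cached key per generated token
```

Without the cache, step *t* would re-project and re-attend over all *t* previous tokens, making generation quadratic in sequence length; with it, each step only computes one new key/value pair and a dot product against the stored ones.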