Ep. 7 - The Execution Tax of Karpathy’s $20 GPT-2
Summary
Andrej Karpathy just showed that GPT-2 can now be trained in under three hours for roughly $20, reframing a once-“dangerous” model as the new MNIST. On paper, fp8 promises 2× the FLOPS of bf16. In practice, it delivers something far messier: overhead, precision tradeoffs, and marginal gains that only appear after careful tuning.
In this episode of Execution Over Everything, we pressure-test what Karpathy is actually working on beneath the headline. We unpack why theoretical speedups don’t translate cleanly to wall-clock wins, how fp8 shifts cost and failure modes rather than eliminating them, and what breaks once you embed these techniques into real, repeated training workflows.
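To make the FLOPS-versus-wall-clock point concrete, here is a minimal sketch (not Karpathy's code) that times a plain bf16 matmul in PyTorch and reports achieved TFLOP/s. The helper name and the peak figures in the comments are assumptions for illustration; a real fp8 path needs scaled-matmul kernels (for example via NVIDIA's Transformer Engine) rather than a bare matmul.

# A minimal sketch, not Karpathy's code: time a plain bf16 matmul and report
# achieved TFLOP/s. Comparing against the advertised peak shows how much of
# the "paper FLOPS" a real kernel actually sustains.
import time
import torch

def bench_matmul(n=8192, iters=50, dtype=torch.bfloat16):
    """Time an n x n matmul on the GPU and return achieved TFLOP/s."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):                      # warm-up so cuBLAS picks its kernel
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    per_matmul = (time.perf_counter() - t0) / iters
    return 2 * n**3 / per_matmul / 1e12     # ~2*n^3 FLOPs per n x n matmul

if __name__ == "__main__":
    achieved = bench_matmul()
    # ~989 dense bf16 TFLOP/s and ~1979 dense fp8 TFLOP/s are the approximate
    # H100 SXM datasheet peaks (assumption; check your card's spec sheet).
    print(f"bf16 achieved: {achieved:.0f} TFLOP/s (datasheet peak ~989)")
    # fp8 is not a drop-in `a @ b`: it needs scaled-matmul kernels (e.g. via
    # NVIDIA Transformer Engine), and the 2x only shows up on large,
    # well-shaped GEMMs after tuning.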
This isn’t about celebrating faster demos. It’s about understanding the execution tax — the hidden costs in retries, numerics, and operational complexity that show up only when systems run continuously in the real world.
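And to make the execution tax itself concrete, here is a hypothetical training loop where transient faults, retries, checkpoint I/O, and a numerics guard all charge wall-clock time that no per-kernel speedup recovers. Everything in it (the train_step stub, the 2% fault rate, the sleep-based costs) is a made-up stand-in, not real training code.

# A hypothetical sketch of the execution tax: retries, checkpoint I/O, and a
# numerics guard all spend wall-clock time outside the hot loop. train_step,
# the fault rate, and the sleep() costs are illustrative stand-ins.
import math
import random
import time

def train_step(step):
    """Stand-in step: returns a fake decaying loss, occasionally faults."""
    if random.random() < 0.02:
        raise RuntimeError("transient fault (think NCCL timeout or a loss spike)")
    return 4.0 * math.exp(-step / 500)

def run(total_steps=200, checkpoint_every=50, max_retries=3):
    tax = 0.0                               # seconds spent not training
    for step in range(total_steps):
        for _ in range(max_retries):
            try:
                loss = train_step(step)
                break
            except RuntimeError:
                t0 = time.perf_counter()
                time.sleep(0.05)            # stand-in for rollback/reload cost
                tax += time.perf_counter() - t0
        else:
            raise SystemExit(f"step {step} failed {max_retries} times")
        if not math.isfinite(loss):         # guard against fp8/bf16 blow-ups
            raise SystemExit(f"non-finite loss at step {step}")
        if step % checkpoint_every == 0:
            t0 = time.perf_counter()
            time.sleep(0.02)                # stand-in for checkpoint I/O
            tax += time.perf_counter() - t0
    print(f"wall-clock spent on overhead, not math: {tax:.2f}s")

if __name__ == "__main__":
    run()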
Andrej Karpathy
fp8 training
GPT-2 training
mixed precision training
H100 GPUs
GPU optimization
AI training cost
model training speed
FLOPS vs throughput
wall-clock performance
training overhead
loss curves
bf16 vs fp8
scaling laws
AI infrastructure
execution bottlenecks
retries and failure modes
production ML systems
execution tax
Execution Over Everything