Ep. 7 - The Execution Tax of Karpathy’s $20 GPT-2
Summary
Andrej Karpathy just showed that GPT-2 can now be trained in under three hours for roughly $20, reframing a once-“dangerous” model as the new MNIST. On paper, fp8 promises 2× the FLOPS of bf16. In practice, it delivers something far messier: overhead, precision tradeoffs, and marginal gains that only appear after careful tuning.
In this episode of Execution Over Everything, we pressure-test what Karpathy is actually working on beneath the headline. We unpack why theoretical speedups don’t translate cleanly to wall-clock wins, how fp8 shifts cost and failure modes rather than eliminating them, and what breaks once you embed these techniques into real, repeated training workflows.
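To make the FLOPS-versus-wall-clock point concrete, here is a minimal sketch (not Karpathy's code) that times a plain bf16 matmul in PyTorch and reports achieved TFLOP/s. The helper name and the peak figures in the comments are assumptions for illustration; a real fp8 path needs scaled-matmul kernels (for example via NVIDIA's Transformer Engine) rather than a bare matmul.

# A minimal sketch, not Karpathy's code: time a plain bf16 matmul and report
# achieved TFLOP/s. Comparing against the advertised peak shows how much of
# the "paper FLOPS" a real kernel actually sustains.
import time
import torch

def bench_matmul(n=8192, iters=50, dtype=torch.bfloat16):
    """Time an n x n matmul on the GPU and return achieved TFLOP/s."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):                      # warm-up so cuBLAS picks its kernel
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    per_matmul = (time.perf_counter() - t0) / iters
    return 2 * n**3 / per_matmul / 1e12     # ~2*n^3 FLOPs per n x n matmul

if __name__ == "__main__":
    achieved = bench_matmul()
    # ~989 dense bf16 TFLOP/s and ~1979 dense fp8 TFLOP/s are the approximate
    # H100 SXM datasheet peaks (assumption; check your card's spec sheet).
    print(f"bf16 achieved: {achieved:.0f} TFLOP/s (datasheet peak ~989)")
    # fp8 is not a drop-in `a @ b`: it needs scaled-matmul kernels (e.g. via
    # NVIDIA Transformer Engine), and the 2x only shows up on large,
    # well-shaped GEMMs after tuning.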
This isn’t about celebrating faster demos. It’s about understanding the execution tax — the hidden costs in retries, numerics, and operational complexity that show up only when systems run continuously in the real world.
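And to make the execution tax itself concrete, here is a hypothetical training loop where transient faults, retries, checkpoint I/O, and a numerics guard all charge wall-clock time that no per-kernel speedup recovers. Everything in it (the train_step stub, the 2% fault rate, the sleep-based costs) is a made-up stand-in, not real training code.

# A hypothetical sketch of the execution tax: retries, checkpoint I/O, and a
# numerics guard all spend wall-clock time outside the hot loop. train_step,
# the fault rate, and the sleep() costs are illustrative stand-ins.
import math
import random
import time

def train_step(step):
    """Stand-in step: returns a fake decaying loss, occasionally faults."""
    if random.random() < 0.02:
        raise RuntimeError("transient fault (think NCCL timeout or a loss spike)")
    return 4.0 * math.exp(-step / 500)

def run(total_steps=200, checkpoint_every=50, max_retries=3):
    tax = 0.0                               # seconds spent not training
    for step in range(total_steps):
        for _ in range(max_retries):
            try:
                loss = train_step(step)
                break
            except RuntimeError:
                t0 = time.perf_counter()
                time.sleep(0.05)            # stand-in for rollback/reload cost
                tax += time.perf_counter() - t0
        else:
            raise SystemExit(f"step {step} failed {max_retries} times")
        if not math.isfinite(loss):         # guard against fp8/bf16 blow-ups
            raise SystemExit(f"non-finite loss at step {step}")
        if step % checkpoint_every == 0:
            t0 = time.perf_counter()
            time.sleep(0.02)                # stand-in for checkpoint I/O
            tax += time.perf_counter() - t0
    print(f"wall-clock spent on overhead, not math: {tax:.2f}s")

if __name__ == "__main__":
    run()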
Andrej Karpathy
fp8 training
GPT-2 training
mixed precision training
H100 GPUs
GPU optimization
AI training cost
model training speed
FLOPS vs throughput
wall-clock performance
training overhead
loss curves
bf16 vs fp8
scaling laws
AI infrastructure
execution bottlenecks
retries and failure modes
production ML systems
execution tax
Execution Over Everything