Ep. 8 - Building a C Compiler at Anthropic: A Stress Test for AI Reliability
Summary
This is Execution Over Everything. We take AI papers, blog posts, and big ideas that sound incredible on X… and we run them headfirst into reality. Not demos. Not vibes. Not one-shot prompts. We’re asking one question: what happens when this thing runs over and over again, under pressure, in the real world?
In this technical audit, we deconstruct Nicholas Carlini’s experiment where 16 parallel Claudes built a 100,000-line C compiler. We ignore the hype and look at the logs: the $20,000 API bill, the 'suicide' command that killed the harness, and why 16 agents turned into a 'Thunderherd' that clobbered its own code.
If you’re building AI infrastructure today, this is your sanity check on the reality of autonomous agents.
- 00:00 — Alt Show Intro
- 00:35 — Cold Open: The $20,000 Suicide
  * Starts mid-thought with the "GPU bonfire" debate and the incident where an agent ran pkill -9 bash on its own harness.
- 02:20 — The Claim: 16 Agents vs. A C Compiler
  * Deconstructing Nicholas Carlini’s goal: building a 100,000-line Rust-based C compiler capable of building the Linux kernel.
- 06:15 — Hidden Assumptions: Context Pollution & Time Blindness
  * Discussing why the harness had to "pre-chew" logs to prevent context window pollution and the agents' lack of wall-clock awareness.
- 09:40 — Execution Reality Check: The Thunderherd Problem
  * A deep dive into why 16 parallel agents deadlocked and clobbered each other's code when tasked with the monolithic Linux kernel.
- 14:15 — The Verification Boundary: The Oracle Dependency
  * Analyzing the "cheat code": using GCC as a known-good oracle to grade the AI’s work during the debugging loop.
- 18:25 — The 16-Bit Wall: Where Intelligence Fails
  * The audit of the 16-bit real mode failure, where the AI hit a hard optimization wall it could not reason its way out of.
- 21:10 — Design Review: Burn Rate & Efficiency
  * Evaluating the $20,000 API bill for code that remained less efficient than human-written software from 30 years ago.
- 22:50 — What Builders Should Actually Do
  * Practical guidance: Focus on building the "jail" (the harness and task verifier) over the agent.
- 24:10 — Closing Thought: Repetition is the Bottleneck
  * Sticking the landing on the ironic truth: The intelligence isn’t the bottleneck; persistence is.
Anthropic engineering
building a C compiler
AI and compilers
determinism in software
AI reliability limits
correctness vs productivity
systems programming AI
execution constraints
retries and failure modes