Decomposing Superposition: Sparse Autoencoders for Neural Network Interpretability

About this content

This episode explores how sparse autoencoders can untangle superposition in neural networks, demonstrating that the seemingly impenetrable compression of many features into few neurons can be partially reversed to extract interpretable, causal features. The discussion centers on an Anthropic research paper that maps specific behaviors to discrete locations within a 512-neuron model, showing that interpretability is achievable, though computationally expensive, with important implications for AI safety and control mechanisms.
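For listeners who want a concrete picture of the technique, below is a minimal sketch of a sparse autoencoder in PyTorch. The 512-dimensional input and 256x expansion factor follow the numbers mentioned in the episode (512 x 256 = 131,072 dictionary features); the layer names, hyperparameters, and loss weighting are illustrative assumptions, not Anthropic's actual implementation.

# Illustrative sketch of a sparse autoencoder (SAE): an autoencoder trained on
# a model's activations with an L1 penalty that pushes most hidden "features"
# to zero, so each one tends to represent a single interpretable concept.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, expansion: int = 256):
        super().__init__()
        d_hidden = d_model * expansion  # 512 * 256 = 131,072 candidate features
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; the L1 penalty below
        # drives most of them to exactly zero on any given input.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the features faithful to the original
    # activations; the sparsity term is what makes individual features
    # interpretable.
    mse = torch.mean((x - reconstruction) ** 2)
    sparsity = l1_coeff * features.abs().sum(dim=-1).mean()
    return mse + sparsity

# Example: decompose a batch of 512-dimensional activations.
sae = SparseAutoencoder()
activations = torch.randn(32, 512)  # stand-in for real activations from the model
recon, feats = sae(activations)
loss = sae_loss(activations, recon, feats)
loss.backward()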

Credits

Cover Art by Brianna Williams

TMOM Intro Music by Danny Meza

A special thank you to these talented artists for their contributions to the show.

Links and References

Academic Papers

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning - https://transformer-circuits.pub/2023/monosemantic-features - Anthropic (October 2023)

Toy Models of Superposition - https://transformer-circuits.pub/2022/toy_model/index.html - Anthropic (December 2022)

Alignment Faking in Large Language Models - https://www.anthropic.com/research/alignment-faking - Anthropic (December 2024)

Agentic Misalignment: How LLMs Could Be Insider Threats - https://www.anthropic.com/research/agentic-misalignment - Anthropic (June 2025)

News

DeepSeek-OCR Model Release - https://deepseek.ai/blog/deepseek-ocr-context-compression

Meta AI Division Layoffs - https://www.nytimes.com/2025/10/22/technology/meta-plans-to-cut-600-jobs-at-ai-superintelligence-labs.html

Apple M5 Chip Announcement - https://www.apple.com/newsroom/2025/10/apple-unleashes-m5-the-next-big-leap-in-ai-performance-for-apple-silicon/

Anthropic Claude Haiku 4.5 - https://www.anthropic.com/news/claude-haiku-4-5

Other

Jon Stewart interview with Geoffrey Hinton - https://www.youtube.com/watch?v=jrK3PsD3APk

Blake Lemoine and AI Psychosis - https://www.youtube.com/watch?v=kgCUn4fQTsc


Abandoned Episode Titles

  • "Star Trek: The Wrath of Polysemanticity"

  • "The Hitchhiker's Guide to the Neuron: Don't Panic, It's Just Superposition"

    "Honey, I Shrunk the Features (Then Expanded Them 256x)"

    "The Legend of Zelda: 131,000 Links Between Neurons"
