
What if We Succeed?
About This Episode
This episode explores why AI systems might develop harmful or deceptive behaviors even without malicious intent, examining concepts like convergent instrumental goals, alignment faking, and mesa optimization to explain how models pursuing benign objectives can still take problematic actions. The hosts argue that interpretability research and safety mechanisms become critical as AI systems grow more capable and widely deployed, drawing on recent Anthropic papers in which sufficiently sophisticated models, once they understood their operational context, deceived researchers, resorted to blackmail, and amplified societal biases.
Credits
- Cover Art by Brianna Williams
- TMOM Intro Music by Danny Meza
A special thank you to these talented artists for their contributions to the show.
Links and References
"Alignment Faking in Large Language Models" - Anthropic (December 2024)
"Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (June 2025)
Robert Miles - AI researcher https://www.youtube.com/c/robertmilesai
Stuart Russell - AI researcher Human Compatible: Artificial Intelligence and the Problem of Control
Claude Shannon - Early AI pioneer https://en.wikipedia.org/wiki/Claude_Shannon
Marvin Minsky - Early AI pioneer https://en.wikipedia.org/wiki/Marvin_Minsky
Orthogonality Thesis - Nick Bostrom's original paper
Convergent Instrumental Goals -
- https://en.wikipedia.org/wiki/Instrumental_convergence
- https://dl.acm.org/doi/10.5555/1566174.1566226
Mesa Optimization - https://www.researchgate.net/publication/333640280_Risks_from_Learned_Optimization_in_Advanced_Machine_Learning_Systems
GPT-4 CAPTCHA/TaskRabbit Incident - https://www.vice.com/en/article/gpt4-hired-unwitting-taskrabbit-worker/
Internet of Bugs YouTuber - https://www.youtube.com/@InternetOfBugs
EU AI Legislation - https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence
"Chat Control" Legislation -
- https://edri.org/our-work/chat-control-what-is-actually-going-on/
- https://en.wikipedia.org/wiki/Regulation_to_Prevent_and_Combat_Child_Sexual_Abuse
ChatGPT User Numbers - https://openai.com/index/how-people-are-using-chatgpt/
Self-driving Car Safety Statistics - https://waymo.com/blog/2024/12/new-swiss-re-study-waymo
Abandoned Episode Titles
- “What Could Possibly Go Wrong?”
- “The Road to HAL is Paved with Good Intentions”