Do AI Models Lie on Purpose? Scheming, Deception, and Alignment with Marius Hobbhahn of Apollo Research
Summary
Marius Hobbhahn is the CEO and co-founder of Apollo Research. Through a joint research project with OpenAI, his team discovered that as models become more capable, they are developing the ability to hide their true reasoning from human oversight.
Jeffrey Ladish, Executive Director of Palisade Research, talks with Marius about this work. They discuss the difference between hallucination and deliberate deception and the urgent challenge of aligning increasingly capable AI systems.
Links:
Marius’ Twitter: https://twitter.com/mariushobbhahn
Apollo Research Twitter: https://twitter.com/apolloaievals
Apollo Research: https://www.apolloresearch.ai
Palisade Research: https://palisaderesearch.org/
Twitter/X: https://x.com/PalisadeAI
Anti-Scheming Project: https://www.antischeming.ai
Research paper “Stress Testing Deliberative Alignment for Anti-Scheming Training”: https://www.arxiv.org/pdf/2509.15541
OpenAI blog post: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/
Apollo blog post: https://www.apolloresearch.ai/research/stress-testing-deliberative-alignment-for-anti-scheming-training/