Episodes

  • Can we safely automate alignment research?
    2025/04/30

    It's really important; we've got a real shot; there are a ton of ways to fail.

    Text version here: https://joecarlsmith.com/2025/04/30/can-we-safely-automate-alignment-research/.

    There's also a video and transcript of a talk I gave on this topic here: https://joecarlsmith.com/2025/04/30/video-and-transcript-of-talk-on-automating-alignment-research/

    1 hr 30 min
  • AI for AI safety
    2025/03/14

    We should try extremely hard to use AI labor to help address the alignment problem. Text version here: https://joecarlsmith.com/2025/03/14/ai-for-ai-safety

    28 min
  • Paths and waystations in AI safety
    2025/03/11

    On the structure of the path to safe superintelligence, and some possible milestones along the way. Text version here: https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety

    18 min
  • When should we worry about AI power-seeking?
    2025/02/19

    Examining the conditions required for rogue AI behavior.

    47 min
  • What is it to solve the alignment problem?
    2025/02/13

    Also: to avoid it? Handle it? Solve it forever? Solve it completely?

    Text version here: https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment

    40 min
  • How do we solve the alignment problem?
    2025/02/13

    Introduction to a series of essays about paths to safe and useful superintelligence.

    Text version here: https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem

    9 min
  • Fake thinking and real thinking
    2025/01/28

    When the line pulls at your hand.

    Text version here: https://joecarlsmith.com/2025/01/28/fake-thinking-and-real-thinking/.

    1 hr 19 min
  • Takes on "Alignment Faking in Large Language Models"
    2024/12/18

    What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/

    1 hr 28 min