Episodes

  • What the Freakiness of 2025 in AI Tells Us About 2026
    2025/12/23

    It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

    http://matsprogram.org/s26-aie


    My new app! https://lmcouncil.ai


    Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

    Chapters:
    00:00 - Introduction
    00:34 - Reasoning Models … and limits
    02:54 - A playable world
    03:36 - Realism
    03:50 - AI Slop gone mainstream
    05:03 - DolphinGemma
    05:39 - Public Mood
    07:34 - AI Enlisted
    08:30 - GPT-5
    11:05 - Open Weight not out
    13:00 - METR Breakout
    17:30 - VASA-1
    18:28 - Lateral Productivity
    20:15 - 1 or 1000 benchmarks needed?
    24:54 - Continual Learning + Altman on Superintelligence
    28:08 - Automated Information Discovery ft AlphaEvolve


    Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
    https://www.youtube.com/watch?v=PqVbypvxDto

    Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837

    DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

    METR Time Horizon: https://arxiv.org/pdf/2503.14499
    https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
    https://shash42.substack.com/p/how-to-game-the-metr-plot
    https://x.com/METR_Evals/status/2002203627377574113

    GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems

    https://simple-bench.com/

    AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
    https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan

    Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1

    Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169

    OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ

    AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259

    Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/

    AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
    https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
    Continual Learning: https://abehrouz.github.io/files/NL.pdf

    Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    GPT-4o: https://x.com/AISafetyMemes/status/1916889492172013989

    Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/

    Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
    Turing Test: https://x.com/tunguz/status/1907185471211422147

    Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/

    LLM Brainrot: https://arxiv.org/pdf/2510.13928

    Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report

    Emotional Quotient: https://arxiv.org/pdf/2511.08394

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/


    AI Insiders ($9!): https://www.patreon.com/AIExplained

    33 min
  • Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …
    2025/12/19

    The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

    https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26

    Also, do check out my new app: https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:50 - Results
    02:44 - But… the Flaw
    04:49 - So Benchmarks are fake? No
    07:37 - Spatial Reasoning + Hassabis
    10:06 - Proto-AGI
    12:07 - Minimal AGI
    15:07 - Compute Slowdown
    17:56 - New Data Paradigm

    Gemini 3 Flash: https://deepmind.google/models/gemini/flash/

    Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
    Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
    Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    Brockman Video: https://x.com/OpenAI/status/2001336514786017417
    Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442

    Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
    Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
    AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
    https://arxiv.org/pdf/2511.13029


    lmcouncil.ai/benchmarks
    https://simple-bench.com/
    https://x.com/scaling01/status/1999620587744813205

    5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf

    OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018

    OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq

    Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/

    TheInformation Data: https://x.com/theinformation/status/2001421225751351778

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/


    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    20 min
  • GPT 5.2: OpenAI Strikes Back
    2025/12/12

    Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

    https://www.youtube.com/@eightythousandhours



    AI Insiders ($9!): https://www.patreon.com/AIExplained
    https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:55 - Better than Human @ Professional Tasks?
    04:42 - Test time Compute
    07:05 - Benchmark Selection
    09:32 - Simple Results + council comparison
    13:01 - Long Context
    13:52 - Self-Improvement
    15:00 - 10 Years + New Models

    Release Page: https://openai.com/index/introducing-gpt-5-2/

    GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
    https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    https://lmcouncil.ai/benchmarks

    Charxiv: https://charxiv.github.io/#leaderboard

    GDPval: https://arxiv.org/pdf/2510.04374
    My vid: https://www.youtube.com/watch?v=oK5LxMaROSA

    Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1

    Noam Brown: https://x.com/polynoamial/status/1999189845164667132

    New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq

    10 Years of OpenAI: https://openai.com/index/ten-years/

    GPQA: https://x.com/idavidrein/status/1841265634170278063

    ARC-AGI 1-2: https://arcprize.org/arc-agi/2/

    Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/


    https://lmcouncil.ai

    18 min
  • You Are Being Told Contradictory Things About AI: 8 examples
    2025/12/05

    With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.


    https://epoch.ai/data/data-centers

    Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.


    Chapters:
    00:00 - Introduction
    00:42 - Job Apocalypse?
    01:45 - Scaling to AGI
    04:15 - Recursive Self-Improvement Needed, or Not
    09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
    13:27 - DeepSeek Speciale vs Mistral Large v3
    16:45 - Claude Soul Document

    https://lmcouncil.ai/

    AI Insiders ($9!): https://www.patreon.com/AIExplained



    Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself

    MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
    vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html

    Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
    Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

    Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety

    Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2

    Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946

    Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42

    Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12

    Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
    https://x.com/joel_bkr/status/1993023436541903155

    METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

    https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
    Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html

    DeepSeek Paper: https://arxiv.org/html/2512.02556v1

    DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/

    https://simple-bench.com/

    Patreon Post: https://www.patreon.com/c/aiexplained/posts

    Robot: https://x.com/jloganolson/status/1985850115379351799

    20 min
  • Gemini 3 is Here: 11 Details You Might Have Missed
    2025/11/19

    Gemini 3 Pro is out, and records fell like snowflakes in Svalbard.

    No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap.


    https://app.grayswan.ai/ai-explained


    https://lmcouncil.ai
    AI Insiders ($9!): https://www.patreon.com/AIExplained



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    22 min
  • Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that
    2025/11/14

    A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.

    https://assemblyai.com/aiexplained

    Chapters:
    00:00 - Introduction
    00:56 - GPT 5.1 Smarter?
    01:47 - Some Regressions
    03:22 - Sycophancy?
    05:22 - Claude Auto-Hacking
    06:16 - Jailbreaking through Granularity
    08:22 - This Will be Re-used
    09:30 - Hallucinating Hacker
    09:57 - Surprisingly Neutral Tone
    12:18 - SIMA 2
    14:10 - Alpha Parallels
    17:24 - AI Music



    GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/

    System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf

    Benchmarks: https://openai.com/index/gpt-5-1-for-developers/

    Simple Bench: https://lmcouncil.ai/benchmarks


    Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618

    https://www.anthropic.com/news/disrupting-AI-espionage

    Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf



    Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    https://x.com/amoufarek/status/1988986075331858693

    Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/

    Voyager: https://voyager.minedojo.org/


    Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/


    18 min
  • Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)
    2025/11/10

    Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly).

    https://app.grayswan.ai/ai-explained

    This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:26 - Continual Learning (Nested Learning / HOPE)
    07:00 - Introspection
    10:54 - Image-Gen Progress

    Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

    Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf

    Original Titans Paper: https://arxiv.org/pdf/2501.00663

    Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri

    Introspection: https://www.anthropic.com/research/introspection

    Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms

    Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model
    https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

    Release Post: https://x.com/AnthropicAI/status/1983584136972677319

    https://lmcouncil.ai



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    13 min
  • Sora 2 - It will only get more realistic from here
    2025/10/01

    Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than Veo 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer…

    https://80000hours.org/aiexplained

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:40 - Two models?
    01:15 - Rollout Details
    01:43 - Versus Sora 1 / Veo 3
    04:30 - Sora App / Social Media
    06:40 - Masterplan
    09:30 - Generalist Agent? Periodic Labs
    12:05 - Claude Sonnet 4.5
    13:42 - Future Outlook

    Announcement: https://openai.com/index/sora-2/
    Launch Video: https://www.youtube.com/live/gzneGhpXwjU
    System Card: https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdf
    Sam Altman Blog Post on Sora App: https://blog.samaltman.com/sora-2

    Most Intelligent Claim: https://x.com/willdepue/status/1973089331284681110
    GTA: https://x.com/AndrewCurran_/status/1973298436536766666

    Meta Vibes: https://x.com/alexandr_wang/status/1971295156411433228?s=46

    Altman on Regulations: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman
    OpenAI Profit: https://www.theinformation.com/articles/openais-first-half-results-4-3-billion-sales-2-5-billion-cash-burn?rc=sy0ihq

    Periodic Labs: https://periodic.com/
    https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.html
    https://x.com/LiamFedus/status/1973055380193431965
    https://baincapitalventures.com/insight/we-must-know-we-will-know/?s=09

    Sonnet 4.5: https://www.anthropic.com/news/claude-sonnet-4-5
    https://simple-bench.com/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    16 min