• #201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4

  • 2025/03/05
  • Duration: 59 min
  • Podcast

  • Summary

  • Our 201st episode with a summary and discussion of last week's big AI news! Recorded on 03/02/2025

    Join our brand new Discord here! https://discord.gg/nTyezGSKwP

    Hosted by Andrey Kurenkov and guest host Sharon Zhou. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

    In this episode:

    - The release of GPT-4.5 from OpenAI, Anthropic's Claude 3.7, and Grok 3 from xAI, comparing their features, costs, and capabilities.
    - Discussion of new tools and applications, including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.
    - OpenAI's continued user growth despite competition, pricing for Google's text-to-video model, and HP acquiring Humane and shutting down its AI Pin.
    - Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.

    Timestamps + Links:

    • (00:00:00) Intro / Banter
    • (00:01:36) News Preview
    • Tools & Apps
      • (00:02:33) OpenAI announces GPT-4.5, warns it’s not a frontier AI model
      • (00:07:22) Anthropic launches a new AI model that ‘thinks’ as long as you want
      • (00:11:14) New Grok 3 release tops LLM leaderboards
      • (00:16:43) Sesame is the first voice assistant I’ve ever wanted to talk to more than once
      • (00:18:30) Google launches a free AI coding assistant with very high usage caps
      • (00:20:45) Rabbit shows off the AI agent it should have launched with
      • (00:22:23) Mistral’s Le Chat tops 1M downloads in just 14 days
    • Applications & Business
      • (00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence
      • (00:27:37) Google’s new AI video model Veo 2 will cost 50 cents per second
      • (00:29:52) HP is buying Humane and shutting down the AI Pin
    • Projects & Open Source
      • (00:31:44) Microsoft launches next-gen Phi AI models.
      • (00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work
      • (00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs
    • Research & Advancements
      • (00:40:00) Towards an AI co-scientist
      • (00:42:52) Magma: A Foundation Model for Multimodal AI Agents
    • Policy & Safety
      • (00:47:32) Demonstrating specification gaming in reasoning models
      • (00:51:03) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

