Episodes

  • Principles of Evals: The Future of GenAI Evaluation (E.43)
    2026/05/29

    LLMs are optimized to sound convincing—not to know when they’re wrong. In this episode, Deanna Emery breaks down why hallucinations are fundamentally tied to how language models work, why confidence is often disconnected from correctness, and how better evaluation strategies can make AI systems more reliable in production. We also get into uncertainty, semantic reasoning, and what humans still do better than models.

    00:00 — Why LLMs hallucinate confidently
    09:00 — The limits of current eval systems
    18:00 — Why uncertainty matters in AI
    27:00 — Semantic reasoning vs memorization
    38:00 — What humans still do better than models

    The biggest risk in AI isn’t wrong answers. It’s wrong answers delivered with confidence.

    54 min
  • AI FDE at Databricks (E.42)
    2026/05/24

    Building a great AI team takes more than hiring smart people. In this episode, Brooke Wenig breaks down how Databricks built the AI FDE organization, why culture compounds faster than technical skill, and what separates high-trust engineering teams from teams that slowly degrade over time. We also get into mentoring, hiring in the age of AI coding tools, and why software engineering fundamentals matter more than ever.

    00:00 — How Databricks built the AI FDE team
    08:00 — AI cheating in technical interviews
    19:00 — Why culture degrades as teams scale
    31:00 — Building a team brand around specialists
    43:00 — What skills matter most in the AI era

    Great AI teams aren’t built through rules. They’re built through people who reinforce the right standards every day.

    45 min
  • AI for Coparenting: How AI can Deescalate Coparenting (E.41)
    2026/05/15

    Most AI startups optimize for speed, automation, or revenue. Sol built one to stop people from emotionally destroying each other. After a brutal divorce and years trapped in high-conflict co-parenting, he realized the real problem wasn’t logistics but emotional escalation through constant communication. BestInterest uses AI to filter manipulative, hostile, and triggering messages before they reach the other parent, turning AI into a psychological buffer instead of a chatbot. AI has the potential to bridge difficult social gaps. There is a lot of fear around this capability, but this case study in navigating difficult co-parenting relationships shows how unbiased AI personas can mitigate these problems.

    Chapters

    • 00:00 The Origin Story of Best Interest
    • 06:06 AI as a Mediator in Co-Parenting
    • 12:57 Designing for Impersonal yet Supportive Nature
    • 18:00 The Best Interest of the Kids
    • 22:57 Fine-Tuning Communication Expectations
    • 28:10 Product Insight and Differentiation
    • 36:52 Passion-Driven Work and Meaningful Impact
    • 42:37 AI and Human Communication
    • 49:37 AI as an Engine of Peace
    49 min
  • How to Prevent Doomsday: Guardrails, Alignment, and Education (E.40)
    2026/05/09

    AI alignment breaks the moment we assume intelligence automatically produces morality. Dr. Peter R. Solomon argues the real danger isn’t sentient AI becoming evil, it’s AI inheriting no emotional history, no family structure, and no reason to value human survival.

    The conversation moves from CRISPR in high schools to AI-generated writing, autonomous agents, synthetic memory, and why “guardrails” fail when systems evolve faster than institutions can regulate them. The deeper point: humans trained AI to think, but not necessarily to care.

    00:00 Why science education kills curiosity
    06:00 The AI extinction scenario nobody wants to model
    15:45 Why static guardrails fail in production systems
    27:00 The AI-written paragraph that appeared unprompted
    39:50 AI as a cooperative intelligence, not a replacement

    The systems we’re building already shape human behavior. The question is whether they’ll eventually shape human survival.

    46 min
  • Research Spikes: Starting Simple to Drive Success (E.39)
    2026/04/24

    Most AI systems fail before they scale because teams build the “final architecture” too early. Research spikes exist to expose what you don’t understand, not to prove you’re right. The fastest path to a working system is starting with something intentionally simple, validating invariants, and throwing away what doesn’t hold.


    00:00 What a research spike actually is
    04:15 The real problem: context overload
    13:29 Why you should not build the “final system” first
    21:42 The cathedral vs farmhouse mistake
    33:42 When NOT to use advanced tech like graph RAG
    39:04 The unsexy work that actually matters


    If your first version feels impressive, you probably built the wrong thing.

    43 min
  • Controlling the Chaos: Creating Reliable LLM-Based Applications (E.38)
    2026/04/17

    LLMs don’t fail loudly; they drift into undefined behavior and take your system with them. The only way to build stable AI systems is to enforce contracts at every boundary, especially when dealing with non-deterministic outputs. Modern Python tools like Pydantic, enums, and structured interfaces aren’t optional; they’re how you turn probabilistic generation into reliable software.


    00:00 Why LLMs behave like “chaos goblins”
    03:38 What a contract actually enforces
    14:56 Real bug caused by missing validation
    26:32 Why external APIs will break your system
    44:06 The worst mistake: putting logic in prompts

    If you’re not validating every boundary, you’re not building software, you’re gambling.

    45 min
  • Builders vs. Posers: How to Provide Real Value (E.37)
    2026/04/10

    AI has made it easier than ever to build fast, but also easier to fake progress. In this episode, we break down the difference between people who optimize for real value versus those who optimize for visibility, how this shows up in AI workflows, and why long-term thinking is the only way to build systems that actually matter.


    00:00 — Builders vs posers in tech
    05:00 — Why visibility can distort incentives
    12:00 — How AI enables shallow prototypes
    22:00 — How to identify real value
    35:00 — Building culture vs playing the game


    AI doesn’t change the game, it exposes how you’re playing it.

    49 min
  • Automating Execution: What skills still matter? (E.36)
    2026/04/03

    AI coding agents can generate code faster than ever, but they also introduce new risks that most teams don’t understand yet. In this episode, we break down how software development is shifting from writing code to reviewing and directing AI systems, why poorly guided agents create fragile systems, and what skills actually matter as AI takes over execution.


    00:00 — AI writing code vs humans reviewing it
    06:00 — Why coding agents fail on complex systems
    15:00 — Tactical vs strategic use of AI in development
    25:00 — The role of humans as system designers and guardrails
    35:00 — What skills matter in an AI-driven workflow


    AI doesn’t remove engineers, it makes their judgment the most important part of the system.

    55 min