Episodes

  • Episode 022: The Ethics of LLMs that Few Talk About
    2025/08/09

    Conversations around AI ethics often focus on a suite of incredibly important topics such as data security and privacy, model bias, model transparency, and explainability. However, each time we use large AI models (e.g., diffusion models, LLMs), we reinforce a host of additional, potentially unethical practices needed to build and maintain these systems.

    In this episode, Jake and David discuss some of these unsavory topics, such as human labor costs and environmental impact. Although it's a bit of a downer, it's crucial for each of us to acknowledge how our behavior impacts the larger ecosystem and recognize our role in perpetuating these practices.

    1 hour 11 minutes
  • Episode 021: Explainable AI and LLMs
    2025/08/03

    "Explainable AI", aka XAI, refers to a suite of techniques to help AI system developers and AI system users understand why inputs to the system resulted in the observed outputs.

    Industries such as healthcare, education, and finance require that any system using mathematical models or algorithms to influence people's lives be transparent and explainable.

    In this episode, Jake and David review what XAI is, classical techniques in XAI, and the burgeoning area of XAI techniques specific to LLM-driven systems.

    1 hour 13 minutes
  • Episode 020: Evidence-Based Practices for Prompt Engineering
    2025/07/20

    Prompt engineering involves a lot more than simply getting smarter about how you structure the prompts you enter into an LLM's browser interface.

    Furthermore, a growing body of peer-reviewed research provides us with best practices to improve the accuracy and reliability of LLM outputs for the specific tasks we build systems around.

    In this episode, Jake and David review evidence-based best practices for prompt engineering and, importantly, highlight what proper prompt engineering actually requires, and why most of us likely cannot call ourselves prompt engineers.

    1 hour 8 minutes
  • Episode 019: LLM Evaluation Frameworks
    2025/07/06

    Lots of people like to talk about the importance of prompts, context, and what is sent to an LLM. Few discuss an even more important aspect of an LLM-driven system: evaluating its output.

    In this episode, we discuss traditional and modern metrics used to evaluate LLM outputs, and we review the common frameworks for obtaining that feedback.

    Though evals are a lot of work (and easy to do poorly), those building (or buying) LLM-driven systems should be transparent about their process and the current state of their eval framework.

    1 hour 28 minutes
  • Episode 018: Data Privacy and Security Considerations When Working with LLMs
    2025/06/29

    Jake and David chat about best practices and considerations for those building and using AI systems that leverage LLMs.

    1 hour 12 minutes
  • Episode 017: Demystifying How GenAI Works
    2025/06/22

    Jake and David chat about types of GenAI, and specifically how LLMs work—from input text or audio through the output you read.

    1 hour 1 minute
  • Episode 016: What would you trust an LLM with?
    2025/06/15

    Jake and I chat about current hot topics in the LLM space and what we would (and would not) trust an LLM with.

    1 hour 10 minutes
  • Episode 015: Welcome to the Era of Experience
    2025/05/04

    Jake and I chat about a forthcoming book chapter titled "Welcome to the Era of Experience" by David Silver and Richard Sutton (link below). This naturally surfaced other topics, such as companies staffed entirely by AI agents (which turned out about as well as that sounds); superintelligence (we might be legally required to reference this during the 2025 AI hype cycle); and how practical systems built on these ideas would even be architected (we both came in with different ideas here, which was fun). Happy listening.


    Links to things mentioned:

    • Silver, D., & Sutton, R. S. (April 26, 2025). Welcome to the Era of Experience. Preprint of a chapter to appear in Designing an Intelligence. MIT Press.
    • Sutton, R. S., & Barto, A. G. (2015). Reinforcement Learning: An Introduction. The MIT Press.
    • Wilkins, J. (April 27, 2025). Professors Staffed a Fake Company Entirely With AI Agents, and You'll Never Guess What Happened. Who would have thought? [Friendly write-up of the agentic company work]
    • Xu, F. F., Song, Y., Li, B., Tang, Y., Jain, K., Bao, M., ..., & Neubig, G. (2024). TheAgentCompany: Benchmarking LLM agents on consequential real world tasks. arXiv:2412.14161. [Actual study.]
    56 minutes