Episodes

  • Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python)
    2025/12/03
    We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI. In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.
    We talk through:
    - Escaping complexity hell to reduce costs and gain autonomy
    - The specific software practices, like the "Docker Barrier", that matter most for data scientists
    - How to replace complex cloud services with a simple, robust $30/month stack
    - The shift from writing code to "systems thinking" in the age of Agentic AI
    - How to manage the people-pleasing psychology of AI agents to prevent broken code
    - Why struggle is still essential for learning, even when AI can do the work for you
    LINKS
    Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production)
    Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)
    Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)
    Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)
    Python Bytes podcast (https://pythonbytes.fm/)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)
    Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav
    1 hr 3 min
  • Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)
    2025/11/22
    Gemini 3 is only a few days old, and its massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely. Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, dig into the nuance of preventing context rot in massive context windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.
    They talk through:
    - The implications of models that can "self-heal" and fix their own code
    - The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems
    - Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews
    - Why Needle in a Haystack benchmarks often fail to predict real-world performance
    - How to build agent harnesses that turn model capabilities into product velocity
    - The shift from measuring latency to managing time-to-compute for reasoning tasks
    LINKS
    From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin (LangChain) (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)
    Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)
    Effective context engineering for AI agents, by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)
    Join the final cohort of our Building AI Applications course starting Jan 12, 2026: https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav
    1 hr
  • Episode 1: Introducing Vanishing Gradients
    2022/02/16
    In this brief introduction, Hugo introduces the rationale behind launching a new data science podcast and gets excited about his upcoming guests: Jeremy Howard, Rachael Tatman, and Heather Nolis! Original music, bleeps, and blops by local Sydney legend PlaneFace (https://planeface.bandcamp.com/album/fishing-from-an-asteroid)!
    6 min
  • Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs
    2025/10/31
    Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution. In this episode, Randy and Hugo lay out how to find and solve "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.
    They talk through:
    - How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription
    - The agentic spectrum: why you should start by automating meeting summaries before attempting to build fully autonomous agents
    - The practical first step for any executive: building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice
    - Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products
    - The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip
    LINKS
    Randy on LinkedIn (https://www.zenml.io/llmops-database)
    Wyrd Studios (https://thewyrdstudios.com/)
    Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
    🎓 Learn more: Hugo's course, Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
    59 min
  • Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
    2025/10/16
    Most AI teams find their multi-agent systems devolving into chaos, but ML engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLMOps Database (750+ real-world deployments at the time of recording; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI. Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
    We talk through:
    - Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
    - The essential MLOps hygiene (tracing and continuous evals) that most teams skip
    - The optimal (and very low) limit for the number of tools an agent can reliably use
    - How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
    - The principle of using simple Python/regex checks before resorting to costly LLM judges (a minimal sketch follows this episode's notes)
    LINKS
    The LLMOps Database: 925 entries as of today... submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
    🎓 Learn more: This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
    28 min
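    The "simple Python/regex checks before costly LLM judges" principle from the notes above can be illustrated with a minimal sketch. The check names, patterns, and policies below are hypothetical assumptions, not from the episode; the point is only that cheap deterministic checks can filter obvious failures before any model-graded evaluation is run.

    import re

    def cheap_checks(response: str) -> list[str]:
        """Deterministic pre-checks that run before any LLM judge.

        Returns a list of failure reasons; an empty list means the response
        may proceed to the (more expensive) model-graded evaluation.
        """
        failures = []

        # Hypothetical policy: responses must not leak internal tool-call markup.
        if re.search(r"<tool_call>|</?function>", response):
            failures.append("leaked internal tool-call markup")

        # Hypothetical policy: reject empty or near-empty answers outright.
        if len(response.strip()) < 20:
            failures.append("response too short to be useful")

        # Hypothetical policy: answers must not contain placeholder text.
        if re.search(r"\b(lorem ipsum|TODO|FIXME)\b", response, re.IGNORECASE):
            failures.append("placeholder text present")

        return failures

    if __name__ == "__main__":
        sample = "TODO: fill in the refund policy here."
        problems = cheap_checks(sample)
        if problems:
            print("Failed cheap checks:", problems)  # no LLM judge call needed
        else:
            print("Passed cheap checks; escalate to an LLM judge if required.")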
  • Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
    2025/09/30
    Most AI teams find "evals" frustrating, but ML engineer Hamel Husain argues they're just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems. Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.
    We talk through:
    - The 10(+1) critical mistakes that cause teams to waste time on evals
    - Why "hallucination scores" are a waste of time (and what to measure instead)
    - The manual review process that finds major issues in hours, not weeks
    - A step-by-step method for building LLM judges you can actually trust (a minimal sketch follows this episode's notes)
    - How to use domain experts without getting stuck in endless review committees
    - Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
    If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.
    LINKS
    Hamel's website and blog (https://hamel.dev/)
    Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51)
    Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill)
    The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share)
    Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6, and this link gives 35% off: https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME
    🎓 Learn more: Hugo's course, Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/s/course/d56067f338
    1 hr 13 min
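    To make the "LLM judges you can actually trust" idea above concrete, here is a minimal pass/fail judge sketch. It assumes the OpenAI Python SDK (openai>=1.0) with an OPENAI_API_KEY in the environment; the model name, prompt wording, and binary PASS/FAIL framing are illustrative assumptions, not Hamel's exact recipe.

    from openai import OpenAI

    client = OpenAI()  # assumption: OPENAI_API_KEY is set in the environment

    JUDGE_PROMPT = """You are reviewing an AI assistant's answer to a user question.
    Reply with exactly one word, PASS or FAIL, followed by a one-sentence critique.
    PASS only if the answer is factually responsive to the question and uses no
    information absent from the provided context."""

    def judge(question: str, context: str, answer: str) -> tuple[bool, str]:
        """Return (passed, critique) for a single (question, context, answer) triple."""
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: substitute whichever judge model you use
            messages=[
                {"role": "system", "content": JUDGE_PROMPT},
                {
                    "role": "user",
                    "content": f"Question:\n{question}\n\nContext:\n{context}\n\nAnswer:\n{answer}",
                },
            ],
            temperature=0,
        )
        verdict = response.choices[0].message.content.strip()
        return verdict.upper().startswith("PASS"), verdict

    if __name__ == "__main__":
        ok, critique = judge(
            question="What is the refund window?",
            context="Refunds are available within 30 days of purchase.",
            answer="You can get a refund within 30 days.",
        )
        print(ok, critique)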
  • Episode 59: Patterns and Anti-Patterns For Building with AI
    2025/09/23
    John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the "seven deadly sins" of LLM development — and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he's seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an "AI intern" rather than an all-knowing oracle.
    We talk through:
    - Why chasing perfect accuracy is a dead end
    - How to use agents without losing control
    - Context engineering: fitting the right information in the window
    - Starting simple instead of over-orchestrating
    - Separating retrieval from generation in RAG (a minimal sketch follows this episode's notes)
    - Splitting complex extractions into smaller checks
    - Knowing when frameworks help — and when they slow you down
    A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.
    LINKS
    Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents)
    The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems)
    Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/)
    Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/)
    Prompt Engineering for LLMs, by John and Albert Ziegler (https://amzn.to/4gChsFf)
    Relevant Search, by John and Doug Turnbull (https://amzn.to/3TXmDHk)
    Arcturus Labs (https://arcturus-labs.com/)
    Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course), Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/s/course/d56067f338
    48 min
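    The "separating retrieval from generation" pattern listed above can be sketched in a few lines. The toy corpus, the keyword-overlap scoring, and the prompt template below are assumptions made for illustration rather than John's implementation; the point is that retrieval can be inspected and unit-tested on its own before any generation call is involved.

    def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
        """Rank documents by naive keyword overlap with the query and return the top k."""
        q_terms = set(query.lower().split())
        scored = sorted(
            corpus,
            key=lambda doc: len(q_terms & set(doc.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def build_prompt(query: str, passages: list[str]) -> str:
        """Assemble the generation prompt from already-retrieved passages."""
        context = "\n".join(f"- {p}" for p in passages)
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    if __name__ == "__main__":
        corpus = [
            "Invoices are emailed on the first business day of each month.",
            "Support tickets are answered within 24 hours on weekdays.",
            "The API rate limit is 100 requests per minute per key.",
        ]
        query = "How fast are support tickets answered?"

        passages = retrieve(query, corpus)
        # Because retrieval is its own step, it can be checked and unit-tested
        # before any LLM is asked to generate an answer from the passages.
        assert any("Support tickets" in p for p in passages)
        print(build_prompt(query, passages))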
  • Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
    2025/09/09
    While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
    We talk through:
    - Using LLMs as "synthetic consumers" to simulate surveys and test product ideas (a minimal sketch follows this episode's notes)
    - How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making
    - Building closed-loop systems where AI generates and critiques ideas
    - Guardrails for multi-agent workflows in marketing mix modeling
    - Where generative AI breaks (and how to detect failure modes)
    - The balance between useful models and "correct" models
    If you've ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes.
    LINKS
    The AI MMM Agent: An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent)
    AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU)
    The podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more: Hugo's course, Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/s/course/d56067f338
    1 hr 1 min
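    The "synthetic consumers" idea listed above can be sketched as a loop over personas. This assumes the OpenAI Python SDK (openai>=1.0) with an OPENAI_API_KEY set; the personas, product concept, model name, and 1-5 purchase-intent scale are illustrative assumptions, not PyMC Labs' actual pipeline (which pairs such surveys with Bayesian modeling).

    from statistics import mean

    from openai import OpenAI

    client = OpenAI()  # assumption: OPENAI_API_KEY is set in the environment

    # Hypothetical personas and product concept, for illustration only.
    PERSONAS = [
        "a price-sensitive college student who buys store brands",
        "a parent of three who shops weekly at large supermarkets",
        "a health-conscious professional who reads ingredient labels",
    ]

    CONCEPT = "A whitening toothpaste that costs 20% more but uses fully recyclable packaging."

    def ask_persona(persona: str, concept: str) -> int:
        """Ask one synthetic consumer for a purchase-intent rating from 1 to 5."""
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: use whichever model you prefer
            messages=[
                {"role": "system", "content": f"You are {persona}. Answer with a single digit from 1 to 5."},
                {"role": "user", "content": f"How likely are you to buy this product? {concept}"},
            ],
            temperature=1.0,  # keep some variability across simulated respondents
        )
        # Rough parse for a sketch: take the first character of the reply as the rating.
        return int(response.choices[0].message.content.strip()[0])

    if __name__ == "__main__":
        ratings = [ask_persona(p, CONCEPT) for p in PERSONAS]
        print("Mean purchase intent:", mean(ratings))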