• Relational Foundation Models for Enterprise Data with Jure Leskovec - #768
    2026/05/21
    In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and structure models like AlphaFold—without hand-encoding biology. Jure then dives into relational deep learning, reframing enterprise databases as graphs and training neural networks directly on raw multi-table data. He explains Kumo’s Relational Foundation Model (RFM2), which performs in-context learning over subgraphs to make accurate predictions on new databases and tasks with no training, and how this approach benchmarks against RelBench and other multi-table datasets. We also discuss real-world deployments at companies like Reddit, DoorDash, and Coinbase, explainability via attention over tables and columns, integration with agentic systems, deployment options, and practical limitations. The complete show notes for this episode can be found at https://twimlai.com/go/768.
    1 hr 6 min
  • How to Find the Agent Failures Your Evals Miss with Scott Clark - #767
    2026/05/07
    In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools. The complete show notes for this episode can be found at https://twimlai.com/go/767.
    53 min
  • How to Engineer AI Inference Systems with Philip Kiely - #766
    2026/04/30
    In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, quantization, speculation, and KV cache reuse—lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today’s runtime landscape, including vLLM, SGLang, and TensorRT LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at https://twimlai.com/go/766.
    55 min
  • How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765
    2026/04/16
    In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a highly regulated environment. Rashmi walks us through Chat Concierge, a multi-agent chat experience for auto dealerships that handles intent disambiguation, tool invocation, and human handoffs to deliver safer, more personalized customer journeys. We discuss Capital One’s platform-centric approach to AI agents and how it separates design from runtime governance, embedding policies, guardrails, and cyber controls across agent threat boundaries. Rashmi shares how the team approaches the developer experience for agent builders, observability, and evals for stochastic, multi-agent workflows, as well as strategies for model specialization, including fine-tuning and distillation. We also cover standards and abstraction, closed-loop learning from production telemetry, and key lessons for enterprises building agentic systems. The complete show notes for this episode can be found at https://twimlai.com/go/765.
    54 min
  • The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764
    2026/03/26
    Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches (traditionally used for images) are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at https://twimlai.com/go/764.
    1 hr 3 min
  • Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763
    2026/03/10
    In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI-assisted coding with end-to-end autonomy, arguing that “code is a commodity” and acceptance is the real metric, with security, standards, tests, and maintainability included. We explore Blitzy’s hybrid graph-plus-vector approach, which grounds agents and combines semantic signals with keyword search to navigate large repositories efficiently. Sid breaks down context and agent engineering, how effective context windows have plateaued, and why dynamic agent personas, tool selection, and model-specific prompting matter at scale. He details their orchestration of large swarms of AI agents to collaboratively analyze codebases, plan work, and execute complex tasks in parallel. We also dig into why Agents.md and flat memories break down, storing feedback in the knowledge graph, and building real-world evals beyond leaderboards to choose the right model for each task. The complete show notes for this episode can be found at https://twimlai.com/go/763.
    1 hr 16 min
  • AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762
    2026/02/26
    In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We discuss the shift from raw model scaling to reasoning-focused post-training, inference-time techniques, and better tool integration. Sebastian explains why methods like self-consistency, self-refinement, and verifiable-reward reinforcement learning have become central to progress in domains like math and coding, and where those approaches still fall short. We also explore agentic workflows in practice, including where multi-agent systems add real value and where reliability constraints still dominate system design. The conversation covers architecture trends such as mixture-of-experts, attention efficiency strategies, and the practical impact of long-context models, alongside persistent challenges like continual learning. We close with Sebastian’s perspective on maintaining strong coding fundamentals in the age of AI assistants and a preview of his new book, Build A Reasoning Model (From Scratch). The complete show notes for this episode can be found at https://twimlai.com/go/762.
    1 hr 19 min
  • The Evolution of Reasoning in Small Language Models with Yejin Choi - #761
    2026/01/29
    Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large models, and how combining synthetic data generation, imitation learning, and reinforcement learning can unlock stronger reasoning capabilities in smaller models. Yejin explains the risks of homogeneity in model outputs and mode collapse highlighted in her “Artificial Hivemind” paper, and their impact on human creativity and knowledge. We also discuss her team's novel approaches, including reinforcement learning as a pre-training objective, where models are incentivized to “think” before predicting the next token, and “Prismatic Synthesis,” a gradient-based method for generating diverse synthetic math data while filtering overrepresented examples. Additionally, we cover the societal implications of AI and the concept of pluralistic alignment: ensuring AI reflects the diverse norms and values of humanity. Finally, Yejin shares her mission to democratize AI beyond large organizations and offers her predictions for the coming year. The complete show notes for this episode can be found at https://twimlai.com/go/761.
    1 hr 6 min