Episodes

  • When AI Becomes Your SRE: How Incident.io Is Automating Incident Response
    2025/11/06
    When your site goes down, every second counts. For years, Incident.io has helped engineering teams coordinate through chaos—getting the right people in the room, keeping stakeholders informed, and restoring order fast. Now, they’re building something new: an AI SRE that can actually help diagnose and respond to incidents. In this episode, Teresa Torres talks with Lawrence Jones (Founding Engineer) and Ed Dean (Product Lead for AI) about how their team is teaching AI to think like a site reliability engineer. They share how they went from simple prototypes that summarized incidents to a multi-agent system that forms hypotheses, tests them, and even drafts fixes—all from within Slack. You’ll hear how they:
    - Identify which parts of debugging can safely be automated
    - Combine retrieval, tagging, and re-ranking to find relevant context fast
    - Use post-incident “time travel” evals to measure how well their AI performed
    - Balance human trust and AI confidence inside high-stakes workflows
    This is a masterclass in designing AI systems that think, reason, and collaborate like expert teammates.
    1 hr 8 min
  • Building Trainline’s AI Travel Assistant: How a 25-Year-Old Company Went Agentic
    2025/10/30
    Trainline—the world’s leading rail and coach platform—helps millions of travelers get from point A to point B. Now, they’re using AI to make every step of the journey smoother. In this episode, Teresa Torres talks with David Eason (Principal Product Manager), Billie Bradley (Product Manager), and Matt Farrelly (Head of AI and Machine Learning) from Trainline about how they built Travel Assistant, an AI-powered travel companion that helps customers navigate disruptions, find real-time answers, and travel with confidence. They share how they:
    - Identified underserved traveler needs beyond ticketing
    - Built a fully agentic system from day one, combining orchestration, tools, and reasoning loops
    - Designed layered guardrails for safety, grounding, and human handoff
    - Expanded from 450 to 700,000 curated pages of information for retrieval
    - Developed LLM-as-judge evals and a custom user context simulator to measure quality in real time
    - Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go
    It’s a behind-the-scenes look at how an established company is embracing new AI architectures to serve customers at scale.
    1 hr 9 min
  • Powering Government with Community Voices: How Zencity Built an AI That Listens
    2025/10/23
    How do you use AI to help city leaders truly hear their residents? In this episode, Teresa Torres talks with Noa Reikhav (SVP of Product), Andrew Therriault (VP of Data Science), and Shota Papiashvili (SVP of R&D) from Zencity, a company that powers government decision-making with community voices. They share how Zencity brings together survey data, 311 calls, social media, and local news into a unified platform that helps cities understand what people care about—and act on it. You’ll hear how the team built their AI assistant and workflow engine by being thoughtful about their data layers, how they combined deterministic systems with LLM-driven synthesis, and how they keep accuracy and trust at the core of every AI decision. It’s a fascinating look at how modern AI infrastructure can turn noisy, messy civic data into clear, actionable insight.
    1 hr 8 min
  • Building AI Coworkers: How Neople Is Making Agents Work Where You Work
    2025/10/16
    What if your next teammate was an AI coworker — one that could answer support tickets, process invoices, or even draft your next email — and your _non-technical_ colleagues could teach it how to do those tasks themselves? In this episode, host Teresa Torres talks with Seyna Diop (CPO), Job Nijenhuis (CTO & Co-founder), and Christos C. (Lead Design Engineer) of Neople, a company creating “digital coworkers” that blend the reliability of automation with the empathy and flexibility of AI. They share how Neople evolved from simple response suggestions to fully autonomous customer service agents, the architecture that powers their conversational workflow builder, and how they designed eval loops that include their _customers_ as part of the quality process. You’ll learn how the team:
    - Moved from “LLMs will solve everything” to finding the right balance between code, agents, and guardrails
    - Designed evals that run in production to detect hallucinations before an email ever reaches a customer
    - Helped non-technical users build automations conversationally — and taught them decomposition along the way
    - Turned customers’ feedback loops into eval pipelines that improve product quality over time
    It’s a fascinating look at how one startup is rethinking what it means to “work with AI” — not as a tool, but as a teammate.
    1 hr 12 min
  • Building Alyx: How Arize AI Dogfooded Its Way to an Agentic Future
    2025/10/09
    What does it really take to build an AI agent inside an AI platform—especially when you’re using that same platform to build the agent? In this episode of Just Now Possible, Teresa Torres talks with SallyAnn DeLucia (Director of Product at Arize) and Jack Zhou (Staff Engineer at Arize) about the journey of building Alyx, their AI agent designed to help teams debug, optimize, and evaluate AI applications. They share the scrappy beginnings—Jupyter notebooks, hacked-together web apps, and weekly dogfooding sessions with their customer success team—and the hard-earned lessons about evals, tool design, and how to prioritize early skills. Along the way, you’ll hear how cross-functional experience, intuition-building, and customer insight shaped Alyx into a product that’s now central to the Arize platform. If you’ve ever wondered how to move from vibe checks and one-off prototypes to systematic improvement in your AI product, this episode is for you.
    49 min
  • Debugging AI Products: From Data Leakage to Evals with Hamel Husain
    2025/10/02
    How do you know if your AI product is actually any good? Hamel Husain has been answering that question for over 25 years. As a former machine learning engineer and data scientist at Airbnb and GitHub (where he worked on research that paved the way for GitHub Copilot), Hamel has spent his career helping teams debug, measure, and systematically improve complex systems. In this episode, Hamel joins Teresa Torres to break down the craft of error analysis and evaluation for AI products. Together, they trace his journey from forecasting guest lifetime value at Airbnb to consulting with startups like Nurture Boss, an AI-native assistant for apartment complexes. Along the way, they dive into:
    - Why debugging AI starts with thinking like a scientist
    - How data leakage undermines models (and how to spot it)
    - Using synthetic data to stress-test failure modes
    - When to rely on code-based assertions vs. LLM-as-judge evals
    - Why your CI/CD suite should always include broken cases
    - How to prioritize failure modes without drowning in them
    Whether you’re a product manager, engineer, or designer, this conversation offers practical, grounded strategies for making your AI features more reliable—and for staying sane while you do it.
    1 hr 27 min
  • Inside eSpark’s AI Teacher Assistant: RAG, Evals, and Real Classroom Needs
    2025/09/25
    How do you build an AI-powered assistant that teachers will actually use? In this episode of Just Now Possible, Teresa Torres talks with Thom van der Doef (Principal Product Designer), Mary Gurley (Director of Learning Design & Product Manager), and Ray Lyons (VP of Product & Engineering) from eSpark. Together, they’ve spent more than a decade building adaptive learning tools for K–5 classrooms—and recently launched an AI-powered Teacher Assistant that helps educators align eSpark’s supplemental lessons with district-mandated core curricula. We dig into the real story behind this feature:
    - How post-COVID shifts in education created new pressures for teachers and administrators
    - Why their first instinct—a chatbot interface—failed in testing, and what design finally worked
    - The technical challenges of building their first RAG system and learning to wrangle embeddings
    - How their background in education shaped a surprisingly rigorous eval process, long before “evals” became a buzzword
    - What they’ve learned from thousands of teachers using the product this school year
    It’s a detailed look at the messy, iterative process of building AI-powered products in the real world—straight from the team doing the work.
    1 hr 9 min
  • Ellen Brandenberger @ Stack Overflow
    2025/09/18
    When ChatGPT launched, Stack Overflow faced a cataclysmic shift: developer behavior was changing overnight. In this episode, Teresa Torres talks with Ellen Brandenberger, former product leader at Stack Overflow, about how her team navigated the disruption, prototyped AI features, and eventually built an entirely new business line. Ellen shares the inside story of Overflow AI—from the first scrappy prototypes of conversational search, through multiple iterations with semantic search and RAG, to the tough decision to roll the product back when it couldn’t meet developer standards. She also explains how Stack Overflow turned a looming threat into opportunity by creating technical benchmarks and licensing its Q&A corpus to AI labs. This episode offers a rare look at what it really takes to adapt when a platform-defining shift hits—and what product managers, designers, and engineers can learn about prototyping, evaluating quality, and building in uncertainty.
    1 hr 8 min