The VOID

By: Courtney Nash

About this content

The VOID makes public software-related incident reports available to everyone, raising awareness and increasing understanding of software-based failures in order to make the internet a more resilient and safe place. This podcast is an insider's look at software-related incident reports. Each episode, we pull an incident report from the VOID (https://www.thevoid.community/) and invite the author(s) on to discuss their experience both with the incident itself and with the process of analyzing and writing it up for others to learn from.

© 2025 The VOID
Episodes
  • Canva and the Thundering Herd
    2025/05/14

    Greetings fellow incident nerds, and welcome to Season 2 of The VOID podcast. The main new thing for this new season is we’re now available in video—so if you’re listening to this and prefer watching me make odd faces and nod a lot, you can find us here on YouTube.

    The other new thing is we now have sponsors! These folks help make this podcast possible, but they don’t have any say over who joins us or what we talk about, so fear not.

    This episode’s sponsor is Uptime Labs. Uptime Labs is a pioneering platform specializing in immersive incident response training. Their solution helps technical teams build confidence and expertise through realistic simulations that mirror real-world outages and security incidents. While most of the investment in the incident space these days goes to technology and process, Uptime Labs focuses on sharpening the human element of incident response.

    In this episode, we talk to Simon Newton, Head of Platforms at Canva, about their first public incident report. It’s not their first incident by any means, but it’s the first time they chose as a company to invest in sharing the details of an incident with the rest of us, which of course we’re big fans of here at the VOID.

    We discuss:

    • What led to Canva finally deciding to publish a public incident report
    • What the size and nature of their incident response look like (this incident involved around 20 different people!)
    • Their progression from a handful of engineers handling incidents to having a dedicated Incident Command (IC) role
    • Avoiding blame when a known performance fix was ready to be deployed but hadn't been yet, which contributed to the incident getting worse as it progressed
    • The various ways the people involved in the incident collaborated and improvised to resolve it


    37 min
  • Episode 8: A Tale of A Near Miss
    2025/02/28

    On this episode of the VOID podcast, I’m joined by Nick Travaglini, who is a Technical Customer Success Manager at Honeycomb. Nick wrote up a near miss that his team tackled towards the end of 2023, and I’ve been really wanting to discuss a near miss incident report for a very long time. What’s a Near Miss you might ask, or how is that an incident, or is it? What IS an incident? Keep listening, because we’re going to get into those questions, along with discussing whether or not it’s a good idea to say nasty things about other companies in your incident reports.

    Related Resources

    • Preempting Problems in a Sociotechnical System (the incident report)
    • Work as Imagined vs Work as Done
    • Resilience in Software Foundation
    • On the Mode of Existence of Technical Objects
    • Hitting the Brakes
    • 2024 VOID Report

    36 min
  • Episode 7: When Uptime Met Downtime
    2025/01/30

    We took a bit of a hiatus from recording last year, but we're back with an episode that I think everyone is really going to enjoy. Late last year, John Allspaw told me about this new company called Uptime Labs. They simulate software incidents, giving people a safe and constructive environment in which to experience incidents, practice what response is like, and bring what they learn back to their own organizations.

    For the record, this is not a sponsored podcast. I legitimately just love what they do. And I had the sincere privilege to meet Uptime's cofounder and CEO, Hamed Silatani, at SRECon EMEA in November, where he gave a fantastic talk about some of the things they've learned about incident response from running hundreds of simulations for their customers.

    They recently had the first serious outage of their own platform. And so Hamed is joined by Joe McEvitt, cofounder and director of engineering at Uptime, to discuss with me the one time that Uptime met downtime.

    52 min
