Episodes

  • Signal Drop: The Cave You Won't Instrument
    2026/06/12

    I avoided building one dashboard for about a year: the cost-per-request board for a service whose number I already suspected was bad. Every deferral had a reason. Every reason was a way of not looking.

    Joseph Campbell: "The cave you fear to enter holds the treasure you seek." Every team has a cave: the part of the system nobody instruments because they're afraid of what it'll say. This episode is about treating that avoidance as a signal rather than a priority call.

    The habit: name your cave, and put one crude panel on it this week.


    The book is out. Metrics & Mayhem: A CTO's Guide to Observability That Actually Works. Kindle is live now; paperback and hardback launched on 1 June.

    Get your free chapter here: https://www.masteringobservability.com/metrics-and-mayhem/free-chapter

    Newsletter: https://masteringobservability.com

    LinkedIn: https://www.linkedin.com/in/allanmann1/


    7 min
  • Signal Drop: The Alert That Just Says "We Need To Talk"
    2026/06/05

    A duty engineer showed me the page that woke her at 3 a.m. It said: "Error rate elevated." Three words and a graph. Then she lay there guessing: which service, how bad, the big one or the noisy one.

    That gap is the problem. A context-free alert is an open loop, and open loops fill with fear. This episode is about treating an alert as what it actually is: a message to a tired human at the worst hour of their day. What broke, how bad, what to do.

    The book is out. Metrics & Mayhem: A CTO's Guide to Observability That Actually Works. Kindle is live now.

    Get your free chapter here: https://www.masteringobservability.com/metrics-and-mayhem/free-chapter

    Newsletter: https://masteringobservability.com

    LinkedIn: https://www.linkedin.com/in/allanmann1/

    7 min
  • Signal Drop: Position Before the Page
    2026/05/29

    Most observability work is reactive, not preventive.

    This Signal Drop is about the most expensive habit in IT operations: treating the response to the incident as the strategy.

    It covers why positioning beats reaction, what the unbuilt position actually costs you, and the merge-time habit that lets you stop paying the heroism tax. One idea, one habit, five minutes.


    8 min
  • Signal Drop: Progress Isn't Linear
    2026/05/22

    Your reliability metrics don't climb in a straight line, and most of that month-to-month movement is noise wearing the costume of signal.

    This episode is about why one bad month isn't a failed strategy, and how chasing every wobble quietly wrecks the thing you're trying to fix.

    The habit to take away: decide what counts as real before you see the number, not after.

    6 min
  • Signal Drop: The Line You Won't Cross
    2026/05/15

    Returning after two months of silence. An ended contract in Abu Dhabi, a war that began a week later, and the integrity decision behind the gap.

    This Signal Drop is about the line you refuse to sell, the small increments of compromise that erode an IT Ops career, and how to name yours before someone else moves it for you.

    10 min
  • Deep Dive: The Midnight Pager Is Dying
    2026/02/26

    Deep Dive (28 mins): Auto-remediation is replacing the 3 AM scramble. Gartner says 60% of large enterprises will adopt self-healing infrastructure by 2026. PagerDuty's early adopters are resolving incidents 50% faster. But the DORA 2024 report found AI tooling correlates with worse delivery performance, for the second year running, and the Catchpoint SRE Report showed toil rising, not falling.

    What happens to on-call when the easy pages disappear, and the remaining 20% of incidents just get harder? Four concrete actions for the next sprint.
    Based on the blog post: https://masteringobservability.com/p/the-midnight-pager-is-dying-what-replaces-it-is-harder

    Observability, SRE, AIOps, On-Call, Auto-Remediation, IT Operations, Leadership

    26 min
  • Signal Drop: Your Role Changes Every Hour
    2026/02/18

    Most leaders don't fail because they're bad at leading. They fail because they stay in one mode too long. This Signal Drop breaks incident leadership into four practical modes: Direct, Shield, Coach, Delegate. Pick the wrong one and the room fills with noise. Pick the right one, and people can think. Includes a ten-second habit you can use before your next call.


    5 min
  • Signal Drop: Accountability Is the Job
    2026/02/12

    In IT Operations, accountability isn’t blame. It’s ownership. When nobody owns the outcome, decisions wobble, incidents drag, and teams default to theatre. This Signal Drop is about making ownership explicit under pressure: one owner, one outcome, one next check. Blameless doesn’t mean ownerless. Clear accountability creates calm, faster decisions, and better reliability.

    5 min