Episodes

  • A Practical Guide to Observability in Enterprise Systems | Episode 30
    2025/12/08

    When engineering teams talk about observability, they often picture dashboards, alerts, and vendor slides. But inside real enterprise systems, observability is a story about people. It’s about how they communicate, how they respond under pressure, and how they collaborate when platforms are messy, duplicated, or half-maintained.


    In this episode, Duncan Mapes and Jason Ehmke sit down with platform veteran Jason McMunn, who has spent years untangling observability chaos across large organizations. What unfolds is a candid look at what actually breaks when systems scale: fractured ownership, unclear contracts between teams, and the silent cost of tools nobody fully uses.


    Through real incidents, leadership lessons, and platform consolidation stories, the trio walks through what it looks like to build observability that teams trust, not just observability that vendors promise. If you’ve ever shipped an alerting strategy that blew up in your face, wrestled with tool sprawl, or tried to rebuild trust between teams after an outage, this is the guide you’ve been needing.


    Top Takeaways:

    • Understanding the current capabilities is crucial for transformation.
    • Building relationships is key to gaining buy-in from stakeholders.
    • Empathy plays a significant role in technology management.
    • Transforming team roles requires a shift in mindset.
    • Cost management should focus on value rather than just savings.
    • Education and support empower teams to be self-sufficient.
    • Being present during incidents provides valuable insights.
    • User experiences can reveal underlying issues with technology.
    • Leadership is essential in driving organizational change.
    • Modeling best practices can inspire others to follow suit.


    Connect with us:

    Duncan Mapes

    Jason Ehmke

    DevGrid.io

    DevGrid on LinkedIn

    DevGrid on X

    45 min
  • Is AI the Developer’s New Co-Pilot or Competitor? | Episode 29
    2025/12/01

    Viewed through the lens of systems thinking, AI introduces both leverage and fragility into the development lifecycle. In this episode, Duncan Mapes, Jason Ehmke, and returning guest Chris Boyd break down how AI affects feedback loops, failure modes, team throughput, and the architecture of modern systems.

    They explore the evolving responsibilities of engineers in an environment where code generation is partially automated, and discuss how AI reshapes design principles, mobile development approaches, and cross-team dynamics.

    The takeaway: AI is neither a panacea nor a threat. It’s a force multiplier for teams who know how to use it, and a risk amplifier for those who don’t.

    Top Takeaways:

    • AI tools are revolutionizing coding workflows, allowing for rapid prototyping and iteration.
    • CLI tools such as Claude and Codex are becoming essential for developers.
    • The last 10% of a project is often the hardest, but AI can help streamline this process.
    • Design and usability remain critical, even as coding becomes more automated.
    • The economics of development are shifting as AI reduces the time and cost of building software.
    • Open-source models are gaining traction, but proprietary models still dominate the market.
    • AI is not just a replacement for developers but a tool for enhancing their capabilities.
    • The future of mobile development may see a resurgence of native apps due to AI tools.
    • Companies need to adapt their workflows to integrate AI effectively.
    • The competition between AI models is intensifying, with new players entering the market.


    58 min
  • Engineering Leaders vs Tech Debt: A Realistic Conversation | Episode 28
    2025/11/27

    Tech debt exists at the intersection of engineering, business incentives, and system architecture. In complex organizations, it becomes a multidimensional problem involving operational risk, system reliability, long-term scalability, and developer productivity.

    In this analytically grounded episode, Duncan and Jason dissect tech debt through the lens of systems thinking.

    They introduce a working model for categorizing tech debt into functional, structural, and data-related risk, explaining how each impacts throughput, incident frequency, and time-to-recovery. They also examine how vulnerabilities and poor data contracts masquerade as “bugs” but are often symptoms of deeper architectural debt.

    The conversation presents a practical playbook for leaders: how to assess tech debt, measure its economic impact, define acceptable thresholds, and integrate it into strategic planning.


    Top Takeaways:

    • Tech debt can be defined in various ways depending on context.
    • Shortcuts taken to meet business needs contribute to tech debt.
    • Tech debt is not just about code quality but also about business outcomes.
    • Standards change over time, leading to new tech debt.
    • Quantifying tech debt is essential for effective management.
    • Managing tech debt requires strategic planning and documentation.
    • Business leaders need to understand the implications of tech debt.
    • Justifying tech debt investments is a common challenge.
    • Effective communication with business partners is crucial for tech debt management.
    • A structured approach to documenting tech debt can aid in prioritization.


    39 min
  • Platform Engineering Playbook: Autonomy + Standards | Episode 27
    2025/11/17

    Most engineering failures can be traced back to two weak points: unclear standards or excessive autonomy. This episode presents a structured examination of how platform engineering resolves this tension to create resilient systems.

    Duncan and Jason break down the cause-and-effect chains behind incidents: data inconsistencies, missing resiliency patterns, queue backlogs, or unplanned API dependencies. They argue that resilient systems emerge from predictable inputs such as consistent data contracts, reliable backfill strategies, upstream validation, and well-defined ownership boundaries.

    This episode provides a mental model for designing platforms where autonomy accelerates delivery, while standards protect system health. It highlights the value of pre-incident thinking and gives a blueprint for building platforms that remain operable even under load, failure, or organizational drift.


    Top Takeaways:

    • Automation is essential for effective platform engineering.
    • Balancing enablement and independence is crucial for user adoption.
    • Self-service capabilities enhance scalability and efficiency.
    • Evangelizing the platform can drive user engagement and adoption.
    • Creating friction can reduce unnecessary support requests.
    • Manual reviews should be a temporary solution, not a permanent process.
    • Setting clear standards and guidelines is vital for platform integrity.
    • Building a culture of support fosters better relationships with users.
    • Understanding user needs helps in creating effective platform solutions.
    • Empowering key individuals can enhance platform adoption across teams.


    36 min
  • How to Build Resilient Systems in Complex Enterprises | Episode 26
    2025/11/10

    When systems fail, it’s rarely because no one saw it coming. It’s because no one planned for it.

    In this episode, Duncan Mapes and Jason Ehmke share real-world lessons from years of building and scaling technology across enterprise environments where downtime costs dollars.

    They explore the art of designing resilient systems that can withstand inevitable failure points, recover quickly, and continue operating under pressure. From team culture to proactive design checklists, this conversation dives into how engineering leaders can turn system reliability into a competitive advantage.

    Top Takeaways:

    • Designing for failure is crucial in system architecture.
    • Understanding failure points is essential for system resiliency.
    • Resiliency can mean different things depending on the context.
    • Asking the right questions during project kickoff is vital.
    • Complex enterprise environments have unique challenges.
    • Responsibility for failures should not be shifted to others.
    • Handling app stability is a core responsibility of developers.
    • Everything in a system can fail at some point.
    • Evaluating business impact is crucial for prioritizing resiliency efforts.
    • Creating a resiliency checklist can guide design and implementation.


    35 min
  • Stop Wasting Money on Conferences — Do This Instead | Episode 25
    2025/11/03

    Every year, companies spend thousands sending their engineers to conferences.

    Flights. Hotels. Per diems. All in the hope of “learning” and “networking.” But when they return? The notebooks gather dust, and the insights never leave their laptops.

    In this episode of Tech Council, Duncan Mapes and Jason Ehmke pull back the curtain on what actually makes conferences worth attending. From understanding why you’re going, to choosing who should go, to how that knowledge is shared afterward, they uncover the strategies that separate teams who grow from teams who just take selfies at the expo hall.

    They also explore alternatives, from local meetups to internal learning sessions, that often yield more value for less cost.

    If you’ve ever wondered whether conferences are worth the budget line item, this episode gives you the framework to find out.


    Top Takeaways:

    • Conferences can provide valuable learning opportunities for engineers.
    • Networking at conferences often yields more insights than formal sessions.
    • Local meetups can be just as beneficial as large conferences.
    • Budgeting for conferences requires careful consideration of ROI.
    • Selecting attendees for conferences can be a challenging process.
    • Knowledge transfer after conferences is crucial for team growth.
    • Encouraging participation in local events fosters community engagement.
    • Engineers should aspire to share knowledge through talks at meetups.
    • The tech industry is constantly evolving, making continuous learning essential.
    • Feedback from the community can enhance future discussions on conference attendance.


    25 min
  • Top Book Recommendations for Tech Professionals and Engineering Managers | Episode 24
    2025/10/20

    Every great leader has a few books that quietly shaped how they lead.

    For Duncan and Jason, those books go far beyond tech manuals. They’re the ones that teach you how to think, how to communicate, and how to take responsibility when things get hard.

    In this week’s Tech Council, they open their personal libraries and share the titles that stuck. Expect reflections on ownership, transparency, and the hidden leadership lessons inside biographies, history books, and even fiction.

    This isn’t your ordinary book recommendations episode. It’s about why some ideas endure, why others fail in practice, and why the best tech lessons sometimes come from outside of tech entirely.


    Books Mentioned in this Episode:

    • The Scaling Era: An Oral History of AI
    • Extreme Ownership: How U.S. Navy SEALs Lead and Win
    • The Dichotomy of Leadership
    • Radical Candor
    • Turn the Ship Around
    • PHP/MySQL Programming for the Absolute Beginner
    • The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution
    • John Adams by David McCullough
    • 1776 by David McCullough
    • The Daily Stoic: 366 Meditations on Wisdom, Perseverance, and the Art of Living
    • Good to Great: A Study of Management Strategies of Companies with Lasting Growth
    • High Output Management by Andy Grove
    • 12 Rules for Life: An Antidote to Chaos
    • The Hard Thing About Hard Things
    • What You Do Is Who You Are: How to Create Your Business Culture
    • Rework by Jason Fried and DHH
    • High Growth Handbook
    • The Three-Body Problem
    • The Alchemist

    35 min
  • Habits That Make or Break Hybrid Teams | Episode 23
    2025/10/13

    The hybrid workplace has become the defining experiment of modern engineering, and most leaders are still figuring it out.

    In this Tech Council episode, Duncan Mapes and Jason Ehmke analyze the habits that drive high-performing hybrid teams. They discuss proximity bias, performance evaluation, and how communication systems shape fairness and culture.

    They dissect the bad habits that derail trust (like reactive communication and lack of structure) and the good ones that strengthen it (like clear metrics, empathy, and visibility).

    Whether you’re leading across time zones or transitioning from in-office to hybrid, this episode offers actionable insights drawn from years of managing distributed engineering teams.


    Top Takeaways:

    • Hybrid work can create feelings of exclusion among remote employees.
    • Intentional communication is crucial in remote settings.
    • Establishing clear norms helps teams function effectively.
    • Performance management in hybrid environments requires careful consideration.
    • Promoting remote employees can be challenging due to visibility issues.
    • Building relationships is essential for team cohesion.
    • Junior employees need guidance to develop professional skills remotely.
    • Regular check-ins can help maintain team engagement.
    • Documentation of achievements is vital for remote employees.
    • Creating opportunities for in-person interactions can enhance team dynamics.


    47 min