Incident Response Essentials: From Postmortems to Communication Strategies - DevOps 212

Adventures in DevOps - A podcast by Charles M Wood - Thursdays

In today's episode, Warren, Will, and special guest Falit Jain dive deep into the intricate world of incident management and response, drawing from rich experiences at tech giants like Amazon and Disney. They explore real-life scenarios, including Amazon's complex debugging challenges with over 150 engineers maintaining their detail page, and the high stakes of live streaming events at Disney.

Join them as they discuss the crucial aspects of effective incident response, from the importance of familiarity with systems and the role of on-call processes to the value of communication and meticulous postmortems. They also deep-dive into cultural influences from leadership, the balance between new feature launches and system stability, and the significance of metrics like mean time to resolution and error budgets.

Socials
LinkedIn: Falit Jain

Picks
Warren - Radical Focus
Will - The Sacred Mushroom and the Cross

Become a supporter of this podcast: https://www.spreaker.com/podcast/adventures-in-devops--6102036/support.
