A question we get frequently from engineering leaders is: "How can I tell that we're improving or making progress on learning from incidents?"

For the most part, folks understand that a decrease in the frequency or duration of incidents isn't evidence that the org is learning.
Instead of looking for the *absence* of incidents, look for the *presence* of behaviors as signals that progress on learning from incidents is happening, such as:
📈 More people will choose to attend post-incident review meetings, so attendance will grow. Engineers will report that they learn things about their systems there (and in the incident analysis write-ups that result) that they can’t learn anywhere else.
📈 Post-incident review meeting attendance will include people from engineering and customer support not directly involved in the incident under discussion.
📈 Engineers will actively seek focused incident analysis training. They will express interest in topics related to accident investigation and read more on these topics on their own time.
📈 Tools that aid incident analysis and post-incident review meeting preparation, or that enrich the post-incident artifacts, will appear and be refined.
📈 The number of “orphan” post-incident “action items” (in JIRA or other task-tracking systems) will trend downward. Orphan items will be “adopted” by being reviewed and cross-referenced to incidents and post-incident analysis write-ups.
📈 Post-incident analysis document content will become *richer* (e.g. include diagrams drawn by participants in post-incident review meetings, the actual transcripts/quotes of the incident response and handling, contributions/quotes from customer support staff).
📈 The number of unique readers of post-incident analysis write-ups will grow over time. Even months after the analysis is published there will be new views of the document(s).
📈 Comments, replies, highlights, tags, and other metadata regarding the content of write-ups will come from an ever-broader audience and spark new dialogue between readers.
📈 Incident analysis documents will be used in new-hire onboarding or training as vehicles to describe in rich detail the histories of the technologies involved, the challenges and risks faced by teams, and the configuration of systems and dependencies.
📈 Engineering teams will use incident analysis documents as primary training materials.
📈 Explicit references to specific incident analysis documents will appear more frequently in company internal documents.
📈 Citations of specific incidents in project/product “roadmap” documents, “runbooks”, hiring plans, new systems design proposals, etc., are evidence that the authors understand both the value and the relevance of experience with incidents.
📈 Incident analysis documents originating in engineering groups will routinely be read/reviewed by those in *other* groups (such as customer support). Comments from these groups will be included and cross-referenced in the post-incident documents.
📈 Post-incident documents originating in other groups (such as customer support) will routinely be reviewed by engineering groups.
You can follow @AdaptiveCLabs.