Resilience engineering makes the following seemingly contradictory claims:

1. Small incidents don’t provide insight into big ones.
2. To get insight into the nature of big incidents, study the small ones.

How can that be?

Here comes a thread.
@sidneydekkercom puts it like this: “Incidents do not precede accidents. Normal work does.”

This has implications for both points.

(Terminology may be a little confusing here because what software people call “incidents” is what safety people call “accidents”.)
For the first point, the idea is that the historical distribution of small incidents doesn’t give you any insight into how likely the next big one is. Small incidents provide insight into how likely other small incidents are, but not how likely the *big* incidents are.
For the second point, since *big* incidents are preceded by “normal work”, to understand how big incidents happen, you need to understand the nature of “normal work”.

OK, so what does that have to do with small incidents?
In theory, you can study “normal work” even if a small incident hasn’t happened. However, small incidents (and near misses, aka OOPSies) present us with opportunities for studying normal work. We can ask questions about the specific events to gain insight.
Small incidents and near misses illuminate the nature of normal work: demands (what’s hard), constraints (what limits people face), and workarounds (the unexpected things people have to do to get their work done).
To sum up: aggregating small incidents doesn’t tell you anything about how likely a big incident is. But studying individual small incidents in detail can give you insight into the nature of the system, and help illuminate hidden risks.

/fin
You can follow @lhochstein.