The root cause fallacy

After an incident, most investigations try to find the root cause at a system or management level. The aim is to find the underlying reasons for the incident, instead of focusing on the more superficial ones.

An important reason for doing this is the assumption that by fixing a single root cause, we're not only fixing the incident at hand, but also many related incidents. This gets us the maximum bang for our buck. But is this a correct assumption?

If something sounds easy, it's generally incomplete. Root causes are no exception. What single thing can you fix that will have far-reaching effects? Maintenance? Leadership? Culture? Whatever comes to mind will likely fall apart into many small, albeit related, things. For instance, fixing a maintenance management system likely includes fixing a lot of specific maintenance procedures, educating individuals, redesigning work environments, and so on.

At best, root causes are a way to group a list of specific issues.

Why can't we identify just one specific issue to fix on a system level that will prevent multiple incidents? Because preventing different incidents is done by fixing many specific issues, not just one.

Does this mean we have to abandon our search for root causes? Probably not. It's still good to look for improvements on a system level to avoid symptomatic fixes, but two things come to mind. First, finding a single root cause to fix is more work than it sounds, as it breaks apart into many specific tasks. Second, in some cases, it might be better to spend all that effort at the sharp end, fixing specific things there, instead of that 'one root cause' far away. Because if that one cause doesn't exist, we may get the biggest bang for our buck not by fixing a lot of specific system issues, but by fixing a lot of specific operational issues instead.

Image by Tor Lindstrand