An often-claimed "fact" is that operators or maintenance workers cause 70–90% of accidents. It is certainly true that operators are blamed for 70–90%. Are we limiting what we learn from accident investigations by limiting the scope of the inquiry? By applying systems thinking to process safety, we may enhance what we learn from accidents and incidents and, in the long run, prevent more of them.
Systems thinking is an approach to problem solving that suggests the behavior of a system's components only can be understood by examining the context in which that behavior occurs. Viewing operator behavior in isolation from the surrounding system prevents full understanding of why an accident occurred — and thus the opportunity to learn from it.
We do not want to depend upon simply learning from the past to improve safety. Yet learning as much as possible from adverse events is an important tool in the safety engineering tool kit. Unfortunately, too narrow a perspective in accident and incident investigation often destroys the opportunity to improve and learn. At times, some causes are identified but not recorded because of filtering and subjectivity in accident reports, frequently for reasons involving organizational politics. In other cases, the fault lies in our approach to pinpointing causes, including root cause seduction and oversimplification, focusing on blame, and hindsight bias.
ROOT CAUSE SEDUCTION AND OVERSIMPLIFICATION
Assuming that accidents have a root cause gives us an illusion of control. Usually the investigation focuses on operator error or technical failures, while ignoring flawed management decision-making, safety culture problems, regulatory deficiencies, and so on. In most major accidents, all these factors contribute; so to prevent accidents in the future requires all to be identified and addressed. Management and systemic causal factors, for example, pressures to increase productivity, are perhaps the most important to fix in terms of preventing future accidents — but these are also the most likely to be left out of accident reports.
As a result, many companies find themselves playing a sophisticated "whack-a-mole" game: They fix symptoms without fixing the process that led to those symptoms. For example, an accident report might identify a bad valve design as the cause, and, so, might suggest replacing that valve and perhaps all the others with a similar design. However, there is no investigation of what flaws in the engineering or acquisition process led to the bad design getting through the design and review processes. Without fixing the process flaws, it is simply a matter of time before those process flaws lead to another incident. Because the symptoms differ and the accident investigation never went beyond the obvious symptoms of the deeper problems, no real improvement is made. The plant then finds itself in continual fire-fighting mode.
A similar argument can be made for the common label of "operator error." Traditionally operator error is viewed as the primary cause of accidents. The obvious solution then is to do something about the operator(s) involved: admonish, fire or retrain them. Alternatively, something may be done about operators in general, perhaps by rigidifying their work (in ways that are bound to be impractical and thus not followed) or marginalizing them further from the process they are controlling by putting in more automation. This approach usually does not have long-lasting results and often just changes the errors made rather than eliminating or reducing errors in general.
Systems thinking considers human error to be a symptom, not a cause. All human behavior is affected by the context in which it occurs. To understand and do something about such error, we must look at the system in which people work, for example, the design of the equipment, the usefulness of procedures, and the existence of goal conflicts and production pressures. In fact, one could claim that human error is a symptom of a system that needs to be redesigned. However, instead of changing the system, we try to change the people — an approach doomed to failure.
For example, accidents often have precursors that are not adequately reported in the official error-reporting system. After the loss, the investigation report recommends that operators get additional training in using the reporting system and that the need to always report problems be emphasized. Nobody looks at why the operators did not use the system. Often, it is because the system is difficult to use, the reports go into a black hole and seemingly are ignored (or at least the person writing the report gets no feedback it even has been read, let alone acted upon), and the fastest and easiest way to handle a detected potential problem is to try to deal with it directly or to ignore it, assuming it was a one-time occurrence. Without fixing the error-reporting system itself, not much headway is made by retraining the operators in how to use it, particularly where they know how to use it but ignored it for other reasons.
Another common human error cited in investigation reports is that the operators did not follow the written procedures. Operators often do not follow procedures for very good reasons. An effective type of industrial action for operators who are not allowed to strike, like air traffic controllers in the U.S., is to follow the procedures to the letter. This type of job action can bring the system down to its knees.