No doubt everyone who has a distributed control system (DCS) has encountered alarm management issues. The reason is simple: a DCS makes over-alarming all too easy.
The DCS arrived in a marketplace where physical space limitations had always restricted the number of alarms. Doing anything with raw alarms was almost impossible; the only features found in the electromechanical alarm annunciator box were first-up alarm indication and the ability to suppress an alarm by removing its electronics from the box.
The DCS provided flexibility: almost unlimited alarms, multiple alarm types (including bad-process-variable monitoring) and a host of options, from various filtering techniques to relational dynamic alarm rationalization. Unfortunately, the DCS came without any discipline or cost implications for adding alarms. To be fair to DCS manufacturers, they recognized the potential for over-alarming. Even the very first version of the Honeywell DCS, for instance, gave advice on configuring and prioritizing alarms that is not too different from current guidelines. However, no one read or implemented this guidance.
The problem was compounded by a lack of leadership and ownership, an issue that still exists at many sites. In the past, after process and equipment engineers specified alarms, the control engineers installing the DCS added alarms according to what could be done, not what should be done. Operators demanded alarms that made monitoring easier, trying to make up for deficiencies in the human-computer interface (HCI) and for the loss of the big picture that occurred when the panel was replaced by a 15-in. keyhole window to the process.
Meanwhile, multidiscipline “process hazard assessment” teams conducting hazard and operability analyses added alarms to deal with deficiencies in the design. As a result, a process plant might well go from 150 physical alarms to 14,000 DCS alarms (Figure).
Management has been reluctant to pay for a redesign of the alarm-management system and the HCI, feeling it had already paid for the design once and couldn't justify paying again. However, management knows that bad design has hurt operators' performance, even to the point of requiring extra staff to deal with the flood of alarms that occurs during every disturbance. This has led to minor incidents in which operators missed critical information or made errors due to stress and overwork. Unfortunately, it often takes a big incident (such as the explosion and fires at the Texaco refinery in Milford Haven, Wales, which is cited in all the alarm-management guidelines) to force companies to readdress the issue.
How do we resolve this problem? The answer requires a careful review of the causes we have just outlined and starts with clear ownership of the problem. Some sites have given a single person responsibility for managing alarms, while other sites have made a multidiscipline team responsible.
The key to success is to establish responsibility. Success should be judged on performance: not just on the number of alarms eliminated but also on the effects, namely the improvement to operators' jobs and the impact on the running of the plant.
Any problem that involves costs, people and other resources clearly calls for project management. However, because of the lack of ownership and accountability and the absence of performance expectations, alarm-management projects rarely start with formal project management. Not surprisingly, many of these projects fail due to poor understanding of the scope of the problem, lack of resources and money, loss of momentum and no identifiable return on investment.
Many managers get frustrated with engineers because the engineers don't define the problem and the real cost of doing the project correctly. The engineers attack the problem without a plan and then wonder why they don't get the organization's support to address all the issues that surface.
Often, a frustrated manager starts an alarm-management project by engaging a control engineer. The manager thinks the problem is limited to control system configuration and that a few sit-down discussions with the operators will resolve it, the same approach used for other control system problems. The control engineer soon finds that some operators will not part with any alarms, even though the alarm system hampers their efforts during a disturbance.
The control engineer then involves a process engineer. However, both are overwhelmed by the size and complexity of the problem. The control engineer reads recent articles on the subject and discovers that the two of them need statistical tools to better understand what is happening within the alarm system.
So they get trial copies of some software tools to assess the frequency of alarms and find that just 12 alarms produced 53% of the activations in the system, and that one particular alarm caused 123 activations in a four-hour period. They decide to focus on the "top 10" bad-actor list and put the remaining two on the list for the next 10.
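The kind of frequency analysis these tools perform can be sketched in a few lines. This is a minimal illustration, not any vendor's method: it assumes the alarm journal can be exported as (timestamp, tag) pairs, and the function names and tag names are hypothetical. `bad_actors` ranks tags by activation count and reports each tag's cumulative share of all activations (the figure behind a statement like "12 alarms produced 53% of the activations"), while `max_in_window` finds the largest number of activations of one tag within any four-hour span.

```python
from collections import Counter
from datetime import datetime, timedelta


def bad_actors(events, top_n=10):
    """Rank alarm tags by activation count.

    `events` is a hypothetical journal export: a list of (timestamp, tag) pairs.
    Returns (tag, count, cumulative share of all activations) for the top_n tags.
    """
    counts = Counter(tag for _, tag in events)
    total = sum(counts.values())
    ranked, cumulative = [], 0
    for tag, count in counts.most_common(top_n):
        cumulative += count
        ranked.append((tag, count, cumulative / total))
    return ranked


def max_in_window(events, tag, window=timedelta(hours=4)):
    """Largest number of activations of `tag` within any span of length `window`."""
    times = sorted(t for t, tg in events if tg == tag)
    best, start = 0, 0
    for end, t in enumerate(times):
        # Drop the oldest activations until the span fits inside the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Usage with made-up data: one chattering alarm, one periodic alarm, one single activation.
t0 = datetime(2024, 1, 1, 6, 0)
log = [(t0 + timedelta(minutes=2 * i), "FIC-101.PVHI") for i in range(30)]
log += [(t0 + timedelta(hours=i), "LIC-205.PVLO") for i in range(5)]
log += [(t0, "TI-330.PVHI")]

for tag, count, share in bad_actors(log, top_n=3):
    print(f"{tag:15} {count:4d}  cumulative {share:.0%}")
print("FIC-101.PVHI peak in 4 h:", max_in_window(log, "FIC-101.PVHI"))
```

Sorting by count and accumulating the share is what turns a raw alarm journal into a bad-actor list; the sliding window separates a chattering alarm, which fires many times in a short period, from one that is merely frequent over a long one.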
The two engineers discovered that fixing just 10 of those 12 alarms was a challenging and time-consuming task. Some alarms required physical instrumentation modifications. Some needed configuration changes that would demand a better understanding of why the alarm existed, what its limits should be and when the alarm is not useful and should be suppressed. Some of the alarms were simply unnecessary. Others required a change in alarm priority, which, among thousands of alarms, would still have little effect.
This little exercise showed the engineers that solving the alarm-management problem would require a multidiscipline team, with some members assigned permanently and others, such as rotating-equipment and programmable-logic-controller (PLC) specialists and other subject matter experts (SMEs), participating part-time as required. The experts would add knowledge of equipment, safety and environmental issues or process technology, which are the main reasons alarms exist. The software tool would prove essential throughout the project and the life cycle of the alarm system; buying it would mean justifying a capital authorization.