Figure 2 - HMI Improvement: Right-clicking on an operating graphic brings up a menu with many useful functions for dealing with an alarm.
The quality of graphics and the style of alarm presentation on graphics vary widely. Some sites rely primarily on the standard DCS vendor “alarm summary” and “groups,” without any detailed graphics. Other sites also use very effective alarm presentation and operator help integrated into detailed operating graphics. Figure 2 shows a right-click contextual menu that includes functions to acknowledge the alarm and obtain operator help for it from a MAD. The potential for performance improvement from enhancing the HMI depends on the quality of graphics (from an alarm presentation perspective) currently in use at a particular plant.

It’s also possible to use a graphic designed specifically to be effective during alarm floods. The ASM Consortium has recently tested a graphic that might replace the traditional list-based alarm summary, a format that usually isn’t effective when large numbers of alarms occur. Tests showed that the new style has considerable potential for giving operators a better understanding of the true abnormal situation, allowing them to act more effectively. Because floods are still the most significant alarm problem in many plants, this enhancement has considerable potential for performance improvement.

4. Mode-based alarming.
Many plants have multiple operating modes (startup, normal running, shutdown, cold standby, regeneration, etc.). The alarm system is often only appropriate for normal running. For example, many standing alarms derive from plant equipment that’s not in service. Operators have to recognize that many of the alarms activated during other modes of operation are inappropriate. This undermines the integrity and value of the alarm system.

A better approach is to identify the various modes and define alarm parameter settings that suit each mode of operation. For instance, a standby or shutdown mode can be used to eliminate alarms from out-of-service equipment. If a MAD is being used, various sets of alarm parameters can be stored in the MAD and written to the DCS when a plant mode change occurs. The enforcement functionality can handle this activity, effectively overwriting old alarm parameters with the ones required for the new mode of operation. This type of enforcement is much more efficient than requiring operators to make large numbers of manual changes to configuration parameters.

Most sites currently don’t use mode-based alarming, so this technique has considerable potential for improvement, although the benefit depends on the character of operations of the particular plant unit.

5. Alarm testing.
All alarms should have a real meaning and value to operators. It makes no sense for some alarms to be out of service (due to one or more failures) without being clearly recognized as out of service. Standards such as IEC 61511 require regular proof testing of safety-related alarms. But most DCS alarms aren’t safety-related, and most sites don’t routinely test them. Some DCS alarms occur often enough (or are of relatively low value) that there’s insufficient justification for testing them. But other alarms may have a relatively high value and remain inactive for long periods of time, raising concerns about whether they will activate when required. Testing such alarms to prove they are operational clearly has value.

EEMUA 191 includes recommendations for testing alarms. It’s generally impractical to test all alarms, so it’s essential to identify and implement a realistic testing strategy based on the value of each alarm in terms of the potential consequences if it doesn’t activate when it should. For example, a simple strategy might call for annual testing of all higher-priority alarms that haven’t been activated in the previous year. If the EEMUA recommendations for the proportion of higher-priority alarms in the alarm system have been followed, only around 15% to 20% of all configured alarms would require testing, and a large proportion of those might have been activated during the previous year, significantly reducing the number needing testing.

Given that most plants currently don’t systematically test alarms, there’s certainly some potential for performance improvement.

6. Alarm suppression technology.
Use of suppression has often been advocated as a means for improving poorly performing alarm systems, with little or no consideration of other potential methods. An event frequently quoted as a good opportunity for suppression is a compressor trip, which often generates 100 or more alarms in a short time. Apart from the first few alarms, most of these activations are of little or no value to operators and are obvious candidates for suppression. To be effective, suppression of consequential alarms needs to be done quickly, often within a few seconds of the event causing the compressor trip.

In other situations, e.g., when a pump changeover occurs, there’s also value in suppressing a relatively small number of alarms for a period of time. The pump changeover will often have been operator-initiated, and timing will depend on pump run-down or run-up dynamics. These requirements differ markedly from those of suppression during a compressor trip.

There have been many attempts over the years to suppress or mask unwanted alarms. Many of these attempts have been costly and haven’t been very effective. In some cases the alarms being suppressed simply resulted from poor design; rationalization of those alarms often removes the need for suppression.

A key requirement for suppression schemes is to identify a pattern of plant or control-system conditions that must exist before initiation. Any alarm suppression scheme demands careful design to avoid potentially dangerous situations where alarms remain suppressed when they should be operational. One approach often used is to write custom code. While this offers maximum flexibility, it can also cause problems because the code can be difficult to test and maintain. A better approach is to use a standardized (and well-tested) suppression function based on tabulated suppression requirements.
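To make the idea of a table-driven suppression function concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a vendor implementation: the `SuppressionRule` structure, its field names, the tag names and the plant-state keys are all hypothetical. Each table row lists the conditions that must all hold before suppression starts, the alarms to suppress, an operator message and the condition that releases suppression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of plant mode, process values and digital states.
PlantState = Dict[str, object]


@dataclass
class SuppressionRule:
    """One row of a (hypothetical) suppression table."""
    name: str
    permissives: List[Callable[[PlantState], bool]]  # all must be true to start
    alarms: List[str]                                # alarm tags, in suppression order
    release: Callable[[PlantState], bool]            # true when suppression should end
    message: str                                     # shown to operators while active
    active: bool = False


def evaluate(rule: SuppressionRule, state: PlantState) -> List[str]:
    """Update rule.active and return the alarm tags this rule currently suppresses."""
    if rule.active:
        # Once started, suppression ends only when the release condition is met.
        if rule.release(state):
            rule.active = False
    elif all(check(state) for check in rule.permissives):
        # Requiring every permissive guards against one failed or noisy signal.
        rule.active = True
    return list(rule.alarms) if rule.active else []


def make_compressor_trip_rule() -> SuppressionRule:
    """Example table row; tag names and state keys are invented for illustration."""
    return SuppressionRule(
        name="compressor_trip",
        permissives=[
            lambda s: s["compressor_running"] is False,
            lambda s: s["trip_relay"] is True,
        ],
        alarms=["FI101.LO", "PI102.LO", "TI103.HI"],
        release=lambda s: s["compressor_running"] is True,
        message="Consequential alarms suppressed: compressor trip",
    )
```

Because the logic lives in one small generic function and everything plant-specific lives in the table, the function can be tested once and each new suppression case reviewed as data rather than as custom code.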
Tabulated data typically will need to include several different types of information:

• Required conditions for initiation of suppression, e.g., plant mode, values of process variables, digital states and other alarm conditions. The term “permissives” is sometimes used for such conditions. It’s sensible to employ (where possible) multiple permissives to reduce chances of failed or noisy signals initiating suppression;
• A set of alarms that need to be suppressed and, in some cases, the order of suppression so that alarms expected to occur early during an event are suppressed first;
• Messages to indicate to the operations team that suppression is active, or means for operators to see which alarms have been suppressed; and
• Necessary conditions for the release of suppression as the plant returns to normal or enters a different operating mode.

Logic used to detect the need for suppression and for release from suppression must be robust and transparent to the operations team. Because many plants currently don’t utilize any alarm suppression functionality, it promises considerable improvements, particularly when robust tools for suppression are available from DCS vendors.

Make More Progress
The need for improved alarm management gradually became more apparent after the Milford Haven accident and has received much attention and investment in recent years. Considerable progress has been made, much of it due to wide application of the EEMUA guidance to rationalize the alarm system. A significant number of plants now average fewer than 10 alarms per hour in routine operation, enabling their operators to be much more effective and proactive.

However, significant problems remain. In particular, many plants experience alarm floods far too often. An ASM Consortium study found that not a single one of 37 consoles studied achieved the EEMUA recommendation of fewer than 10 alarms during the first 10 minutes of an upset. During floods, many alarm systems are of little value to operators and are effectively unusable. More-effective alarm systems might have avoided some serious accidents or at least reduced their consequences. Unfortunately, alarm rationalization efforts alone don’t provide the performance improvement needed during upsets.

Bodies such as the U.K.’s Health and Safety Executive recognize the problems that can occur when alarm management is poor, and are providing regulatory drivers for plants to improve performance. More extensive use of the six techniques described here can play a significant part in enhancing alarm management.

Take Advantage of Shutdowns
Alarms failing to activate during accidents can lead to dire consequences. For example, some significant liquid-level alarms didn’t go off during the 2005 accident at BP’s Texas City, Texas, refinery. Such alarms may have been out of service for a long period without staff realizing this. If fully operational alarms had been activated during the startup, operators conceivably might have had sufficient time to avoid or reduce the consequences of the accident.

Don’t wait until a startup to find out if important alarms for the startup are working. Schedule testing during the shutdown period immediately prior to startup, particularly if it’s known that these alarms haven’t been activated for a long period. This testing should be highly selective, focusing on the small number of higher-priority alarms that truly are significant during startup operations. This strategy is much more cost-effective than simply relying on routine testing of all configured alarms.
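The selection described above can be sketched in a few lines of Python. This is only an illustration: the `AlarmRecord` fields, the priority labels and the one-year quiet period are assumptions made for the example, and a real site would draw this information from its own alarm database and event history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


@dataclass
class AlarmRecord:
    """Hypothetical summary of one configured alarm."""
    tag: str
    priority: str                        # assumed labels: "high", "medium", "low"
    needed_for_startup: bool             # flagged as significant during startup
    last_activated: Optional[datetime]   # None if never seen in the event history


def alarms_to_test(
    alarms: Sequence[AlarmRecord],
    now: datetime,
    max_quiet: timedelta = timedelta(days=365),   # assumed threshold
    priorities: Sequence[str] = ("high",),
) -> List[str]:
    """Return tags of startup-relevant, higher-priority alarms that have been quiet too long."""
    selected = []
    for alarm in alarms:
        if alarm.priority not in priorities or not alarm.needed_for_startup:
            continue  # keep the test list highly selective
        never_seen = alarm.last_activated is None
        if never_seen or now - alarm.last_activated > max_quiet:
            selected.append(alarm.tag)
    return selected
```

An alarm that activated recently has in effect already been proven, so it drops off the list; only the quiet, high-value alarms remain for testing during the shutdown.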
Peter Andow is a principal consultant, advanced solutions, for Honeywell Process Solutions, Bracknell, U.K.
1. “The Explosion and Fires at the Texaco Refinery, Milford Haven, 24 July 1994,” HSE Books, Sudbury, U.K. (1995).
2. Bransby, M. L. and J. Jenkinson, “The Management of Alarm Systems,” HSE Books, Sudbury, U.K. (1998).
3. “Alarm Systems: A Guide to Design, Management and Procurement,” Publ. No. 191, 2nd ed., EEMUA, London, U.K. (2007).
4. Andow, P., “Abnormal Situation Management: A Major U.S. Programme to Improve Management of Abnormal Situations,” IEE Colloquium on “Stemming the Alarm Flood,” London, U.K. (1997).
5. Campbell Brown, D., “Practical Steps Toward Better Management of Alarms,” Proceedings, “Alarm Systems,” IBC, London, U.K. (2000).
6. Nimmo, I., “Rescue Your Plant from Alarm Overload,” Chemical Processing, Jan. 2005, p. 28, http://www.ChemicalProcessing.com/articles/2005/209.html (2005).
7. Reising, D. V. and T. Montgomery, “Achieving Effective Alarm System Performance: Results of ASM Consortium Benchmarking against the EEMUA Guide for Alarm Systems,” Proceedings, 20th Annual CCPS Intl. Conf., Atlanta, Ga., AIChE, New York City (2005).
8. Zapata, R. and P. Andow, “Reducing the Severity of Alarm Floods,” Proceedings, Honeywell Users Group Americas Symposium 2008, Honeywell, Phoenix, Ariz. (2008).
9. Errington, J., Reising, D. V. and K. Harris, “ASM Outperforms Traditional Interface,” Chemical Processing, March 2006, p. 55, http://www.ChemicalProcessing.com/articles/2006/041.html (2006).
10. “Refinery Explosion and Fire, BP Texas City, March 23, 2005,” Report No. 2005-04-I-TX, U.S. Chemical Safety and Hazard Investigation Board, Washington, D.C. (2007).