Avoid the Domino Effect

Most process plants constantly strive to improve their operations. The performance of operators and the alarm system can markedly impact the quest to bolster safety, minimize unplanned downtime, increase productivity and achieve other gains. Help is on the way from a new International Society of Automation (ISA) standard. "Management of Alarm Systems for the Process Industries," ISA-18.2, provides a framework for the successful design, implementation, operation and management of alarm systems [1]. (The roughly 80-page document was approved on June 23, 2009, and is available at isa.org.) This article offers an overview of the standard with a focus on how it will impact plants.

The Importance of Alarm Management
With production running ever closer to equipment and facility operating limits, proper alarm management has never been more crucial. Poor alarm management is a main cause of unplanned downtime, which costs plants more than $20 billion in lost production every year [2]. It has also contributed significantly to some of the worst industrial accidents on record (including Three Mile Island, Bhopal, Milford Haven and Texas City), which caused injuries, loss of life, equipment and property damage, fines and harm to company reputations.

Table 1. Alarm Design Activities: Detailed design of an alarm system includes three main steps.

Today operators receive a large amount of data about process performance from their console displays. They must be able to make quick decisions to keep the operation in its normal or target range. The alarm system should notify operators to take action when the process risks crossing a performance boundary (Figure 1). Failure to respond promptly and effectively can lead to off-specification product, a process upset, an unplanned shutdown or an accident.

The connection between problems in alarm management and process safety accidents was a motivator for developing ISA-18.2. Both the U.S. Occupational Safety and Health Administration and the U.K. Health and Safety Executive have identified the need for improved industry practices to prevent incidents. It's widely anticipated that both insurance companies and regulatory agencies will treat ISA-18.2 as recognized and generally accepted good engineering practice (RAGAGEP). As such, it would become the expected minimum practice at sites, especially if an incident does occur. The standard contains a clause that allows for past practice, but this doesn't exempt companies from monitoring their alarm systems to demonstrate acceptable performance.

A Never-Ending Journey

Good alarm management isn't a one-time activity but a process that requires continuous vigilance from operations, engineering and maintenance teams. Consequently, the standard is structured according to the alarm management lifecycle (Figure 2), which is similar in many respects to the lifecycle in the standard governing process safety, ANSI/ISA-84.00.01-2004 Part 1 (IEC 61511-1 Mod). ISA-18.2 covers the various stages in the lifecycle (see Table 3):

Table 2. Alarm Performance Metrics: Develop suitable metrics from an adequate pool of data, at least 30 days' worth. Source: Ref. 1.

Philosophy. An important and often first step is creating an alarm philosophy document. It will guide all alarm management activities at a site and is critical for helping plant staff maintain the alarm system over time. It's the alarm management "bible," outlining practices and procedures for how to classify and prioritize alarms, which colors to use to indicate an alarm in the human-machine interface (HMI), and how to manage configuration changes. The document also should establish key performance benchmarks such as acceptable alarm load for operators.

What it means. An alarm philosophy document is the cornerstone of an effective alarm management program. It's also important for demonstrating compliance with the standard and for facilitating internal discussions with major stakeholders. For new plants, the alarm philosophy should be fully defined and approved before commissioning.

Figure 1. Process Condition Model: The particular status of operations should trigger specific control system actions. Source: Ref. 1.

Identification. What is and isn't an alarm? How do you know whether you should alarm an input from the field? ISA-18.2 provides clear guidance. It defines an alarm as "an audible and/or visible means of indicating to the operator an equipment malfunction, process deviation or abnormal condition requiring a response." The emphasis on "requiring a response" underscores an important alarm-management principle: if an operator doesn't need to respond, then don't provide an alarm! Pretty simple… Following this cardinal rule will eliminate a large portion of potential alarm-management problems.

Many sources can help identify candidate alarms, including process and instrumentation drawings (P&IDs), operating procedure reviews, process hazards analyses (PHAs), safety requirement specifications (SRSs), hazard and operability studies (Hazops), incident investigations and quality reviews. You also can use alarms to indicate process performance boundaries such as off-target or pre-upset (Figure 1).

What it means. Identification involves determining what might merit an alarm and what should trigger it. Many process control systems allow configuring five or more alarm conditions (high-high, high, low-low, low, rate-of-change…) per input/output (I/O) point; this contributes significantly to alarm overload. Analysis may determine that only one alarm condition (such as "high") is necessary for a temperature input to keep a process safe and under control. Exercise engineering judgment to identify exactly which conditions require alarms and why, rather than enabling every alarm condition available in the system.

Rationalization. Here, a cross-functional team from operations, process control, maintenance, safety, etc. analyzes each potential or existing alarm to make sure it meets the definition of an alarm. Does it indicate an abnormal condition? Does it require an operator action? Is it unique or do other alarms indicate the same condition?
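The three screening questions lend themselves to a simple checklist. The sketch below is a minimal illustration, not a procedure taken from the standard: the `CandidateAlarm` fields, the tag names and the rule that two alarms are duplicates when they name the identical condition are all hypothetical simplifications. In practice the rationalization team judges uniqueness.

```python
from dataclasses import dataclass


@dataclass
class CandidateAlarm:
    tag: str                  # hypothetical alarm identifier
    condition: str            # abnormal condition the alarm is meant to indicate
    indicates_abnormal: bool  # Q1: does it indicate an abnormal condition?
    operator_action: str      # Q2: expected operator action ("" if none)


def screen(candidates):
    """Apply the three screening questions; return (kept, rejected)."""
    kept, rejected = [], []
    covered = set()  # conditions already indicated by a kept alarm (Q3: uniqueness)
    for alarm in candidates:
        if not alarm.indicates_abnormal:
            rejected.append((alarm.tag, "does not indicate an abnormal condition"))
        elif not alarm.operator_action:
            rejected.append((alarm.tag, "requires no operator action"))
        elif alarm.condition in covered:
            rejected.append((alarm.tag, "duplicates another alarm"))
        else:
            covered.add(alarm.condition)
            kept.append(alarm)
    return kept, rejected


# Hypothetical candidates: two redundant transmitters and a pump status message.
candidates = [
    CandidateAlarm("TI-101A.HI", "reactor high temperature", True, "Increase cooling water flow"),
    CandidateAlarm("TI-101B.HI", "reactor high temperature", True, "Increase cooling water flow"),
    CandidateAlarm("P-201.RUNNING", "pump running", False, ""),
]
kept, rejected = screen(candidates)
```

Here only TI-101A.HI survives: the second transmitter alarm repeats the same condition, and the pump status is information rather than an alarm.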

Alarms that pass this screening are analyzed further to define their attributes such as alarm priority and alarm limit. Results are documented in a master alarm database that contains information such as:

• basis for the alarm;
• consequence of a deviation;
• expected operator action;
• time for the operator to respond;
• alarm class and priority; and
• alarm type and setpoint (limit).

What it means. The information documented in the master alarm database has value throughout the lifecycle. For example, many plant operations/engineering teams are afraid to eliminate an existing alarm because "it was obviously put there for a reason." With the master alarm database, you can look back years later to see why a specific alarm was set up and evaluate whether it should remain. It's also good practice to make this valuable information accessible to operators, particularly the consequence of not correcting the problem and how they should respond.
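A master alarm database can be as simple as one structured record per alarm holding the fields listed above. The sketch below is hypothetical: the field names, the (tag, alarm type) key, the example entry and the `operator_guidance` helper are illustrative only, and commercial alarm management tools use their own schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterAlarmRecord:
    basis: str                # basis for the alarm
    consequence: str          # consequence of a deviation
    operator_action: str      # expected operator action
    response_time_min: float  # time for the operator to respond, minutes
    alarm_class: str
    priority: str
    alarm_type: str           # e.g. "HI", "LO"
    setpoint: float           # alarm limit, engineering units


# Keyed by (tag, alarm type); the entry below is a made-up example.
master_db = {
    ("TI-101", "HI"): MasterAlarmRecord(
        basis="Reactor temperature above normal operating range",
        consequence="Catalyst degradation and off-spec product",
        operator_action="Increase cooling water flow",
        response_time_min=10,
        alarm_class="Process",
        priority="Medium",
        alarm_type="HI",
        setpoint=180.0,
    ),
}


def operator_guidance(db, tag, alarm_type):
    """Summarize the consequence and expected response for the operator."""
    rec = db[(tag, alarm_type)]
    return (f"{tag} {alarm_type} ({rec.priority}): {rec.consequence}. "
            f"Action: {rec.operator_action} within {rec.response_time_min:g} min.")
```

Because the basis is stored alongside the setpoint, a team reviewing the alarm years later can read why it exists before deciding whether to keep it.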

Detailed Design. This process consists of three main activities: basic alarm design, HMI design and advanced alarming (see Table 1).

Basic alarm design involves using information contained in the master alarm database to plan and configure the system. Poor configuration practices are a leading cause of alarming issues, and following ISA-18.2 recommendations will help prevent them. For example, proper use of dead bands and off-delays can go a long way toward eliminating "chattering" alarms: alarms on points that repeatedly transition between the alarm state and the normal state in a short time, and that operators therefore learn to ignore.
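To see why dead bands and off-delays help, consider a high alarm on a noisy signal hovering near its limit. The sketch below is a simplified illustration for a high alarm only, assuming periodic samples of (time in seconds, value); the function name, parameters and sample data are hypothetical. A dead band requires the value to fall below the setpoint minus the dead band before the alarm can clear; an off-delay requires the clearing condition to persist for a set time.

```python
def count_activations(samples, setpoint, deadband=0.0, off_delay=0.0):
    """Count high-alarm activations for a list of (time_s, value) samples.

    deadband:  value must drop below setpoint - deadband before the alarm clears
    off_delay: the clearing condition must hold this many seconds before clearing
    """
    in_alarm = False
    clear_since = None  # time the clearing condition first became true
    activations = 0
    for t, value in samples:
        if not in_alarm:
            if value > setpoint:
                in_alarm = True
                activations += 1
                clear_since = None
        elif value < setpoint - deadband:
            if clear_since is None:
                clear_since = t
            if t - clear_since >= off_delay:
                in_alarm = False
                clear_since = None
        else:
            clear_since = None  # signal went back up: restart the off-delay timer
    return activations


# Hypothetical 1-second samples oscillating around a setpoint of 100.
samples = list(enumerate([99, 101, 99, 101, 99, 101, 99, 101, 95, 95, 95]))
```

With these samples the raw alarm activates four times, while either a 2-unit dead band or a 3-second off-delay reduces that to a single activation. Suitable values depend on the signal's noise and the process dynamics.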

HMI design is all about presenting alarms in a way that enables operators to quickly detect a deviation, diagnose the problem, determine corrective action and then respond appropriately. Effective operator performance depends on proper use of color, text and patterns within the HMI. The goal is to clearly and uniquely indicate the state of the alarm (normal, unacknowledged, acknowledged, suppressed) while also providing functionality such as filtering and navigation links within alarm displays.
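The alarm states an HMI must distinguish can be modeled as a small state machine. This is a minimal sketch under stated assumptions: it tracks normal, unacknowledged and acknowledged states, adds a "returned to normal but unacknowledged" state that many control systems also track, and treats suppression as a flag that hides the alarm from display. The display styles (flashing vs. steady) are hypothetical; the actual color and pattern conventions belong in the alarm philosophy.

```python
from enum import Enum


class AlarmState(Enum):
    NORMAL = "normal"
    UNACKED = "unacknowledged"
    ACKED = "acknowledged"
    RTN_UNACKED = "returned to normal, unacknowledged"


# (current state, event) -> next state. Events: "alarm" (process enters the
# alarm condition), "clear" (process returns to normal), "ack" (operator acknowledges).
TRANSITIONS = {
    (AlarmState.NORMAL, "alarm"): AlarmState.UNACKED,
    (AlarmState.UNACKED, "ack"): AlarmState.ACKED,
    (AlarmState.UNACKED, "clear"): AlarmState.RTN_UNACKED,
    (AlarmState.ACKED, "clear"): AlarmState.NORMAL,
    (AlarmState.RTN_UNACKED, "ack"): AlarmState.NORMAL,
    (AlarmState.RTN_UNACKED, "alarm"): AlarmState.UNACKED,
}

# Hypothetical display styles; real conventions come from the alarm philosophy.
DISPLAY_STYLE = {
    AlarmState.NORMAL: None,  # nothing shown in the alarm list
    AlarmState.UNACKED: "flashing",
    AlarmState.ACKED: "steady",
    AlarmState.RTN_UNACKED: "flashing, dimmed",
}


class Alarm:
    def __init__(self, tag):
        self.tag = tag
        self.state = AlarmState.NORMAL
        self.suppressed = False  # state is still tracked, but nothing is displayed

    def handle(self, event):
        # Events with no defined transition (e.g. "ack" while NORMAL) are ignored.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

    def display_style(self):
        return None if self.suppressed else DISPLAY_STYLE[self.state]
```

Keeping every state visually distinct is what lets an operator tell at a glance which alarms still need attention.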

Advanced alarming addresses how to build in "smarts" to support the operator. To optimize operator performance, only present alarms when they are meaningful. Additional layers of logic, programming or modeling are added to the system to modify alarm attributes or suppression status during operation. This ensures alarms that are insignificant because of the state of equipment (e.g., redundant pump running) or plant (e.g., area shutdown for maintenance) aren't presented to the operator. One common example is suppressing a low flow alarm when it's triggered as a result of a pump trip. The operator must focus on the underlying cause, the trip, and not low flow. Another example is modifying alarm setpoints and priorities for different batch recipes. It's also possible to make relevant information like a standard operating procedure available to the operator in context via information linking.
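The pump-trip and batch-recipe examples above can be expressed as a few lines of logic. In this hypothetical sketch, the plant-state fields, recipe names and setpoints are invented for illustration. In a real system such logic would run in the control system or an alarm management layer, with each rule documented in the master alarm database.

```python
from dataclasses import dataclass


@dataclass
class PlantState:
    pump_tripped: bool  # hypothetical status of the pump feeding this line
    recipe: str         # hypothetical active batch recipe


# Hypothetical low-flow alarm limits per recipe (engineering units).
LOW_FLOW_SETPOINT = {"recipe_A": 40.0, "recipe_B": 25.0}


def low_flow_alarm(flow, state):
    """Return "normal", "alarm" or "suppressed" for the low-flow alarm."""
    setpoint = LOW_FLOW_SETPOINT[state.recipe]  # setpoint modified by recipe
    if flow >= setpoint:
        return "normal"
    if state.pump_tripped:
        # Low flow is an expected consequence of the trip; the pump-trip alarm
        # itself (not modeled here) points the operator to the underlying cause.
        return "suppressed"
    return "alarm"
```

The same flow reading can thus be normal under one recipe and an alarm under another, and a pump trip no longer produces a second, redundant alarm.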

Table 3. ISA-18 Lifecycle Model: Developing, operating and maintaining an alarm system involves ten distinct stages.

What it means. Following the standard's design recommendations and requirements is key to creating an effective alarm system. This can preempt many potential issues that normally would surface during system operation. A good design prevents many nuisance alarms and ensures that needed alarms are clearly presented to the operator. Success depends on developing effective design procedures and documents, and on having the discipline to adhere to them.

Implementation. When putting the alarm system or an individual alarm into operation, testing and training are key activities. Alarms classified as "highly managed alarms" (such as safety alarms) require extra attention, including creating a documented alarm response procedure. You must revisit this phase as new instrumentation and alarms are added or process changes are implemented.

What it means. Alarm systems shouldn't be put into service without proper training for operators and maintenance personnel. Operators must be comfortable with the system and learn to rely on it.

Operation. This section of the standard describes the appropriate use by operators of alarm-handling tools like shelving, and the documentation needed to help them do their job. Alarm response procedures should include information such as setpoint (limit), potential causes and consequences of an alarm, recommended corrective action and allowable response time. This information is fleshed out during rationalization and documented in the master alarm database. Alarm shelving allows an operator to temporarily suppress an alarm to avoid being distracted from more important alarms [3]. Shelved alarms reappear after a fixed period of time, ensuring they aren't forgotten and can be addressed after more critical alarms have been handled.

What it means. Alarm shelving should be used with care.
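Shelving with a fixed time-out, restricted to authorized users and visible for review, might look like the sketch below. It's illustrative only: the eight-hour period, the `ShelvingManager` class and the user names are hypothetical, and times are plain seconds rather than clock times.

```python
from dataclasses import dataclass, field

# Hypothetical fixed shelving period in seconds; the real value belongs in the
# alarm philosophy.
SHELVE_PERIOD_S = 8 * 3600


@dataclass
class ShelvingManager:
    authorized_users: set
    shelved: dict = field(default_factory=dict)  # tag -> (user, unshelve time)

    def shelve(self, tag, user, now):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized to shelve alarms")
        self.shelved[tag] = (user, now + SHELVE_PERIOD_S)

    def expire(self, now):
        """Unshelve alarms whose period has ended and return their tags,
        so the caller can re-annunciate any that are still active."""
        expired = sorted(tag for tag, (_, until) in self.shelved.items() if now >= until)
        for tag in expired:
            del self.shelved[tag]
        return expired

    def is_shelved(self, tag):
        return tag in self.shelved

    def review_list(self):
        """Shelved alarms with who shelved them and when they return."""
        return sorted((tag, user, until) for tag, (user, until) in self.shelved.items())
```

The review list supports the periodic check of shelved alarms, and the authorization check enforces who may shelve.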
Develop procedures to define who has authorization to shelve an alarm, to review the list of shelved alarms and to determine whether interim alarms are necessary.

Maintenance. You may have to take an alarm out of service for repair, replacement or testing. This section of the standard provides recommendations for periodic testing and for handling alarms that will be non-operational for an extended duration. For out-of-service alarms, it's important to document the approver, the reason the alarm was removed from service and any details concerning interim alarms or special handling procedures needed. The system should provide a list of out-of-service alarms that can be viewed on demand, such as before starting up a piece of equipment.

What it means. Use only approved methods when taking an alarm out of service, and provide documentation to remind people to return the alarm to service at a later time. Too many out-of-service alarms can cause the same incidents as too many alarms, because either way the operator fails to take action when needed.

Monitoring & Assessment. You must verify alarm system performance against the objectives defined in the alarm philosophy (developed based on ISA-18.2). The number of alarms presented to a single operator is a key performance metric. Studies have shown that an operator receiving on the order of 150 alarms per day (i.e., approximately one every 10 minutes) should be able to respond effectively. In practice, many control rooms run at 10 times this level, forcing operators to deal with about one alarm every minute, which makes it difficult for them to respond correctly. (Investigations of several industrial incidents have found that operators were confronted during an upset condition with too many alarms to respond effectively.) Other key metrics are the number of stale alarms (those that linger on the console even when nothing is wrong) and how many alarms an operator faces during an alarm flood (Table 2) [1].

What it means. An unmonitored alarm system is almost always broken. Measuring the performance of your alarm system against established key performance metrics can serve as the first step in improving an existing system and is important in establishing that the system is performing adequately. Monitoring and assessment must occur periodically.

Management of Change. Modifications to the alarm system must be reviewed and approved prior to implementation. In this case a "change" could refer to altering an alarm limit, adjusting its priority, adding a new alarm point or implementing an advanced alarming technique. Modifications shouldn't be made without proper analysis and justification. Once the change is approved, update the master alarm database to keep it current.

Figure 2. Alarm Management Lifecycle: Proper alarm management requires adhering to a ten-stage process. Source: Ref. 1.

What it means. Even the best-designed alarm system can run into problems if there's no control over who can change it. Put policies and procedures in place to ensure that who can change what is well understood and enforced. A good practice is to periodically compare the actual running alarm system configuration against the master alarm database to check that no unauthorized configuration changes have been made.

Audit. To maintain the integrity of the alarm system, periodically conduct an audit focused on plant alarm-management processes. A system audit compares operation and performance against the principles and benchmarks documented in the alarm philosophy. The goal is to identify improvements to the system and workflow process, as well as potential modifications to the alarm philosophy document.

What it means. Audits are important for both new and existing systems. For an existing system, an audit or benchmark against a set of documented practices (such as the ISA-18.2 standard) may be the logical first step in an improvement project. Table 3 summarizes the stages and their activities.
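The periodic comparison of the running configuration against the master alarm database can be partly automated. The sketch assumes both have been exported as dictionaries keyed by (tag, alarm type) with attribute dictionaries as values; the export format, attribute names and function name are hypothetical. Only attributes recorded in the master database are checked, and the code can only flag discrepancies, not decide whether a change was authorized.

```python
def find_config_discrepancies(master_db, running_config):
    """Compare running alarm settings with the master alarm database.

    Both arguments map (tag, alarm_type) -> {attribute: value}. Only attributes
    recorded in the master database are checked. Returns readable messages.
    """
    issues = []
    for tag, alarm_type in sorted(set(master_db) | set(running_config)):
        key, label = (tag, alarm_type), f"{tag} {alarm_type}"
        if key not in master_db:
            issues.append(f"{label}: configured but not in master alarm database")
        elif key not in running_config:
            issues.append(f"{label}: in master alarm database but not configured")
        else:
            for attr, expected in master_db[key].items():
                actual = running_config[key].get(attr)
                if actual != expected:
                    issues.append(f"{label}: {attr} is {actual!r}, master says {expected!r}")
    return issues


# Hypothetical exports from the master database and the control system.
master = {
    ("TI-101", "HI"): {"setpoint": 180.0, "priority": "High"},
    ("PI-205", "LO"): {"setpoint": 2.5, "priority": "Medium"},
}
running = {
    ("TI-101", "HI"): {"setpoint": 195.0, "priority": "High"},
    ("FI-301", "LO"): {"setpoint": 10.0, "priority": "Low"},
}
issues = find_config_discrepancies(master, running)
```

Each discrepancy then goes through management of change: either the master database is updated through an approved change, or the configuration is restored.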

A Vital New Resource

The ISA standard provides definitions, practices, requirements and recommendations that will bolster your plant's quest for operational excellence. Like other standards, ISA-18.2 tells you "what" needs to be done but doesn't dictate "how" to do it. Key takeaways from the standard are:

• Realize that alarm management is an ongoing cyclical process that's never complete. The standard follows a lifecycle approach.
• Develop an alarm philosophy document that tells how your plant will address all lifecycle phases. It should contain everything from the criteria for setting alarm priority, to the colors in HMI displays, to who can make changes to configuration.
• Rationalize alarms to ensure that every alarm has an essential purpose and requires an operator response.
• Create and maintain a master alarm database to document the what, why and how of each alarm. Update this database when changes occur and make this valuable information available to operators.
• Analyze and benchmark system performance. Tools can help you analyze alarm history and can automatically generate a report showing how you compare to recommended key performance indicators.

ANSI/ISA-18.2-2009 requires these activities; they are expected to become standard practice in the process industries.
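The alarm-history analysis mentioned in the last takeaway boils down to rates like those discussed under Monitoring & Assessment. The sketch below is a simplified illustration, not an ISA-18.2 calculation method: the function name is hypothetical, floods are counted in fixed 10-minute windows rather than sliding ones, and the flood threshold of more than 10 alarms per 10 minutes is an assumed default to be replaced by the definition in your alarm philosophy (see Table 2).

```python
from collections import Counter
from datetime import datetime, timedelta


def alarm_load(timestamps, period_days, flood_threshold=10,
               window=timedelta(minutes=10)):
    """Simple alarm-load metrics for one operator position.

    timestamps: datetimes at which alarms were annunciated to this operator
    over an analysis period of `period_days` (at least 30 days, per Table 2).
    A fixed window with more than `flood_threshold` alarms counts as a flood
    window (assumed definition; take the real one from the alarm philosophy).
    """
    per_day = len(timestamps) / period_days
    minutes_between = 24 * 60 / per_day if per_day else None
    start = min(timestamps, default=None)
    # Bucket alarms into consecutive fixed windows measured from the first alarm.
    counts = Counter((t - start) // window for t in timestamps)
    flood_windows = sum(1 for n in counts.values() if n > flood_threshold)
    return {"alarms_per_day": per_day,
            "minutes_between_alarms": minutes_between,
            "flood_windows": flood_windows}


# A made-up month at the ~150 alarms/day level cited above: one alarm every 576 s.
base = datetime(2009, 7, 1)
steady = [base + i * timedelta(seconds=576) for i in range(4500)]
metrics = alarm_load(steady, period_days=30)
```

For this steady month the sketch reports 150 alarms per day, or one roughly every 9.6 minutes (1,440 minutes divided by 150), with no flood windows; adding a burst of 15 alarms within one minute produces one flood window.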

Nicholas P. Sands is a process control engineer for DuPont in Wilmington, Del., and serves as co-chair of the ISA Alarm Management Standard Committee. Todd Stauffer is business development manager at exida, Sellersville, Pa., and also is a member of that ISA committee.

References:

1. "Management of Alarm Systems for the Process Industries," ANSI/ISA-18.2-2009, ISA, Research Triangle Park, N.C. (2009).
2. O'Brien, L., "Alarm Management Strategies," ARC, Dedham, Mass. (Nov. 2004).
3. "Alarm Systems: A Guide to Design, Management and Procurement," 2nd ed., Engineering Equipment & Materials Users' Assn., London, U.K. (2007).