Control Systems

Optimize Alarm Management

An effective rationalization strategy provides important benefits

By Hiroaki Tanaka, Yokogawa Electric

Better control strategies and procedures directly improve plant production and operations. However, poor alarm management, which is all too common, can undermine both productivity and safety.

The importance of effective alarm management has increased as the number of alarms has risen. Today, because of innovative hardware, software and instrumentation, plants can install alarms on virtually all plant equipment to improve productivity and safety. In addition, the growing sophistication of smart field devices that can warn about condition and performance issues has unleashed another wave of alarms (many of which should go to maintenance rather than operations). The result often is over-alarming and, when an upset occurs, floods of alarms from the distributed control system (DCS) that overwhelm operators.

Most chemical plants don’t need more alarms; instead, they need better alarm management to identify and implement essential alarms and to eliminate unnecessary ones. By working with alarm rationalization consultants and using software to guide and track progress, chemical companies can rethink their plant protection schemes and develop an alarm hierarchy that ranks alarms and events according to their impact on the process or critical equipment. Such programs can pare the alarms down to only the most crucial ones, thus improving operator response and increasing plant uptime.

Here, we’ll explore how rationalized alarm databases and systems, along with software that clarifies corrective operator actions, can enhance operational performance.

The Protection Hierarchy

Plants rely on a number of independent protection layers (IPLs) to handle incidents (Figure 1). If an event moves up the list of layers, each escalation results in a more drastic action and has the potential to cause a more catastrophic impact.

Effective process alarms are part of IPL 2 and 3; they should spur actions to address an incident within those layers because that will avoid or at least minimize production disruptions. Moving to IPL 4 leads to a more drastic action related to the safety instrumented system (SIS) or an emergency shutdown (ESD) function, which almost invariably results in more severe consequences capable of adversely affecting production levels or quality.

Even so, the situations causing alarm trips at IPL 2 can undermine, even if only slightly, operational efficiency and product quality. Some plants tend to concentrate on protecting equipment and, thus, give the highest priority to those alarms. This often is wasteful because those equipment protection functions are better left to IPL 4. Giving those alarms the highest priority makes the entire alarm system more difficult to handle.

At the same time, moving to IPL 4 often brings a new wave of alarms that aren’t related to process conditions but to the performance of the SIS itself. For example, if the valve at the inlet of a reactor doesn’t close as it should when directed by the SIS during an incident, an alarm should trip to warn that this emergency step hasn’t happened and the valve may require manual closing. These types of alarms should be in their own category because the consequences of such situations are well beyond those of alarms tied to IPL 2 and 3. Some plants don’t set these alarms apart; they become lost among more trivial lower-level ones.

Therefore, it’s critical during the alarm rationalization process to determine the consequences of a situation generating an alarm and the corresponding remedial action (Figure 2). As the diagram illustrates, as a situation escalates, the consequences and needed actions become more drastic and have a more serious impact on production.

For existing processing facilities, it’s especially important to develop a plant-wide protection hierarchy and priority matrix to identify and create better alarm strategies. Alarm management and SIS evaluation both begin with inherently safer unit design. A unit that is well regulated will have fewer upsets, alarms and incidents. When alarms primarily serve as a mechanism to compensate for poor control strategy, the plant effectively is running in manual. Any evaluation of alarm strategy must begin by asking how well the unit runs under normal conditions and then correct any glaring flaws.

Alarm Rationalization Activities

The team doing the review should consist of a cross-functional group of engineers, supervisors and experienced staff from process, operations, maintenance, automation and project engineering departments. An alarm rationalization project typically begins with an initial objective of reducing nuisance alarms. This is a broad category; it can be defined as alarms that don’t match the characteristics of good alarms recommended by EEMUA guidelines [1]. Typical nuisance alarm characteristics include:

• calling attention to a situation that the DCS or SIS can correct automatically with no production disruption or other ongoing consequences, and so not requiring operator action;
• duplicating another alarm or multiple alarms responding to the same root cause;
• giving a false alarm caused by an arbitrary trip, often due to an incorrect setpoint;
• indicating a situation, such as one created by planned maintenance, not needing an operator response;
• requiring an operator response — but the operators don’t know what it should be (which is actually a training issue);
• demanding an operator response that is too complex or involved to be carried out in the available amount of time;
• necessitating an operator response that the DCS could — and should — do (which really indicates the function should be automated more effectively in the DCS); and
• resulting in no consequences if the operators cancel or ignore the alarm.

Evaluating nuisance alarms should begin by identifying the top ten bad actors, i.e., those that are activated the most. The team must determine the root cause of each and why the alarm was set up in the first place.
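As a minimal sketch of the top ten ranking, assume the A&E log has been exported as (timestamp, tag) pairs with one row per alarm activation (a hypothetical format; real exports carry many more fields). Python's `collections.Counter` can then rank the tags and show what share of all activations each one accounts for. The tag names are made up.

```python
from collections import Counter


def top_bad_actors(alarm_events, n=10):
    """Rank alarm tags by how often they activated.

    alarm_events: iterable of (timestamp, tag) pairs, one per activation
    (a hypothetical export format). Returns up to n tuples of
    (tag, activation count, share of all activations), most frequent first.
    """
    counts = Counter(tag for _timestamp, tag in alarm_events)
    total = sum(counts.values())
    return [(tag, count, count / total) for tag, count in counts.most_common(n)]


# Illustrative data only.
events = (
    [("08:00", "TI-201.HI")] * 5
    + [("08:05", "PI-305.LO")] * 3
    + [("08:07", "LI-110.HH")]
)
for tag, count, share in top_bad_actors(events, n=2):
    print(f"{tag}: {count} activations ({share:.0%} of total)")
```

The share column shows how much of the total alarm load the top tags carry, which helps judge whether another top ten pass is still worthwhile.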

The more complex side of the analysis uses a statistical technique to examine alarm activity. Event balance trend (EBT) graphing looks at a specific period of time and compares the number of alarm notifications received against the operational record of responses. The areas where activity peaks typically receive the most extensive study.
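The article does not spell out how an EBT graph is computed, so the following is only a simplified stand-in: it buckets alarm notifications and operator actions into fixed time windows (10 minutes here, an arbitrary choice) and flags the windows where alarms pile up. The function names and the list-of-datetimes inputs are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def event_balance(alarms, actions, start, window=timedelta(minutes=10)):
    """Count alarm notifications and operator actions per fixed time window.

    alarms, actions: lists of datetime stamps at or after start.
    Returns (window_start, n_alarms, n_actions) rows from start through
    the last window that contains any event.
    """
    def bucket(ts):
        return int((ts - start) / window)  # timedelta / timedelta -> float

    alarm_bins = Counter(bucket(t) for t in alarms)
    action_bins = Counter(bucket(t) for t in actions)
    last = max([*alarm_bins, *action_bins], default=-1)
    return [(start + i * window, alarm_bins[i], action_bins[i])
            for i in range(last + 1)]


def peak_windows(balance, min_alarms):
    """Keep only the windows whose alarm count reaches min_alarms."""
    return [row for row in balance if row[1] >= min_alarms]


start = datetime(2024, 1, 1, 8, 0)
alarms = [start + timedelta(minutes=m) for m in (1, 2, 3, 12, 25, 26, 27, 28)]
actions = [start + timedelta(minutes=m) for m in (4, 29)]
for window_start, n_alarms, n_actions in peak_windows(event_balance(alarms, actions, start), 3):
    print(window_start.strftime("%H:%M"), n_alarms, "alarms vs", n_actions, "actions")
```

Plotting the two counts per window against time gives a rough picture of where alarm activity outruns operator response; those periods are the ones to study first.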

Figure 3 depicts an EBT graph for a polymer plant that includes a grade-change period. The upper graph traces activity before reduction of nuisance alarms while the lower one shows the beneficial effects of using the top ten approach to decrease the bad actors. Note that the vertical scales on the two graphs differ, so the change is more significant than may be apparent at first glance. Subsequent rounds of top ten reductions become more complex and difficult because fewer nuisance alarms exist. Here, the EBT analysis identified certain time periods corresponding to specific grade production (red circle on left) and timing of grade change (red circle in center). Therefore, the next step of nuisance alarm reduction focused on the troublesome grade and grade-change periods instead of repeating the top ten approach.

The EBT analysis involves viewing process conditions and event data before and after the operators’ actions, and asking key questions such as:

• What process conditions were changing prior to the alarm?
• Did an outright equipment failure occur?
• Why didn’t the DCS correct the situation?
• Did the operator respond correctly and, if so, did this lead to the desired and expected result?

Combining process and event data can identify root causes and the effects resulting from countermeasures. Evaluating the alarm system with EBT can effectively provide visualization of alarm notification and operator actions.
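A minimal sketch of combining the two data sources follows. It assumes historian samples arrive as time-sorted (datetime, value) pairs and operator actions as (datetime, description) pairs; both formats, the 15-minute lookback and the 10-minute response horizon are hypothetical. The first function answers "what was changing before the alarm?" and the second lists the operator responses that followed it.

```python
from datetime import datetime, timedelta


def pre_alarm_change(trend, alarm_time, lookback=timedelta(minutes=15)):
    """Change in a process value over the lookback window ending at the alarm.

    trend: list of (datetime, value) samples sorted by time.
    Returns None when fewer than two samples fall inside the window.
    """
    window = [value for t, value in trend if alarm_time - lookback <= t <= alarm_time]
    if len(window) < 2:
        return None
    return window[-1] - window[0]


def responses_after(actions, alarm_time, horizon=timedelta(minutes=10)):
    """Descriptions of operator actions logged within horizon after the alarm."""
    return [desc for t, desc in actions if alarm_time < t <= alarm_time + horizon]


t0 = datetime(2024, 1, 1, 8, 0)
# Made-up data: a temperature climbing 2 degrees per minute, sampled every 5 minutes.
trend = [(t0 + timedelta(minutes=m), 100.0 + 2 * m) for m in range(0, 31, 5)]
actions = [(t0 + timedelta(minutes=33), "opened cooling water bypass")]
alarm_time = t0 + timedelta(minutes=30)
print(pre_alarm_change(trend, alarm_time), responses_after(actions, alarm_time))
```

If the value was drifting well before the alarm, the question "Why didn’t the DCS correct the situation?" can point toward the control strategy rather than the alarm setpoint.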

Rationalizing The Alarm Database

The rationalization process, when boiled down to the most essential questions, looks at two main issues: what situations and incidents in this plant or unit must have alarms, and what should those alarms look like? When starting with an existing alarm database, the team must identify the good alarms that protect and optimize process/plant operations, not just the bad actors. The EEMUA guidelines [1] detail the characteristics of a good alarm:

• relevant — providing a non-spurious high-operational-value warning;
• unique — not duplicating another alarm;
• timely — occurring soon enough for an operator to respond effectively;
• prioritized — indicating the priority an operator should give the alarm;
• understandable — having a clear and easy-to-grasp message;
• diagnostic — identifying the problem that has occurred;
• advisory — giving an indication of the action required; and
• focused — drawing attention to the most important issues.

These guidelines provide the basis for conducting the alarm rationalization.
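One way to make these characteristics usable during a review is to record each alarm’s attributes and flag the ones that fail a simple check. The sketch below is an assumption, not an EEMUA-defined procedure: the field names, the three-level priority scale and the test for "timely" (the operator action must fit inside the time available to respond) are all illustrative.

```python
from dataclasses import dataclass

PRIORITIES = ("low", "medium", "high")  # assumed three-level scale


@dataclass
class AlarmRecord:
    tag: str
    priority: str
    message: str               # text the operator sees
    cause: str                 # what has gone wrong (diagnostic)
    operator_action: str       # what to do about it (advisory)
    consequence: str           # what happens if the alarm is ignored
    minutes_to_respond: float  # time available before the consequence occurs
    minutes_for_action: float  # time the operator action takes


def rationalization_gaps(rec):
    """Return the good-alarm characteristics this record does not yet satisfy."""
    gaps = []
    if rec.priority not in PRIORITIES:
        gaps.append("prioritized")
    if not rec.message.strip():
        gaps.append("understandable")
    if not rec.cause.strip():
        gaps.append("diagnostic")
    if not rec.operator_action.strip():
        gaps.append("advisory")
    if not rec.consequence.strip():
        gaps.append("relevant")  # no consequence if ignored: a nuisance alarm
    if rec.minutes_for_action >= rec.minutes_to_respond:
        gaps.append("timely")
    return gaps


good = AlarmRecord("TI-201.HI", "high", "Reactor temperature high",
                   "Cooling water flow low", "Open cooling water bypass",
                   "Runaway reaction risk", 10, 3)
print(rationalization_gaps(good))  # -> []
```

Characteristics such as "unique" and "focused" depend on the whole database rather than a single record, so they are left out of this per-alarm check.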

The rationalization process determines alarm attributes such as priority, setpoints and operator action so every alarm meets good alarm characteristics recommended by the EEMUA guidelines. However, plants often have a massive number of alarms in their process automation systems (DCS, SIS, etc.) and subsystems (distributed programmable logic controllers, etc.). Consider one actual example: a facility had ≈1,500 process measurement points and another ≈2,900 in subsystems such as dedicated packages, creating ≈10,500 function blocks for single points; this, in turn, led to nearly 64,000 alarms! So, if the site reviewed 50 alarms/d, the project would take more than three years. Clearly, such a rationalization effort can require enormous personnel and time resources. Therefore, setting up a strategy to move through the database systematically to avoid getting bogged down is crucial.
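The schedule estimate is simple arithmetic, sketched below. Counting calendar days reproduces the "more than three years" figure; the 250-working-day year in the second call is an assumption added for comparison.

```python
import math


def review_duration(total_alarms, alarms_per_day, days_per_year=365):
    """Days and years needed to review every alarm at a fixed daily pace."""
    days = math.ceil(total_alarms / alarms_per_day)
    return days, days / days_per_year


print(review_duration(64_000, 50))                     # calendar days
print(review_duration(64_000, 50, days_per_year=250))  # assumed working days
```

At 50 alarms/d the review takes 1,280 days: about 3.5 calendar years, or roughly 5.1 years of 250 working days. Finishing within one calendar year would require about 176 alarms/d (64,000 ÷ 365 ≈ 175.3, rounded up).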

The analysis of a process unit or whole plant should follow a series of steps:

1. Survey the existing alarm configuration. Defining and understanding present process conditions and the alarm system are critical. This can require reviewing extensive plant and process documentation. At this point, the team calculates key performance indicators (KPIs) for all alarms, reviews the alarm configurations, totals alarm counts and collects operator feedback from past alarms and events (A&E).

2. Develop a philosophy for alarm design. The results from the alarm analysis are presented in a priority matrix that evaluates the consequences of failing to effectively respond to an alarm. Typically, minor events have no measurable financial repercussions. However, as the severity of the event increases, consequences rise substantially.

3. Categorize and prioritize the alarms. This step (Figure 4) helps lay out alarm attributes by category and priority as groups before discussing individual alarms. The breakdown categories include: fire and gas safety (FGS) systems; process; diagnostic; and DCS internal alarms. Each alarm should get a priority level, ranging from low to high.

4. Identify individual alarm attributes. Such determinations require current process and instrumentation diagrams, control narratives, hazard and operability reports, the DCS alarm database, the alarm list, cause/effect diagrams and more. Using these documents, the team can develop a philosophy for each alarm. This step establishes how each alarm event is detected, directly or indirectly, and identifies other sensors and alarms that trigger before the hazard itself is detected.

5. Create an updated alarm database. With the priority matrix and completed alarm determination documents, the team can update the alarm database to meet the site’s alarm philosophy and reduce the number of nuisance alarms. The outcome is a set of good alarms that support effective operator countermeasures and a quick return to normal operations.
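Steps 2 and 3 above can be sketched as a lookup table plus a grouping pass. Everything specific here is hypothetical: the four severity levels, the 10-minute boundary between a "short" and a "long" time to respond, and the priority in each cell stand in for whatever the site’s own alarm philosophy specifies. Only the four categories come from step 3.

```python
from collections import Counter, defaultdict

# Hypothetical priority matrix (step 2): rows are the consequence of failing
# to respond, columns are the time available to respond. None means the event
# has no consequence and should not be an alarm at all.
PRIORITY_MATRIX = {
    "none":   {"short": None,   "long": None},
    "minor":  {"short": "low",  "long": "low"},
    "major":  {"short": "high", "long": "medium"},
    "severe": {"short": "high", "long": "high"},
}

# Categories named in step 3.
CATEGORIES = ("FGS", "process", "diagnostic", "DCS internal")


def assign_priority(severity, minutes_to_respond, short_cutoff=10.0):
    """Look up a priority; the 10-minute cutoff is an illustrative assumption."""
    column = "short" if minutes_to_respond < short_cutoff else "long"
    return PRIORITY_MATRIX[severity][column]


def summarize(alarms):
    """Count how many alarms in each category land at each priority.

    alarms: iterable of dicts with 'category', 'severity' and
    'minutes_to_respond' keys (a hypothetical worksheet layout).
    """
    summary = defaultdict(Counter)
    for alarm in alarms:
        if alarm["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {alarm['category']}")
        priority = assign_priority(alarm["severity"], alarm["minutes_to_respond"])
        summary[alarm["category"]][priority or "not an alarm"] += 1
    return {cat: dict(summary[cat]) for cat in CATEGORIES if cat in summary}


worksheet = [
    {"category": "process", "severity": "major", "minutes_to_respond": 5},
    {"category": "process", "severity": "major", "minutes_to_respond": 30},
    {"category": "process", "severity": "none", "minutes_to_respond": 30},
    {"category": "diagnostic", "severity": "minor", "minutes_to_respond": 60},
    {"category": "FGS", "severity": "severe", "minutes_to_respond": 2},
]
print(summarize(worksheet))
```

Reviewing these per-category counts before discussing individual alarms mirrors the group-first approach of step 3; a large "not an alarm" count in one category is an early sign of nuisance alarms.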

Following these steps makes it easier to complete one cycle of alarm rationalization successfully. However, it’s important to realize that sustaining the improvements requires ongoing support [2].
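Part of that ongoing support is checking that the live configuration has not drifted from the rationalized database. A minimal sketch, assuming both databases can be exported as dictionaries that map each tag to its attributes (a hypothetical layout), lists the alarms that were removed, added or changed:

```python
def diff_alarm_databases(old, new):
    """Compare two alarm databases given as {tag: {attribute: value}} dicts.

    Returns (removed tags, added tags, {tag: {attribute: (old, new)}}).
    """
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = {}
    for tag in old.keys() & new.keys():
        diffs = {attr: (old[tag].get(attr), new[tag].get(attr))
                 for attr in old[tag].keys() | new[tag].keys()
                 if old[tag].get(attr) != new[tag].get(attr)}
        if diffs:
            changed[tag] = diffs
    return removed, added, changed


# Illustrative data only.
old = {
    "TI-201.HI": {"priority": "medium", "setpoint": 340.0},
    "TI-201.HH": {"priority": "high", "setpoint": 360.0},
    "FI-150.LO": {"priority": "low", "setpoint": 5.0},
}
new = {
    "TI-201.HI": {"priority": "high", "setpoint": 345.0},
    "FI-150.LO": {"priority": "low", "setpoint": 5.0},
}
print(diff_alarm_databases(old, new))
```

Any difference that lacks a matching management-of-change record is a candidate for review.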

In addition, it’s necessary to find a suitable interface to present the information and provide appropriate functions to the operators. Ideally, all functions, such as alarm filtering and shelving, follow EEMUA guidelines.

Two Case Histories

A petrochemical facility was investigating opportunities to increase processing uptime and optimize unit production. As part of the investigation, the company focused on operator workload and the effectiveness of the existing alarm system. Early indicators pointed to issues with the plant’s alarm strategy and processing performance, so the firm assembled a cross-functional team of engineers and operators to work with an alarm-strategy consultant to study the nuisance alarm situation. The aim was to recommend ways to lower the system’s alarm-rate KPI from six alarms every ten minutes to two.

The team reviewed the process A&E logs from a one-month period to identify possible root causes for the nuisance alarms, evaluate proper countermeasures, and reduce or eliminate alarms. In addition, it checked alarm-off events, along with the philosophy and strategy of the present alarm system.
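The alarm-rate KPI used in this case can be estimated from such a log. A minimal sketch, assuming alarm timestamps have already been extracted as datetimes (a hypothetical input), computes the average and peak number of alarms per 10-minute window and the share of windows above a flood threshold. The 10-alarm threshold is an illustrative default, not a figure from this article.

```python
import math
from datetime import datetime, timedelta


def alarm_rate_kpis(alarm_times, period_start, period_end,
                    window=timedelta(minutes=10), flood_threshold=10):
    """Average and peak alarms per window, plus percent of windows in flood."""
    n_windows = math.ceil((period_end - period_start) / window)
    counts = [0] * n_windows
    for t in alarm_times:
        if period_start <= t < period_end:
            counts[int((t - period_start) / window)] += 1
    return {
        "avg_per_window": sum(counts) / n_windows,
        "peak_per_window": max(counts),
        "pct_windows_in_flood": 100 * sum(c > flood_threshold for c in counts) / n_windows,
    }


# Made-up hour of data: a 12-alarm burst, then six scattered alarms.
start = datetime(2024, 1, 1, 8, 0)
times = [start + timedelta(seconds=30 * i) for i in range(12)]
times += [start + timedelta(minutes=m) for m in (15, 25, 35, 45, 55, 58)]
kpis = alarm_rate_kpis(times, start, start + timedelta(hours=1))
print(kpis)
```

For a one-month review like this one, pass the month’s start and end as the period; an average of 6.0 per window corresponds to the starting point in this case and 2.0 to the target.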

Several areas stood out as opportunities for improvement. For instance, addressing field-related issues, typically instrument failure or improper ranging of scale, could reduce nuisance alarms by approximately 50%. The appropriate countermeasures then were determined in a follow-up workshop.

The team identified and designated a list of important alarms as the core of the alarm database and system. The updated strategy and database reduced the alarm KPI to one alarm every ten minutes, an 83% reduction from the previous level of six. In addition, the team created management-of-change policies to ensure optimized performance of the alarm system/strategy and processing operations over the long term.
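The 83% figure follows directly from the before and after rates. The short calculation below also converts the rates to alarms per day (a day holds 144 ten-minute windows), which can be easier to picture.

```python
WINDOWS_PER_DAY = 24 * 60 // 10  # 144 ten-minute windows in a day


def percent_reduction(before, after):
    """Percentage drop from the before rate to the after rate."""
    return 100 * (before - after) / before


for label, rate in (("before", 6), ("target", 2), ("achieved", 1)):
    print(f"{label}: {rate} per 10 min = {rate * WINDOWS_PER_DAY} per day, "
          f"{percent_reduction(6, rate):.0f}% below the starting rate")
```

In daily terms the load fell from 864 to 144 alarms, half of the 288 alarms/d the original target would have allowed.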

At another plant, sophisticated software customized the human/machine interfaces connected to the process unit so people in different functional areas saw only the alarms relevant to their particular area. All alarms remain in the system, but control and process engineers, operators and management each see a different subset of them.

This approach allows the various functional-area teams to group, shelve and filter alarms. Using attribute and rule functions, these specialists can easily view and set priorities, purpose, time-to-run and consequences for just the alarms needed by their specific job function. (Reference 3 provides further discussion about shelving functions.)
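The article does not describe how that software is built. As an illustration only, role-based views and time-limited shelving could be modeled as below; the `audiences` attribute, the `RoleView` class and the shelving expiry are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    tag: str
    priority: str
    audiences: frozenset  # roles that should see this alarm (hypothetical attribute)


@dataclass
class RoleView:
    role: str
    shelved_until: dict = field(default_factory=dict)  # tag -> expiry datetime

    def shelve(self, tag, now, duration):
        """Hide an alarm from this role's view until now + duration.

        Shelving only changes this view; the alarm itself stays in the list.
        """
        self.shelved_until[tag] = now + duration

    def visible(self, alarms, now):
        """Alarms addressed to this role that are not currently shelved."""
        return [a for a in alarms
                if self.role in a.audiences
                and self.shelved_until.get(a.tag, now) <= now]


now = datetime(2024, 1, 1, 8, 0)
alarms = [
    Alarm("TI-201.HI", "high", frozenset({"operator", "process engineer"})),
    Alarm("FT-150.DIAG", "low", frozenset({"maintenance"})),
    Alarm("PI-305.LO", "medium", frozenset({"operator"})),
]
operator = RoleView("operator")
operator.shelve("PI-305.LO", now, timedelta(hours=1))
print([a.tag for a in operator.visible(alarms, now)])
print([a.tag for a in operator.visible(alarms, now + timedelta(hours=2))])
```

Routing the device diagnostic alarm only to maintenance reflects the point made earlier that many smart-device alarms belong with maintenance rather than operations.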

Achieve Measurable Benefits

Improved alarm visualization and rationalization can play an important role in enhancing process operations. Alarm rationalization projects ensure proper operator response to process conditions that exceed setpoints and merit action. Developing alarm strategies and philosophies to enhance operator response and optimize countermeasures yields quantifiable results.

HIROAKI TANAKA is a consultant specializing in alarm management and data analysis for Yokogawa Electric Corp., Tokyo. Email him at Hiroaki.T@jp.yokogawa.com.

REFERENCES
1. “Alarm Systems — A Guide to Design, Management and Procurement,” 3rd ed., Engineering Equipment & Materials Users’ Assn., London, U.K. (2013).
2. “Operations Management.”
3. “The Benefits of Alarm Shelving.”