Executing Alarm Management

Executing an alarm management strategy is no small task. One of the hardest parts is sustaining the effort. As plants change and expand, personnel turn over, and operations continue, nuisance alarms seem to creep back. To advance your strategy while keeping the peace, you must meet many challenges, from motivating personnel to juggling the integration of changes.

Long, long ago alarm tales

You knew you had an alarm management problem for quite some time: all of the signs were present. Operations managers recommended using both sides of line printer paper as a cost-cutting measure because of the high number of alarms and events. You noticed a rise in the failure rates of operator keyboards and trackballs. During plant startups and shutdowns, operators repeatedly hit the acknowledge key until the annoying alarm tone ceased; the alarm list was ignored. While this isn't true in all plants, it's probably true in most. Your to-do list included looking into nuisance alarms, but there was always some other project, more visible, with the potential for more financially tangible results.

Finally, you have the go-ahead to do something about this problem. Because of years of increased focus on plant safety, alarms are the best indicator of plant safety at your facility. Alarm management has visibility. Regulatory agencies are watching. As with many other trends, the degree to which regulatory agencies enforce such programs varies by region. For example, in the U.K., the Health and Safety Executive (HSE) has threatened to shut down facilities without an alarm management program. In the U.S., bodies such as the Occupational Safety & Health Administration (OSHA) aren't yet as aggressive but are catching up fast. Another reason for their focus: more than 42% of reported incidents involve human error (Figure 1). These incidents included poor communication of alarm conditions, poor operator training (to deal with abnormal situations), and bypassed safety measures. Regulatory interest recognizes a simple truth: your operators are overburdened with alarms.

Compare analysis results

To execute your project, you probably read various articles in trade magazines and other sources, which gave you advice such as "measure, analyze, and improve" or "benchmark, plan, and implement." While this made perfect sense, the details weren't clear. So, you started by assessing how bad the problem actually was, to establish a baseline.

You tried to do some reporting through the control system; it was time-intensive and easy to miss something. So, you purchased a third-party alarm analysis software tool that collected the historical alarm and event data from the control system and the plant. Though there was some tuning of how the analysis software treated the alarms, this was definitely less painful than manually moving and sorting the data and writing queries against it. You compared your analysis results with the Engineering Equipment and Materials Users Association (EEMUA) 191 guidelines, and your hand automatically slapped your forehead and slowly dropped down to your chin (Figure 2). This is a normal reaction when you find out the numbers are outrageously higher than the guidelines state. The data collected clearly showed that, at least in some plant states, the operators at your facility were operating in a reactive mode instead of a predictive one (Figure 3). In reactive mode, operators are flooded with alarms and can only respond to conditions as they arise; in predictive mode, they use alarms to avert potential problems before they happen.
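The benchmarking step described above comes down to counting alarms per operator in fixed time windows and comparing the rate with a guideline. The following is a minimal sketch, not the method of any particular analysis tool: the function names, the 10-minute window, and the numeric limits are illustrative assumptions, and the real limits should come from EEMUA 191 or your own alarm philosophy document.

```python
from collections import Counter
from datetime import datetime, timedelta

# Illustrative limits (average alarms per operator per 10 minutes).
# These numbers are assumptions for this sketch, not quoted from EEMUA 191.
THRESHOLDS = [(1, "predictive"), (2, "stable"), (10, "reactive")]


def alarms_per_window(timestamps, window=timedelta(minutes=10)):
    """Count alarms in consecutive fixed windows, starting at the first alarm."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    start = ordered[0]
    # Index of the window each alarm falls into.
    counts = Counter(int((t - start) / window) for t in ordered)
    n_windows = int((ordered[-1] - start) / window) + 1
    return [counts.get(i, 0) for i in range(n_windows)]


def classify(rate):
    """Map an average alarm rate to a label using the assumed limits above."""
    for limit, label in THRESHOLDS:
        if rate <= limit:
            return label
    return "overloaded"


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)
    sample = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 25)]
    counts = alarms_per_window(sample)
    print(counts, classify(sum(counts) / len(counts)))
```

Running the same calculation separately for normal operation and for startups or shutdowns makes it easier to see which plant states push operators into the reactive range.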

After analyzing the data, you developed a multi-faceted approach to move your alarm-annunciation status from overloaded or reactive to more predictive. You followed one simple rule: if an operator isn't supposed to do something immediately in response to an alarm, or if that alarm isn't directly related to a root cause, it's a nuisance alarm. From this you developed your goals:

Low-hanging fruit: Reduce the number of nuisance alarms using analysis reports. The analysis tools pointed out some easy changes to consider that have little impact on operations.
Alarm rationalization and documentation: The size of this effort depends on the scope and the size of the facility. The outcome of this goal is an alarm philosophy document that should act as the bible on the future handling of alarms at your facility. How this bible is used is important: avoid imposing bureaucracy and overhead. Experience shows the best team to deal with these alarms is the team already in place.
Dynamic alarm handling: Spikes of alarms occur during certain phases of production such as startups and shutdowns. Some of these alarms are routine and not relevant to safe operation. Reaching the desired EEMUA alarm statistics may require allowing operators to mask some alarms during busy times.
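Dynamic alarm handling, as described in the last goal, can be pictured as a lookup from plant state to the alarms that are routine in that state. This is a minimal sketch under stated assumptions: the state names, tag names, and the `visible_alarms` function are hypothetical, and a real control system would implement this with its own alarm-hiding features.

```python
# Hypothetical suppression table: plant state -> alarm tags hidden in that state.
# Every entry here is made up for illustration; in practice the table would be
# an output of alarm rationalization.
SUPPRESS_BY_STATE = {
    "startup": {"FI-101.LO", "PI-205.LO"},
    "shutdown": {"FI-101.LO"},
    "normal": set(),
}


def visible_alarms(active_tags, plant_state):
    """Return the active alarms the operator should see, preserving order.

    An unknown plant state suppresses nothing, so the sketch fails toward
    showing alarms rather than hiding them.
    """
    hidden = SUPPRESS_BY_STATE.get(plant_state, set())
    return [tag for tag in active_tags if tag not in hidden]


if __name__ == "__main__":
    active = ["FI-101.LO", "TI-300.HI", "PI-205.LO"]
    print(visible_alarms(active, "startup"))
    print(visible_alarms(active, "normal"))
```

Keeping the table in one reviewable place, rather than scattered through control logic, makes it easier to audit which alarms are masked and when.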

Hurrah! The job is over

The quick hits to get rid of those nuisance alarm pests happened rapidly, and the operators seemed to appreciate the results. The alarm rationalization project was more involved. Thanks to help from a vendor, your bible is progressing.

This took longer than expected but was well worth it, now that you feel you're in control of your alarm configuration. Also, the task you thought would be more difficult, dynamic alarm handling, was actually easier than expected thanks to new features in your latest control system that allow alarms to be dynamically hidden from the operator based on process conditions.

Although your results don't yet meet the EEMUA recommendations, you can boast a 50% reduction in nuisance alarms during normal operations and close to an 80% reduction during plant state transitions.

Months later, you decide to rerun some of the reports that you first ran for your benchmarking exercise, and it's lucky you did. Your once impressive numbers no longer exist. How could this happen? ABB Engineering Services metrics indicate 97% of all new nuisance alarms come from one of three sources:

Fault (usually it's the instrumentation);
Process change; or
Minor project.

After pondering this subject in your not-so-spare time, you develop a sound hypothesis for the return of the nuisance alarms: plant changes.

In the time it took to complete some of the alarm reduction projects for your facility, other changes were taking place. There were projects for replacing older plant equipment as well as an expansion or two.

Of course, most of the changes that are giving you nuisance alarms aren't projects at all; they're a result of the inevitable drive toward efficiency and profitability from existing plant equipment. "Let's push that setpoint just a little higher" moves that noisy reading closer to the alarm threshold, or perhaps changing that controller tuning destabilized something else, causing an alarm downstream.

The problem was that the people executing the DCS (distributed control system) configuration work or moving those process variables were unaware of the ongoing alarm management efforts. While looking at the modifications, you notice all the new changes have been implemented with the highest alarm priority.

When you call, the design engineer is emphatic that the project is of the highest importance and deserves the immediate attention of the operator. The only solution to this challenge is plant-wide adoption of an alarm philosophy. This will ensure new projects and modifications follow this guideline and the work done to date will continue long after your promotion to another job.

Remember all those instrument faults you fixed when you did the original alarm reduction project? Guess what: they're back. Moreover, some of them are new. You remembered to empower maintenance to diagnose and fix faults that cause nuisance alarms, as well as create a procedure for shelving the alarms while they're undergoing repair, didn't you?

Then, you begin to wonder if change management procedures are a contributing factor. In spot-checking some of the problem alarms you thought you had corrected earlier, you find some of the alarm settings changed compared to the alarm rationalization documentation. You try to find a reason why, but it most likely happened soon after you changed them the first time. Putting these back in place is likely to take time for approvals. Perhaps excluding those pesky process engineers while you trained operations was not such a good idea.

Best way to eat an elephant

The only thing you seem to have learned from this exercise is that you're the only one who cares about alarm rationalization.

Operations and maintenance personnel already have enough to do. The operations supervisor's main goal is making production numbers, not reducing nuisance alarms, and no one wants to be the scapegoat for not making quota.

Everyone sees the problem with alarms, but nobody knows how to approach it. The reality is everyone is doing some alarm-management work today, but they're not doing it as efficiently as possible. Shift supervisors review inhibited alarms at the beginning of each shift. Operators scream about nuisance alarms but don't have the tools, or the time, to identify which one is causing the most grief. DCS technicians respond in a patchwork manner to the various requests to make the necessary changes. And maintenance desperately tries to squeeze in the requested instrument adjustments or equipment maintenance to address some of the issues. The problem is that, for the most part, this is all gut feel.

What's necessary is a program that effectively deals with nuisance alarms with a minimum of bureaucracy.

The trick is as simple as ingraining very basic alarm-review procedures into the existing work culture. Soon, operations, maintenance, and even process engineering will realize how much easier life can be with an alarm management strategy. But first, you must put the facts in front of them (Figure 4): poor alarm management is everybody's problem. By identifying one or two alarm problems every week, and by simplifying the internal Management of Change (MOC) process so that regularly required changes go through more smoothly, you will keep your alarm system humming for years to come. The best way to eat an elephant is one bite at a time, right? Especially if you invite some friends over for dinner.
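Picking the one or two alarm problems to tackle each week can be as simple as ranking alarm tags by how often they fired. The following is a minimal sketch, assuming the week's alarm log has already been reduced to a list of tag names; the function name `top_offenders` and the sample tags are hypothetical.

```python
from collections import Counter


def top_offenders(alarm_tags, n=2):
    """Return the n most frequent tags as (tag, count, share of total load)."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    # most_common() returns an empty list for an empty log, so no division occurs.
    return [(tag, count, count / total) for tag, count in counts.most_common(n)]


if __name__ == "__main__":
    week = ["LI-410.HI"] * 6 + ["TI-120.LO"] * 3 + ["PI-777.HI"]
    for tag, count, share in top_offenders(week):
        print(f"{tag}: {count} alarms ({share:.0%} of the week's load)")
```

A short list like this gives the weekly review a concrete agenda instead of gut feel, and the share column shows how much of the load a single fix would remove.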

With things running smoothly, your operations team should be recognizing problems early. Because they have been empowered, alarms will be selectively refined to identify the root cause of an abnormal condition. The maintenance team will already be dealing with those problems. The engineers will be designing with alarm management in mind, and when they don't, they will get a prompt and focused push from the operations team. The operations supervisor will be periodically reviewing the status of alarms and the performance of the management system and making sure it's working smoothly. Maybe that leaves you enough time to look at the next level of operability; perhaps the EEMUA targets don't look that far away after all. Of course, the team will need new tools to make it to the next level.

Software platform standards

The solution is to get the entire facility involved with alarm management while not impeding production and minimizing the effort required. One way to do this is to streamline work processes through seamless integration of the various systems required to continue your alarm-management strategy. In the past, integrating a DCS with third-party software was either a) impossible or b) expensive and hard to maintain. Just as you know your car is a lemon when you're on a first-name basis with your auto mechanic, you knew the solution wasn't a true integration when you totaled how much money went to maintenance at Joe's Software House, which implemented the solution. This situation wasn't always the integrator's fault, as the project followed an end-user specification, faced limited infrastructure, and involved systems that weren't designed for integration with yours. In any event, there were dependencies on the integrators for modifications, routine maintenance, and upgrades.

Some of the latest automation systems use software platforms based on standards truly built for interconnectivity of applications and which take the management and communication of alarms fully into account. When these other applications are built to similar standards, such as ActiveX, OPC, HTML, XML, COM, Web interfaces, and others, the result is a better solution that can be maintained without depending on specific resources.

With this type of platform and use of standards, what needs seamless integration in order to streamline operations?

Benchmarking/analysis tools: The third-party software that was to collect and analyze alarm and event data is already collecting data automatically, but the pre-canned analysis reports were only available to the engineer on a separate computer. These reports should be readily available to the operations supervisor, maintenance technicians, and operators. If reports are available with a mouse click, the right user can get access to the right report. Today's alarm analysis tools typically have Web-based access for standard reports such as frequent, duplicate, standing, and chattering alarms as well as more general, performance-statistic-based reports to give the overall picture. The result is that an operations supervisor has access to alarm performance statistics, or an operator can verify that a certain tag has nuisance alarms, without disrupting the normal work routine. Barriers such as special procedures, dedicated PC access, training, different passwords, and forms are non-existent.
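As an illustration of what sits behind one of the standard reports mentioned above, a chattering-alarm report flags tags that activate repeatedly within a short period. This is a minimal sketch, not the algorithm of any specific product: the definition used here (at least three activations within 60 seconds) and the function name are assumptions chosen for the example.

```python
from collections import defaultdict, deque


def chattering_tags(events, min_count=3, window_s=60.0):
    """Flag tags with at least min_count activations inside any window_s span.

    events: iterable of (timestamp_in_seconds, tag) alarm activations,
    assumed to be sorted by time.
    """
    recent = defaultdict(deque)  # tag -> activation times still inside the window
    flagged = set()
    for ts, tag in events:
        times = recent[tag]
        times.append(ts)
        # Drop activations that are now older than the window.
        while ts - times[0] > window_s:
            times.popleft()
        if len(times) >= min_count:
            flagged.add(tag)
    return flagged


if __name__ == "__main__":
    log = [(0, "LI-1"), (5, "TI-2"), (10, "LI-1"),
           (20, "LI-1"), (200, "TI-2"), (400, "TI-2")]
    print(chattering_tags(log))
```

Both the count and the window are tunable, which matches the article's point that some tuning of how the analysis treats alarms is usually needed.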

Asset optimization: Traditional asset management packages usually dealt with smart assets. Smart assets, such as HART, FF, or Profibus transmitters, have status information that aids more efficient preventative and predictive maintenance strategies. Today, asset optimization is reaching beyond traditional asset management tools by monitoring the status of computer and networking equipment that supports the simple network management protocol (SNMP). Other assets are vibration detection sensors, analyzers, electrical devices such as drives and motor controllers, plant equipment such as heat exchangers, plant entities such as reactors, and even user-defined data such as key performance indicators (KPIs).

By monitoring KPI information, the system can make the right people aware of status information that means more to the business objectives and can notify them in various ways that action is necessary. The use of alarm-management performance statistics that yield information such as standing alarms, alarms per time period, and benchmarked information (such as whether areas of the plant are being operated in a reactive, stable or predictive mode) can preemptively warn supervisors of possible problems. If you catch these usually hidden problems early, you can avoid them and improve performance, reduce equipment wear and tear, and even prevent accidents.

Notification tools: While asset optimization tools are great at monitoring and reporting issues, if no one ever looks at them, is there a problem that needs fixing? You must have a notification methodology to alert the right people of issues that may arise from alarm management performance indicators and other monitored assets. One way is to have alerts come up on an operator screen, but this requires careful handling; otherwise they, too, become nuisance alarms. In some cases, notification can take place in a non-intrusive way that doesn't divert the operator's attention away from their responsible process area(s). Other means of notification are e-mail, text messaging and paging. Short message service (SMS) capabilities are automatically available in automation systems, so when the number of standing alarms crosses a pre-defined limit or benchmark, the operations supervisor receives a page and, concurrently, the area engineer and plant manager get e-mails. While current supervisors, engineers, and managers are cringing at the idea of receiving yet another e-mail, it is likely that their peers now facing criminal charges of negligence for not being aware of plant conditions during life-taking or environment-damaging incidents would have welcomed such an inconvenience.
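The standing-alarm notification described above can be sketched as a small rule that fires only when the count first crosses the limit, so the notifications do not themselves become a nuisance. This is a minimal illustration under stated assumptions: the class name, the routing table, and the limit are hypothetical, and the sketch only returns who should be notified and how; it does not send pages or e-mails.

```python
class StandingAlarmNotifier:
    """Report recipients once each time the standing-alarm count crosses a limit."""

    def __init__(self, limit, routing):
        self.limit = limit
        self.routing = routing  # recipient -> channel, e.g. "page" or "email"
        self.above = False      # True while the count stays above the limit

    def update(self, standing_count):
        """Return (recipient, channel) pairs to notify for this reading."""
        now_above = standing_count > self.limit
        crossed = now_above and not self.above
        self.above = now_above
        return list(self.routing.items()) if crossed else []


if __name__ == "__main__":
    notifier = StandingAlarmNotifier(
        limit=10,
        routing={
            "operations supervisor": "page",
            "area engineer": "email",
            "plant manager": "email",
        },
    )
    for count in (5, 12, 15, 3, 11):
        print(count, notifier.update(count))
```

Notifying on the crossing rather than on every reading above the limit is one design choice among several; a time delay or a deadband around the limit would serve the same purpose.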

Computerized maintenance management systems: When integrating systems that result in the notification of potential problems, it makes sense to streamline the back end. In some cases, filling out a paperwork ticket to fix these problems may be as intrusive as nuisance alarms. Many facilities use a computerized maintenance management system for tracking field assets such as transmitters, motors, pumps and valves. Process automation alarm-management configuration should not be an exception. If nuisance alarms or alarm statistics sound off, the front line of defense, your operators or operations supervisors, should be able to create a work order immediately, saving time and increasing the speed of resolution.

Alarm rationalization data, operating procedures: During the alarm-management-strategy execution, process-control tag data were collected and archived to improve the integrity of plant alarms. These data included tag descriptions, locations, tuning parameters, alarm configuration parameters, and each alarm's probable cause, effect, and recommended action. These data could be a valuable resource to operations during critical conditions; they should serve as operator assistance information and be immediately available. In addition, links to current, up-to-date procedures can help in the decision-making process. Most incident reports indicate operations personnel didn't know how to react to certain plant conditions or lacked access to proper procedure documentation.

Change management: The database built during the alarm rationalization steps contains key parameter information. In some cases, this database has features and reporting that enable it to work for change management purposes. For example, comparing these data against current automation-system settings could provide a difference report that indicates which values have changed. Audit trails and reports from both the change management database/system and the automation system can really speed up decision making as to what values are set incorrectly, who set them, and why. This information, if accurate and quickly available, can help avert costly mistakes and incidents, reducing risk to personnel and saving millions of dollars in damage and lost production.
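The difference report mentioned above amounts to comparing the documented settings with the live ones, tag by tag and parameter by parameter. This is a minimal sketch, assuming both sources can be exported as dictionaries; the function name, tag names, and parameter names are hypothetical.

```python
def difference_report(master, current):
    """List settings that differ between documentation and the live system.

    Both arguments map tag -> {parameter: value}. Each result row is
    (tag, parameter, documented_value, live_value); None marks a value
    missing on one side.
    """
    diffs = []
    for tag in sorted(master.keys() | current.keys()):
        documented = master.get(tag, {})
        live = current.get(tag, {})
        for param in sorted(documented.keys() | live.keys()):
            if documented.get(param) != live.get(param):
                diffs.append((tag, param, documented.get(param), live.get(param)))
    return diffs


if __name__ == "__main__":
    master = {"TI-100": {"HI": 150.0, "priority": 2}}
    current = {
        "TI-100": {"HI": 165.0, "priority": 2},  # limit changed after rationalization
        "FI-200": {"LO": 5.0},                   # alarm added without documentation
    }
    for row in difference_report(master, current):
        print(row)
```

The report shows what changed but not who changed it or why; for that it would need to be joined with the audit trail from the automation system.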

While it is hard enough to execute continuous plant improvement strategies such as alarm management, it's even harder to sustain them. Various plant personnel and plant systems must mesh and integrate to streamline the activities required to continue the effort. It's easy to pound in a nail if you have a hammer, and the same holds true for accomplishing your integration goals. Some automation systems utilize platforms based on standard technologies that can help ease the integration effort and provide best-in-class, maintainable solutions. The end user must make sure their automation system investments, whether new, replacements, or upgrades, provide the necessary infrastructure that reaches beyond the traditional control system. This is paramount when considering continuous improvement programs that involve the entire plant. A solid technical foundation, combined with clear communication with your plant team, will help ensure your program's ongoing success.

Roy Tanner is a marketing manager for ABB in Wickliffe, Ohio.
Rob Turner is a senior consultant for ABB Engineering Services in the UK.
Jeff Gould is a vice president at Matrikon in Edmonton, Alberta, Canada.