“You must learn from the mistakes of others. You can't possibly live long enough to make them all yourself,” to quote Sam Levenson. When it comes to alarm management, Levenson is correct.
Ineffective alarm systems pose a serious risk to safety, the environment, and plant profitability. Too often, alarm system effectiveness is unknowingly undermined by poorly configured alarms. Static alarm settings can’t adapt to dynamic plant conditions, and a flood of nuisance alarms overwhelms operators just when they most need concise direction (Figure 1). Operators and engineers in the chemical industry have become increasingly aware of the value of alarm management systems. When set up properly, these systems identify abnormal situations, allowing operators to move the process back to a safe condition.
Figure 1. Typical stressful day in a control room
As alarm management solutions become more common, our understanding of the factors that contribute to success has grown. If you’re thinking of undertaking an alarm management project, or have already started one, the following lessons learned can help drive your project to success.
Establishing a reliable system
The process for establishing a successful alarm management system is fundamentally the same across industries, regardless of plant size. Like any continuous improvement effort, it is an ongoing, dynamic process:
- Benchmark and evaluate current performance: This is the time to identify problems with your alarm system, and some results can be surprising. Figure 2, for example, shows a high frequency of emergency alarms that should be investigated. The Engineering Equipment & Materials Users’ Association (EEMUA) suggests that when alarms are divided into three priority categories (emergency, high, and low), the optimum proportions are roughly 5:15:80.
- Develop an alarm philosophy document: This critical document clearly outlines the key concepts and governing rules for your alarm strategy. It should answer such questions as: What constitutes a critical alarm? Which alarms should cause trips? Which alarms may be inhibited, under what conditions, and who has the authority to do so? Which alarms are advisory? The philosophy also outlines roles and responsibilities, change-management procedures, and project goals, such as target alarm rates. There is good news for those who find the philosophy document difficult to compile: templates are available that do most of the work for you, and all you need to add are your specific metrics and circumstances.
- Rationalize alarms: First, target and eliminate the top 20 to 30 bad actors; this alone significantly improves alarm loading. Find out which alarms occur most often and determine which alarms are significant (Figure 3). A complete review of process operation during the periods when alarms are active is also necessary. Once this is complete, hold a review with operations to ensure that alarm priorities convey consistent urgency to the operator. Finally, prepare a configuration review document, ready for the next step.
- Implement changes: Reconfiguring the control system turns the intentions of alarm rationalization into reality by eliminating nuisance alarms at their source. This step also includes integrating maintenance of the alarm manager into the plant workflow, which means updating operating procedures and record-keeping. Maintenance must include a documentation procedure that feeds continuous improvement.
- Strive for continuous improvement: Hold routine review sessions to identify new opportunities for enhancements, such as dynamic alarm strategies.
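The priority comparison in the benchmarking step is simple arithmetic once alarm journal data have been exported. The Python sketch below assumes each annunciated alarm has already been reduced to a priority label (`"emergency"`, `"high"`, or `"low"`); the function names and the 5-percentage-point tolerance are illustrative choices, not part of the EEMUA guidance.

```python
from collections import Counter

# Approximate EEMUA proportions for a three-priority scheme (5:15:80).
TARGET_SHARES = {"emergency": 0.05, "high": 0.15, "low": 0.80}


def priority_distribution(priorities):
    """Share of annunciated alarms at each priority level.

    Labels outside the three levels still count toward the total, so a large
    number of unclassified alarms shows up as a shortfall at every level.
    """
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in TARGET_SHARES}
    return {level: counts[level] / total for level in TARGET_SHARES}


def compare_to_target(priorities, tolerance=0.05):
    """Return {level: (actual_share, target_share)} for levels outside the tolerance."""
    actual = priority_distribution(priorities)
    return {
        level: (round(actual[level], 3), target)
        for level, target in TARGET_SHARES.items()
        if abs(actual[level] - target) > tolerance
    }


# Hypothetical journal extract with far too many emergency alarms.
sample = ["emergency"] * 30 + ["high"] * 15 + ["low"] * 55
print(compare_to_target(sample))  # {'emergency': (0.3, 0.05), 'low': (0.55, 0.8)}
```

A tolerance band is used rather than an exact match because the 5:15:80 ratio is a target proportion, not a hard limit.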
Figure 3. Alarm rationalization includes a frequency plot of tag names
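Finding the bad actors amounts to ranking tags by alarm count, which is what a frequency plot of tag names shows. Here is a minimal Python sketch, assuming the journal has been reduced to one tag name per alarm occurrence. The tag names are hypothetical, and the default of 20 simply matches the low end of the "top 20 to 30" guideline.

```python
from collections import Counter


def top_bad_actors(alarm_tags, n=20):
    """Return (tag, count, share_of_total) for the n most frequent alarm tags."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    return [(tag, count, count / total) for tag, count in counts.most_common(n)]


def load_share(alarm_tags, n=20):
    """Fraction of the total alarm load produced by the n most frequent tags."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


# Hypothetical alarm journal: one entry per alarm occurrence.
events = ["FIC-101"] * 50 + ["TI-205"] * 30 + ["PI-330"] * 15 + ["LI-410"] * 5
for tag, count, share in top_bad_actors(events, n=2):
    print(f"{tag}: {count} alarms ({share:.0%})")
print(f"Top 2 tags produce {load_share(events, n=2):.0%} of the load")
```

The `load_share` figure gives a quick estimate of how much alarm loading would drop if the listed bad actors were fixed.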
Now that we have defined the correct execution path (Figure 4), let’s take a look at the recent lessons learned by industry:
Figure 4. The step-by-step process for rationalizing alarms
Blunder 1: poor project management. Poor planning, sketchy system design, inadequate resource allocation, incomplete scheduling, and ineffective management of operator expectations can sink any project, and alarm management is no exception. The single most important activity is planning: detailed, systematic, team-involved plans are the foundation for project success.
Blunder 2: using the wrong tools. Delivering the optimum return depends on selecting the right platform for archiving alarms and events. Selecting the proper alarm analysis tool is also critical.
Together, the archive system and analysis tool ensure that you are chasing down major problems rather than getting bogged down in nuisance alarms.
Beyond simple analysis, tools are available that enable automatic change control, punch-list generation, and project tracking. Plan ahead for how you will leverage alarm information once it is in a repository. Although these tasks can be performed without special software tools, doing so isn’t practical: the effort often becomes so daunting that alarm management initiatives collapse under the weight of their own logistics. It is best to do away with paper trails in the form of spreadsheets and change control documents posing as Master Alarm Databases. Use the right tools.
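To make the contrast with spreadsheets concrete, the Python sketch below shows what a Master Alarm Database with change control and punch-list generation might look like at its smallest. The class and field names are hypothetical and are not any vendor’s schema. The point is that every change records who made it, when, and why, and that the rationalized design can be compared automatically against the live control system configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

EDITABLE_FIELDS = ("priority", "limit")  # hypothetical subset of alarm attributes


@dataclass
class AlarmSetting:
    tag: str
    priority: str
    limit: float


@dataclass
class ChangeRecord:
    tag: str
    field_name: str
    old_value: object
    new_value: object
    changed_by: str
    reason: str
    timestamp: datetime


@dataclass
class MasterAlarmDatabase:
    settings: dict = field(default_factory=dict)  # tag -> AlarmSetting (rationalized design)
    history: list = field(default_factory=list)   # audit trail of ChangeRecord entries

    def add(self, setting):
        self.settings[setting.tag] = setting

    def change(self, tag, field_name, new_value, changed_by, reason):
        """Apply a change to the design and record who made it, when, and why."""
        if field_name not in EDITABLE_FIELDS:
            raise ValueError(f"unsupported field: {field_name}")
        if not reason:
            raise ValueError("every change needs a documented reason")
        setting = self.settings[tag]
        old_value = getattr(setting, field_name)
        setattr(setting, field_name, new_value)
        self.history.append(
            ChangeRecord(tag, field_name, old_value, new_value, changed_by, reason,
                         datetime.now(timezone.utc))
        )

    def punch_list(self, live_config):
        """List differences between the design and the live configuration (tag -> AlarmSetting)."""
        items = []
        for tag, design in self.settings.items():
            live = live_config.get(tag)
            if live is None:
                items.append((tag, "missing", None, None))
                continue
            for name in EDITABLE_FIELDS:
                if getattr(design, name) != getattr(live, name):
                    items.append((tag, name, getattr(design, name), getattr(live, name)))
        return items


mad = MasterAlarmDatabase()
mad.add(AlarmSetting("TI-205", "high", 350.0))
mad.add(AlarmSetting("PI-330", "low", 12.0))
mad.change("TI-205", "priority", "low", changed_by="jsmith", reason="Rationalization review")

live = {"TI-205": AlarmSetting("TI-205", "high", 350.0)}  # PI-330 not yet configured
for item in mad.punch_list(live):
    print(item)
```

In a real system the live configuration would be read from the control system rather than built by hand, but the punch list logic stays the same.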
Blunder 3: skipping benchmarking. Benchmarking is vital to any serious improvement initiative. If you don’t measure your current performance, you won’t be able to accurately determine your progress. The first step is to track alarm rates for several weeks to establish a baseline. Once that’s done, assess how your plant’s current alarm levels measure up to industry standards (Figure 2). For a quick snapshot of where your plant ranks against EEMUA 191:99, “Alarm Systems: A Guide to Design, Management and Procurement,” use the automated calculator Matrikon has posted on its website (www.matrikon.com/plantperformance).
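At its core, a baseline means counting alarms per fixed time window for each operator position. The Python sketch below assumes a list of annunciation timestamps for one console. The 10-minute window and the flood threshold of more than 10 alarms per window are assumptions that you should replace with the values in your own alarm philosophy.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed flood threshold: more than 10 alarms in one 10-minute window.
FLOOD_THRESHOLD = 10


def alarms_per_window(timestamps, window=timedelta(minutes=10)):
    """Count alarms in fixed windows, keyed by the start time of each window."""
    counts = Counter()
    if not timestamps:
        return counts
    start = min(timestamps)
    for ts in timestamps:
        index = (ts - start) // window  # integer index of the window this alarm falls in
        counts[start + index * window] += 1
    return counts


def baseline_summary(timestamps, window=timedelta(minutes=10), flood_threshold=FLOOD_THRESHOLD):
    """Average rate, peak rate, and number of flood windows for one operator position."""
    counts = alarms_per_window(timestamps, window)
    if not counts:
        return {"average_per_window": 0.0, "peak": 0, "flood_windows": 0}
    # Number of windows spanned, including quiet windows with no alarms.
    span = (max(timestamps) - min(timestamps)) // window + 1
    return {
        "average_per_window": len(timestamps) / span,
        "peak": max(counts.values()),
        "flood_windows": sum(1 for n in counts.values() if n > flood_threshold),
    }


t0 = datetime(2024, 1, 1, 6, 0)
timestamps = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 25, 26)]
timestamps += [t0 + timedelta(minutes=30, seconds=s) for s in range(12)]  # a burst of 12
print(baseline_summary(timestamps))
```

Quiet windows are included in the average on purpose. Otherwise, a few bursts would make the steady-state rate look far worse than it is.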
When you have finished benchmarking, you can start identifying opportunities for improvement. These are the key questions, in order of importance, to answer when performing this assessment:
- Is the dynamic (real-time) alarm load acceptable for all operators?
- Does the dynamic alarm prioritization meet industry standards?
- What are the troublesome tags on the system during steady-state operation? How about during start-up and shutdown? Was there any particularly troublesome phase of your operation that lit the board?
- How does the configured DCS alarm count compare to the EEMUA standards? (How many alarms per tag during start-up, shutdown, and steady-state operation?)
- What does the configured-alarm distribution look like compared to the
Blunder 4: stop philosophizing and get it done! Failing to establish and document best practices is a recipe for disaster; you need to create guidelines for alarm rationalization. Documentation should include rules for setting alarms, plans for alarm reviews that build commitment and consolidate training, and an audit process to ensure that the rules are consistently applied. These guidelines clearly define the criteria for legitimate alarms and for setting their priorities. They are the backbone of an “alarm philosophy” document, which acts as a corporate standard to guide your entire organization’s alarm management initiatives.
Blunder 5: cutting corners. Disturbingly, companies often exclude their best resource, the panel operator, from rationalization meetings. Panel operators are the end users and the primary stakeholders in alarm optimization. If you exclude the panel operator from the rationalization process, your project will fail.
Unpleasant site experience has taught us this reality: instrument technicians, automation engineers, process engineers, and field operators are not panel operators. Foremen who were once panel operators can sometimes serve in their place, but their experience may not be current.
Please pay attention: the best candidate for the meeting is the person who fights alarms and unit problems day-in and day-out. This knowledge is irreplaceable during the rationalization process.
Alarm rationalization is the process of applying operational experience to alarm system design.
Although operators are the most important participants in this process, they cannot carry this burden alone. Without a facilitator who is familiar with alarm rationalization, your project will take longer than it should, yield poor results, and have to be repeated.
Finally, alarm rationalization requires an engineering review prior to implementation to ensure that the results are consistent with Hazard and Operability (HAZOP) studies and Safety Integrity Level (SIL) studies. The “process,” “unit,” or “contact” engineer plays this role. If the project is part of the roll-out of a new process, also include “technical experts,” who may be equipment vendors, in your review. As with all projects, a safety review is recommended after completion to identify any unplanned changes that typically occur in the heat of battle.
Blunder 6: find a system that works with your alarm philosophy. The optimal way to collect alarm data is system-specific, and the easiest way is often not the best way. Be sure to answer the following questions:
- Does the analysis package need to present information to the operator in real-time or are existing alarm visualization tools adequate to manage plant upsets?
- Is the plant hierarchy represented consistently and intuitively within the control system and the alarm management system?
- Is collection of redundant alarm data required to meet regulatory or corporate policy compliance?
- Are all required events such as “Return to Normal,” “Operator Actions,” and “System Messages” included in the chosen connection method?
- Are all required fields available in the data? Can priorities be distinguished? Can audible and suppressed alarms be distinguished?
- Can set-point changes be discerned from output changes?
- Can absolute alarms be separated from deviation alarms? If gaps exist, what other sub-system(s) can be referenced to close them?
- Are alarm and event archiving and analysis adequate to meet my objectives, or do I need to establish a connection with the control system configuration database?
- How flexible is the connection strategy? Will it survive control system upgrades?
- How much maintenance is required to keep the system running (reliability)?
- What are the relative advantages and disadvantages of each option? Should more than one connection be used for each area?
- Do I only want to view these data at the plant level, or would corporate comparisons between sites benefit my operations?
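Several of these questions can be answered empirically by pulling a sample of records through a candidate connection and checking what actually arrives. The sketch below is a hypothetical Python check. The event-type and field names are placeholders for whatever your control system exports, not a standard schema. `alarm_kind` stands for the absolute/deviation distinction, and `parameter` stands for set-point versus output changes.

```python
# Hypothetical required content per event type; adapt to your system's export.
REQUIRED_FIELDS = {
    "alarm": {"timestamp", "tag", "priority", "audible", "suppressed", "alarm_kind"},
    "return_to_normal": {"timestamp", "tag"},
    "operator_action": {"timestamp", "tag", "parameter", "old_value", "new_value"},
    "system_message": {"timestamp", "text"},
}


def audit_connection(records):
    """Return (event types never seen, {event type: fields missing or empty})."""
    seen_types = {r.get("event_type") for r in records}
    missing_types = sorted(set(REQUIRED_FIELDS) - seen_types)
    missing_fields = {}
    for r in records:
        event_type = r.get("event_type")
        required = REQUIRED_FIELDS.get(event_type, set())
        # A field counts as present only if it exists and is not None.
        present = {key for key, value in r.items() if value is not None}
        gaps = required - present
        if gaps:
            missing_fields.setdefault(event_type, set()).update(gaps)
    return missing_types, missing_fields


sample_records = [
    {"event_type": "alarm", "timestamp": 1, "tag": "TI-205", "priority": "high",
     "audible": True, "suppressed": None, "alarm_kind": "absolute"},
    {"event_type": "operator_action", "timestamp": 2, "tag": "FIC-101",
     "parameter": "setpoint", "old_value": 40, "new_value": 45},
]
print(audit_connection(sample_records))
```

Any event type or field the audit reports as missing points to a gap that another sub-system would have to close.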
Don’t rely on legacy strategies if they do not meet current needs. What worked in the past may no longer be the best solution. However, don’t make things unnecessarily complex. Decide what you want to accomplish and then choose the simplest method that meets all of your needs. If the collection strategy becomes overly complex then it will be hard to maintain, and ultimately your entire alarm management strategy will suffer.
Blunder 7: failure to automate. Good technology makes life easier. Its purpose is to relieve people of dangerous, repetitive tasks, freeing them to intervene when the automated system requires guidance. When intervention is needed, software should make problem assessment and diagnosis easy — freeing the user to fix the problem.
Although task accountability is necessary for successful alarm management, staff are more likely to use reliable technologies that are available on demand and make their jobs easier.
Blunder 8: are your operators acting on alarms? A common mistake is failing to track all of the required data. Tracking alarms isn’t enough! Alarm rationalization requires more than one type of data.
For example, when an alarm occurs you need to know if an operator actually responded to it.
Tracking operator actions is an effective way to identify control problems and automation opportunities, and to audit the effectiveness of your alarm strategy. If the operator didn’t respond, there is a good chance that the alarm is a nuisance alarm. Examine the ratio of operator actions to audible process alarms to identify poor alarm strategies. The de facto standard that “every alarm requires operator intervention” demands that this ratio be at least one.
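Both checks can be sketched in a few lines of Python. The sketch assumes alarms and operator actions have been exported as (tag, timestamp) pairs and treats an action on the same tag within a fixed window as a response. The same-tag matching rule and the 10-minute window are both simplifying assumptions, since a real response often touches a different controller.

```python
from collections import Counter
from datetime import datetime, timedelta


def action_to_alarm_ratio(audible_alarms, actions):
    """Operator actions per audible process alarm; None if there were no alarms."""
    return len(actions) / len(audible_alarms) if audible_alarms else None


def unanswered_alarms(audible_alarms, actions, window=timedelta(minutes=10)):
    """Alarms with no operator action on the same tag within `window` of annunciation."""
    actions_by_tag = {}
    for tag, ts in actions:
        actions_by_tag.setdefault(tag, []).append(ts)
    return [
        (tag, ts)
        for tag, ts in audible_alarms
        if not any(ts <= a <= ts + window for a in actions_by_tag.get(tag, []))
    ]


t0 = datetime(2024, 1, 1, 8, 0)
alarms = [("TI-205", t0), ("TI-205", t0 + timedelta(hours=1)), ("LI-410", t0 + timedelta(minutes=5))]
actions = [("TI-205", t0 + timedelta(minutes=3))]

print(action_to_alarm_ratio(alarms, actions))
# Tags whose alarms repeatedly go unanswered are nuisance-alarm candidates.
print(Counter(tag for tag, _ in unanswered_alarms(alarms, actions)))
```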
Other data to track include operator actions, such as controller set-point and mode changes, and system errors. If a controller’s mode or output is repeatedly changed, it is a clear sign that the loop needs fixing: it is likely over-damped or under-damped. Coupling action data with controller performance data makes the loop’s problems quick to diagnose, saving time. If a controller’s set-point is frequently changed and the controller has no supervisory control, the automation engineer must ask “why not?” Installing new automation strategies can free the operator to focus on pushing limits rather than maintaining process stability. Process variable history is also important for determining some alarm deadband settings and for performing the engineering reviews prior to implementation.
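The rules of thumb in this paragraph translate directly into a screening script. In the hypothetical Python sketch below, operator actions arrive as (tag, parameter) pairs, where parameter is `"mode"`, `"output"`, or `"setpoint"`. The threshold of 20 changes per review period and the set of loops that already have supervisory control are assumptions that you would set from your own data.

```python
from collections import Counter


def classify_interventions(actions, threshold=20, supervised=frozenset()):
    """Screen operator actions for loops that need tuning or more automation.

    actions: iterable of (tag, parameter) pairs, parameter in {"mode", "output", "setpoint"}.
    supervised: tags that already have supervisory control.
    """
    findings = {}
    for (tag, parameter), n in Counter(actions).items():
        if n < threshold:
            continue
        if parameter in ("mode", "output"):
            # Repeated manual mode/output changes suggest a poorly tuned loop.
            findings.setdefault(tag, []).append(f"{n} {parameter} changes: review loop tuning")
        elif parameter == "setpoint" and tag not in supervised:
            # Frequent set-point moves without supervisory control: ask "why not?"
            findings.setdefault(tag, []).append(f"{n} set-point changes: consider supervisory control")
    return findings


actions = (
    [("FIC-101", "mode")] * 25
    + [("TIC-220", "setpoint")] * 30
    + [("PIC-310", "setpoint")] * 30
    + [("LIC-400", "output")] * 5
)
print(classify_interventions(actions, supervised={"PIC-310"}))
```

Flagged loops are candidates for a closer look alongside controller performance data, not automatic verdicts.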
Blunder 9: treating all data the same. Audible alarms are not the same as non-audible alarms, yet many control systems continue to send alarms to the journals even when they are not audible. Failing to separate these data creates an inaccurate picture of alarm system performance and may lead personnel to think the situation is worse than it is. It may also waste time by falsely indicating alarm problems.
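Separating the two streams before computing any statistic takes only a simple filter, but it can change the numbers considerably. This Python sketch assumes each journal entry carries a boolean `audible` flag (a hypothetical field name) and reports how much a naive count of every journal entry would overstate the alarm load.

```python
def split_by_audibility(entries):
    """Split journal entries into audible alarms and journal-only (silent) entries."""
    audible = [e for e in entries if e.get("audible")]
    journal_only = [e for e in entries if not e.get("audible")]
    return audible, journal_only


def alarm_load(entries, hours):
    """Hourly rates for audible and journal-only entries, plus the overstatement factor."""
    audible, journal_only = split_by_audibility(entries)
    return {
        "audible_per_hour": len(audible) / hours,
        "journal_only_per_hour": len(journal_only) / hours,
        # How many times larger a naive count of every journal entry would be.
        "overstatement": len(entries) / len(audible) if audible else None,
    }


entries = [{"tag": "TI-205", "audible": True}] * 6 + [{"tag": "PI-330", "audible": False}] * 18
print(alarm_load(entries, hours=6))
```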
Blunder 10: who reads user manuals? I confess to not reading my motherboard manual the last time I bought a computer. Nor did I read the instructions for my television, DVD player, or microwave, and certainly not the 1,800-page operating-system help files. I know you’re guilty, too. The easiest way to undermine effective alarm management is to implement a solution without giving personnel the hands-on training they need. This point is perhaps best illustrated with a real-world example:
A large petrochemical plant went to great efforts to improve its alarm system performance through alarm rationalization. Once the new settings were designed, changes were uploaded to the control system over the span of two months. Training was provided throughout this period. Joe, a veteran operator with 21 years of experience, was entitled to five weeks of vacation per year. Shift rotations at the company normally consisted of four weeks on and one week off. Joe had recently earned time-in-lieu by covering shifts for a co-worker.
With these factors combined, Joe decided to take two months off. Guess when? On Joe’s first day back, there was a compressor trip. This caused a single emergency-priority alarm to be sent to the control system. Joe was accustomed to assessing the plant’s state based on the rate of alarms, so he naturally assumed things were running quite smoothly: he had only a single alarm in nearly 30 minutes! His delayed intervention escalated the upset into an unnecessary plant shutdown. Effective operator training ensures that operators know what needs to be done, when, and how. Remember: team-involved plans are the foundation for project success. If a plant is unable to provide effective in-house operator training, call upon companies that specialize in third-party training.
Blunder 11: overhauling the whole system at once. Implementation should be staged. If all changes happen at once, the plan becomes complicated and the work never gets done. Recognizing this before rationalization helps personnel break the execution into manageable steps. Staging also lets operations become accustomed to the changes gradually, improving the chances of success.
Blunder 12: no accountability. Failing to assign roles and responsibilities is the most common, and most deadly, oversight in an alarm management project. I advocate resolving this by encouraging “accountability through visibility.” In other words, make sure all staff have access to their peers’ data. This motivates plant personnel to work together and prove they run the “tightest ship.” Some sites may make excuses and complain, but in the end they will improve plant operations to avoid repeated corporate humiliation. This sounds harsh, but it works.
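“Accountability through visibility” can start with something as plain as a shared league table. The Python sketch below ranks units by average alarms per 10 minutes. The unit names and counts are invented for illustration; in practice the inputs would come from your alarm archive.

```python
def league_table(alarm_counts, operating_hours):
    """Rank units from best (fewest alarms per 10 minutes) to worst."""
    rates = {
        unit: alarm_counts[unit] / (operating_hours[unit] * 6)  # six 10-minute periods per hour
        for unit in alarm_counts
    }
    ranked = sorted(rates.items(), key=lambda item: item[1])
    return [(rank, unit, round(rate, 2)) for rank, (unit, rate) in enumerate(ranked, start=1)]


counts = {"Site A": 720, "Site B": 144, "Site C": 2880}
hours = {"Site A": 24, "Site B": 24, "Site C": 24}
for rank, unit, rate in league_table(counts, hours):
    print(f"{rank}. {unit}: {rate} alarms per 10 minutes")
```

Normalizing by operating hours keeps the comparison fair between units that ran for different lengths of time in the reporting period.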
It’s best to define maintenance tasks and assign responsibility early in the project, such as during the project plan design. Keep these definitions simple, both on paper and in day-to-day practice, to sustain support for the idea. This gives personnel an opportunity to participate during installation and validation of the system, so they will “own” the new system.
The final word
Alarm management solutions can significantly improve plant safety, reliability, and profitability, but will only succeed if they are implemented properly. If you follow the recommended project methodology and avoid the mistakes we described, you should be successful. An efficient alarm management system will make your personnel more productive and improve the reliability of your plant.
For additional alarm management resources, view Matrikon’s online and interactive multimedia presentation at http://www.matrikon.com/am_tutorial/.
Michael Marvan is a product manager for alarm management solutions with Matrikon in Edmonton, AB, Canada; email him at Mik.Marvan@matrikon.com.