Accident investigation is a regulatory requirement under OSHA 1910.119(m). Such an investigation requires careful analysis of evidence to arrive at the probable cause(s) of the event and to develop appropriate safeguards to prevent its recurrence. Ideally, investigators would have ample time to analyze all evidence. In reality, plant personnel (and management) almost always feel tacit pressure to resume operations as soon as practicable. Although this pressure is understandable from a business economics view, a hastily performed investigation could fail to identify the real culprit, the probable cause(s) of the accident. Critical evidence could be inadvertently destroyed, forcing investigators to rely on invalid assumptions. The same accident could then happen again and again.
Getting a plant back up and running as soon as practicable while still performing an effective investigation depends on analyzing the accident and developing safeguards carefully and efficiently. Safety professionals, plant engineers, operators, maintenance professionals and management play a vital collective role in developing a framework for efficient accident investigations.
At the strategic level, engineering and administrative controls as well as safety culture jointly contribute to an efficient and reliable accident investigation. The key to ensuring efficient accident investigation is preparedness, i.e., having systems in place to deal with accidents. To improve your preparedness and, thus, the efficiency of your accident investigations, pay particular attention to five factors:
• Fault-tolerant systems;
• Data management to aid investigations/troubleshooting;
• An in-plant accident investigation team (AIT);
• Key spare components on standby; and
• Safety culture in middle and top management.
At the design stage, process hazard analysis helps identify hazards so that their likelihood and impact can be minimized. One important tool for addressing these hazards is fault-tolerant design, which focuses on preventing an accident or minimizing its impact. Examples of fault-tolerant systems include double-wall pipes or sumps with annular space monitoring, dikes, redundant instrumentation (e.g., dual level transmitters on storage tanks), equipment separation, locating buildings in safe zones, and siting plants away from sensitive areas such as aquifers, rivers, lakes, parks, populated areas or wetlands.
Fault-tolerant systems improve the efficiency of an accident investigation by reducing the impact of an accident; this, in turn, eases the tacit pressure to wrap up the investigation as quickly as possible.
In a broad sense, accident investigation is a vital step for determining measures to take and systems to install to minimize an event’s recurrence and impact in the future. In addition to fault-tolerant designs, multiple layers of protection help thwart accidents. Such layers of protection typically include alarm systems, control systems, relief valves, interlocks, safety instrumented systems, operator training and testing, alarm rationalization, and emergency response systems.
The reliability of an accident investigation depends upon the team having access to critical data. Because such data could be destroyed during an accident, preserving them is key.
What are critical data? This depends on the process and equipment. For example, on a distillation column, critical data may include pressures, differential pressures, temperatures, rate of rise of pressures, levels in the bottom and in the reflux drum, reflux flow, relief valve set points and maintenance records, and feed composition. For equipment such as compressors, turbines, transformers and flares, vendors can provide insights about safety critical data.
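Of the items above, rate of rise of pressure is a derived quantity: the pressure change between two time-stamped readings divided by the elapsed time. The Python sketch below shows that arithmetic, assuming readings are available as hypothetical (timestamp, pressure) pairs exported from a historian; the function name and the per-minute units are illustrative choices, not part of any historian product's API.

```python
from datetime import datetime, timedelta


def pressure_rise_rates(readings):
    """Return (timestamp, rate) for each pair of consecutive readings.

    Hypothetical helper for illustration. `readings` is an iterable of
    (datetime, pressure) pairs; the rate is in pressure units per minute,
    using whatever pressure unit the readings carry (e.g., psig).
    """
    ordered = sorted(readings, key=lambda r: r[0])  # ensure chronological order
    rates = []
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        minutes = (t1 - t0).total_seconds() / 60.0
        if minutes <= 0:
            continue  # skip duplicate timestamps instead of dividing by zero
        rates.append((t1, (p1 - p0) / minutes))
    return rates


# Usage with made-up column-top pressure readings
start = datetime(2024, 1, 1, 8, 0)
readings = [
    (start, 50.0),
    (start + timedelta(minutes=2), 54.0),
    (start + timedelta(minutes=3), 60.0),
]
for t, rate in pressure_rise_rates(readings):
    print(t.strftime("%H:%M"), f"{rate:.1f} per min")
```

A sharp jump in the computed rate (here from 2.0 to 6.0 per minute) is the kind of pattern investigators would want preserved and time-aligned with alarms and operator actions.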
It’s also important to preserve records of operations, operator logs, instrument calibrations and maintenance history. Develop appropriate and easy-to-use database systems or data historians for these records.
After identifying the safety critical data, put in place systems to ensure the data set will not be destroyed during an accident. Consider the following steps:
• Protecting data transmission from the field to the control room and data management (data historians) from potential hazards such as fires, heavy rains, flooding, dropped objects and electromagnetic or radio frequency interference.
• Arranging safety critical data in a format that’s easy to use for accident investigations and troubleshooting.
• Time-stamping the data.
• Taking measures to protect the data from cyberattacks.
• Using modular systems that facilitate system expansion.
• Performing system upgrades (to avoid obsolescence of the data management systems).
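As a rough illustration of time-stamping safety critical data and arranging it for investigators, the sketch below stores each reading with a UTC timestamp and pulls out everything recorded in a window around the incident, grouped by tag. The record layout, tag names and default window (one hour before, 30 minutes after) are assumptions for illustration only; a real data historian will have its own schema and query tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class HistorianRecord:
    """One time-stamped value of a safety critical tag (hypothetical schema)."""
    timestamp: datetime  # kept in UTC so records from different systems line up
    tag: str             # hypothetical tag name, e.g. "COL1_TOP_PRESS"
    value: float
    unit: str


def incident_window(records, incident_time,
                    before=timedelta(hours=1), after=timedelta(minutes=30)):
    """Group records inside [incident_time - before, incident_time + after] by tag.

    Records within each tag are returned in chronological order.
    """
    start, end = incident_time - before, incident_time + after
    grouped = {}
    for rec in sorted(records, key=lambda r: r.timestamp):
        if start <= rec.timestamp <= end:
            grouped.setdefault(rec.tag, []).append(rec)
    return grouped


# Usage with made-up distillation column data
t0 = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)
records = [
    HistorianRecord(t0 - timedelta(hours=2), "COL1_TOP_PRESS", 45.0, "psig"),
    HistorianRecord(t0 - timedelta(minutes=10), "COL1_TOP_PRESS", 52.0, "psig"),
    HistorianRecord(t0 - timedelta(minutes=5), "REFLUX_DRUM_LEVEL", 78.0, "%"),
]
window = incident_window(records, t0)
for tag, recs in window.items():
    print(tag, [r.value for r in recs])
```

Storing timestamps in one time zone is a design choice that avoids ambiguity when control system, maintenance and operator-log records are lined up after the fact.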
In addition, consider administrative safeguards to reduce the probability of accidental destruction of evidence. Carefully developed accident investigation procedures should address the issue of isolating the accident scene and preserving evidence. Operator and contractor training is a must to prevent inadvertent data destruction.
The Accident Investigation Team
The time to form an AIT is not after an accident but before one occurs. Obviously, accident investigation is a multi-disciplinary task. The team should include plant subject matter experts and consist of people from operations, engineering, maintenance and safety/environmental management. Team members should be technically competent as well as emotionally mature individuals.
The team should develop an accident investigation procedure for the plant site. In addition, a comprehensive health, safety and environment (HSE) manual for the site is a valuable tool in streamlining accident investigation efforts. The HSE manual would include procedures covering, for example, lockout/tagout, confined space entry, safety permits, respiratory protection, personal protective equipment (PPE), accident investigation, incident reporting and hazard communication.
Consider taking the following steps:
• Training all affected personnel in responding to accidents. The focus of this training is personnel safety, isolating the accident scene, preserving data and making notes in the operations/maintenance logbooks that would help the AIT.
• Collecting information about the accident scene without delay. The AIT should prepare a site-specific checklist that workers involved in an accident can use to document crucial details of the event. The checklist could include, for example, the date and time of the incident, the unit involved, weather (temperature, humidity, wind velocity and direction), the unit's mode of operation and production rate, the type of release (vapor, liquid or both), the estimated release quantity and whether the release is reportable. Once it is safe to do so, take onsite samples and photograph evidence. Useful equipment ranges from high-tech to low-tech: digital cameras approved for use in hazardous locations, sample bottles, handheld area monitors (e.g., for lower explosive limit or toxic gases), PPE and tablet computers.
• After ensuring safety of personnel, interviewing people who were close to or witnessed the accident. The accident investigator should put the interviewee at ease. Stress that the purpose of the interview is to learn from the accident to try to prevent a future recurrence, not to fix blame. However, people may be frightened or traumatized and may not be willing to share vital information during an interview.
• Using software to enhance the efficiency of the accident investigation. Many accident-investigation software programs are available commercially. Make such software an integral part of the plant data management system, not an isolated system. Ensure data transfer from the data management system to the software and vice versa is as seamless as practicable.
• Bringing in outside consultants. Some accidents may require their expertise. The AIT should keep contact information for outside consultants as well as vendors and contractors readily available. The AIT should have unhindered access to funds necessary to summon outside help.
• Establishing a sensible balance. AIT members should be pragmatic, weighing safety against the desire to get the plant back into operation in a reasonable time.
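The site-specific checklist described above could also be captured as a structured record, which makes it easy to see at a glance which items responders have not yet documented and simplifies transferring entries into investigation software. This is a minimal Python sketch; the field names, units and allowed release types are illustrative assumptions drawn from the example checklist items, not a standard format.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional

# Release types taken from the example checklist; a site may define others.
ALLOWED_RELEASE_TYPES = {"vapor", "liquid", "both"}


@dataclass
class IncidentChecklist:
    """First-response checklist; every field name here is hypothetical."""
    incident_time: Optional[datetime] = None
    unit: Optional[str] = None
    ambient_temp_c: Optional[float] = None
    relative_humidity_pct: Optional[float] = None
    wind_speed_mps: Optional[float] = None
    wind_direction: Optional[str] = None
    operating_mode: Optional[str] = None      # e.g., "startup", "normal"
    production_rate: Optional[float] = None   # units per site convention
    release_type: Optional[str] = None        # "vapor", "liquid" or "both"
    release_quantity_est: Optional[float] = None
    reportable: Optional[bool] = None

    def __post_init__(self):
        # Reject release types outside the site's agreed vocabulary.
        if self.release_type is not None and self.release_type not in ALLOWED_RELEASE_TYPES:
            raise ValueError(f"unknown release type: {self.release_type!r}")

    def missing_items(self):
        """Names of checklist items that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Usage: a partially completed entry made shortly after an event
entry = IncidentChecklist(
    incident_time=datetime(2024, 3, 5, 14, 0),
    unit="Unit 200",
    release_type="vapor",
)
print(entry.missing_items())
```

Leaving unknown items as None rather than guessing keeps the record honest: the AIT can see exactly what still needs to be collected.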
Lack of critical parts that have long delivery times can delay equipment repairs and, thus, impede getting the plant back into operation safely following an accident. Having such critical parts on hand is very helpful in concluding accident investigations. Developing a critical spare parts list requires input from many parties including engineering, production, safety and maintenance. Where funding for expensive spare parts such as gear trains, distillation trays, turbines and compressors can’t be justified, explore partnerships with the equipment vendors.
Spare parts onsite require proper storage to ensure they will be ready for use when needed. For example, igniters for flare systems may corrode if left exposed to air. Some spares such as pumps and compressors may need periodic testing.
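A critical spare parts list can be kept as a simple register that flags two gaps: long-lead parts with nothing on hand, and stored spares that are overdue for periodic testing. In the sketch below the field names, the 12-week cutoff for "long lead" and the test intervals are hypothetical values chosen for illustration; each site would set its own with input from engineering, production, safety and maintenance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class SparePart:
    """Entry in a critical spare parts register (illustrative fields)."""
    name: str
    lead_time_weeks: int
    on_hand: int
    test_interval_days: Optional[int] = None  # None: no periodic test required
    last_tested: Optional[date] = None

    def test_overdue(self, today: date) -> bool:
        if self.test_interval_days is None:
            return False
        if self.last_tested is None:
            return True  # never tested, so treat as overdue
        return today - self.last_tested > timedelta(days=self.test_interval_days)


def review(parts, today, long_lead_weeks=12):
    """Return (long-lead parts with none on hand, stocked parts overdue for testing).

    The 12-week default is an assumed cutoff, not an industry standard.
    """
    gaps = [p.name for p in parts
            if p.lead_time_weeks >= long_lead_weeks and p.on_hand == 0]
    overdue = [p.name for p in parts if p.on_hand > 0 and p.test_overdue(today)]
    return gaps, overdue


# Usage with a made-up register
parts = [
    SparePart("flare igniter", lead_time_weeks=8, on_hand=2),
    SparePart("spare pump", lead_time_weeks=20, on_hand=1,
              test_interval_days=90, last_tested=date(2024, 1, 2)),
    SparePart("compressor gear train", lead_time_weeks=40, on_hand=0),
]
gaps, overdue = review(parts, today=date(2024, 6, 1))
print("No spare on hand:", gaps)
print("Test overdue:", overdue)
```

A gap such as the gear train here is the kind of item where, as noted above, a partnership with the equipment vendor may be more practical than stocking the part.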
Best-in-class companies have well-developed safety cultures at all levels of management. Top-level management plays a crucial role in establishing the safety culture, organizational philosophy and company conduct (see: “Process Safety Begins in the Board Room”). Policies should be backed by deeds; management's behavior should build trust. Workers should feel empowered to report accidents to the best of their knowledge and interpretation, without fear of reprisal. In the wake of an accident, high-level corporate executives should provide moral as well as financial support. Their actions will strongly affect the company's image. A well-developed safety culture and open dealings with stakeholders (such as workers and neighbors) will help project a positive image both within and outside the company.
Mid- or plant-level management develops and implements policies and procedures. So, these people directly and immediately influence the behavior of plant personnel — and should emphasize (and practice) safety along with production goals. Mid-level management must understand that staff judge their commitment to safety by actions, not safety slogans. So, plant managers should show support for safety improvement projects and provide the necessary funding for them.
Because the supervisory level of management is closest to the workers, its day-to-day actions greatly influence staff behavior. At this level, a number of factors (including poor record-keeping, unavailable equipment or tools, and untrained personnel) contribute to delays in accident investigations. In some organizations, supervisors get the sense that they must get the plant back into operation as soon as possible and that safety can take a back seat. A well-developed safety culture will help counter this.
Safety professionals should play a lead role in infusing safety culture at all levels of management.
The Bottom Line
Doing accident investigations right requires grappling with engineering, administrative and cultural issues. It's certainly worth the effort. After all, efficient accident investigations will help ensure a company's long-term safety, productivity and positive public image.
GC SHAH is a senior consultant at Wood Group Mustang, Houston.