In the real world, some owner/operators essentially are following the old adage: “Measure with a caliper. Mark with a scribe. Cut with a chain saw.” Their process hazards analysis is becoming increasingly quantitative with more factors and modifiers, and the verification of risk reduction uses multiple significant digits — yet the mechanical integrity record simply states “failed.”
The real world must come into balance because mechanical integrity data prove the risk reduction strategy. The risk reduction provided by a piece of equipment is the inverse of its probability of failure on demand (PFD), which can be estimated from records as the number of times the ISS has failed dangerously divided by the total number of times the ISS has been challenged. Using probabilistic techniques, the PFDs of specific equipment can be calculated and compared to expectations [7].
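The arithmetic in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed calculation method: the function names and the failure and demand counts are hypothetical, and a real assessment would normally add statistical confidence bounds rather than rely on a simple ratio.

```python
def estimate_pfd(dangerous_failures: int, demands: int) -> float:
    """Point estimate of PFD: dangerous failures divided by total demands."""
    if demands <= 0:
        raise ValueError("need at least one recorded demand")
    if not 0 <= dangerous_failures <= demands:
        raise ValueError("dangerous failures must be between 0 and demands")
    return dangerous_failures / demands


def risk_reduction(pfd: float) -> float:
    """Risk reduction is the inverse of PFD.

    A PFD of zero returns infinity, which only means no dangerous
    failure has been recorded yet, not that none can occur.
    """
    return float("inf") if pfd == 0 else 1.0 / pfd


def meets_expectation(pfd: float, expected_pfd: float) -> bool:
    """True when the estimated PFD is no worse than the expected PFD."""
    return pfd <= expected_pfd


if __name__ == "__main__":
    # Hypothetical record: 2 dangerous failures in 200 challenges.
    pfd = estimate_pfd(dangerous_failures=2, demands=200)
    print("estimated PFD:", pfd)
    print("risk reduction:", risk_reduction(pfd))
    print("meets expectation:", meets_expectation(pfd, expected_pfd=0.01))
```

A record that simply states "failed" cannot feed this calculation, because it does not say whether the failure was dangerous or whether it was found on a real demand.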
The most important things cannot be measured [1]. Consequently, PSM requires that quality be built into the design and management system. Validation and periodic proof testing demonstrate that the quality system is rigorous enough to meet or exceed the required equipment integrity. Maintenance plans should consider how degraded equipment operation will be detected early, so it can be corrected before the equipment fails. Safety equipment must not be run to failure.
The more that’s known about the equipment and what’s affecting its operation, the better the risk can be managed. For safety systems, the most important thing is knowledge that the equipment will operate as required when called upon. The quality of the installed equipment is limited by the rigor, timeliness and repeatability of mechanical integrity activities as well as by wear-out and degradation.
To gain confidence in the equipment, perform periodic inspection and preventive maintenance to maintain it in “as good as new” condition. Proof tests provide an auditable means to demonstrate proper operation. Near-miss and incident investigations should evaluate any identified ISS inadequacy or failure. Track spurious trips and process demands and compare them with expectations from the hazard analysis. The Check phase involves monitoring equipment records and looking for trends indicating design or management gaps that need to be closed.
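As a rough sketch of the comparison described above, the following Python compares an observed event rate (process demands or spurious trips) with the rate assumed in the hazard analysis. The function names and all figures are hypothetical, and the simple threshold check ignores statistical uncertainty, which matters when only a few events have been recorded.

```python
def observed_rate(event_count: int, years_in_service: float) -> float:
    """Events per year over the observation period."""
    if years_in_service <= 0:
        raise ValueError("years_in_service must be positive")
    if event_count < 0:
        raise ValueError("event_count cannot be negative")
    return event_count / years_in_service


def exceeds_expectation(event_count: int, years_in_service: float,
                        expected_per_year: float) -> bool:
    """True when the observed rate is higher than the hazard analysis assumed."""
    return observed_rate(event_count, years_in_service) > expected_per_year


if __name__ == "__main__":
    # Hypothetical: the hazard analysis assumed one demand every 10 years,
    # but records show 3 demands in 5 years of service.
    flagged = exceeds_expectation(event_count=3, years_in_service=5.0,
                                  expected_per_year=0.1)
    print("demand rate above expectation:", flagged)
```

A flagged result is a prompt to investigate, not a conclusion: it may point to an optimistic assumption in the hazard analysis or to a gap in design or management.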
Failure tracking is essential to close the safety lifecycle. Repeated failures likely indicate that the installed equipment isn’t capable of meeting performance requirements. Use root cause analysis to determine why metrics are trending in the wrong direction, then implement action plans to improve the management system, equipment, procedures and personnel training. Identify special and previously unknown causes of failure and communicate them to personnel, so that lessons learned aren’t hidden in mechanical integrity records. Use MOC processes to resolve performance gaps.
“What is a system? A system is a network of interdependent components that work together to try to accomplish the aim of the system. A system must have an aim. Without an aim, there is no system. The aim of the system must be clear to everyone in the system. The aim must include plans for the future. The aim is a value judgment.” [4]
Even when good people apply adequate theory and standards, there are always lessons to be learned. The Act phase involves the actions taken in response to trends in metrics and continuous improvement opportunities. If an owner/operator’s safety culture shines here, risk will be driven as low as reasonably practicable.
Continuous improvement is incorporated in PSM through a concept often called “grandfathering,” where the owner/operator determines and documents that the existing equipment is designed, maintained, inspected, tested and operated in a safe manner. An assessment of the existing safety system should demonstrate that the design and management practices meet or exceed the intent of current good engineering practices and process requirements. Don’t hide outdated or under-performing equipment under the cloak of grandfathering.