The functional safety lifecycle covers a safety instrumented system (SIS) from concept to retirement. While important activities occur in each phase of the lifecycle, operation phase activities stand out because they are performed repetitively and are critical to long-term reliability.
An SIS is a high-reliability system composed of sensors, logic solver(s) and final elements. It includes a number of safety instrumented functions (SIFs), each designed to provide a specified risk reduction. The necessary risk reduction is assigned as a safety integrity level (SIL) that establishes the reliability requirements for the SIF. Achieving the needed reliability requires a clear understanding of the failure rate, failure mode and failure effects of devices, as well as a management program that identifies and corrects failures on a routine basis.
SISs operate in one of three modes: continuous, high demand or low demand. In low-demand systems, proof testing is an effective tool because the SIF components generally are dormant for long periods of time — which provides the opportunity to detect and repair failures and then return the component to service between demands.
In this article, we’ll review failure rate and failure mode basics, discuss proof test frequency and effectiveness, consider the robustness of the maintenance program, identify information to be collected during a proof test, and provide tips for analyzing the data to ensure continued reliability.
Device Failure Basics
You must overcome three hurdles to achieve the SIL target: probability of failure (the average probability of failure on demand, PFDavg, in low-demand mode), hardware fault tolerance (HFT) and systematic capability (SC). The failure rate, failure mode and failure effects of the SIF components influence all three hurdles. Reliability theory is based on the premise that components are replaced at the end of their useful life, before wear-out affects failure rate. A common mistake in the operating phase is overestimating the useful life of devices. For example, solenoids have a useful life of 3–5 years and should be routinely replaced during refurbishment. Valves can have a useful life as short as 3–10 years, or less if improperly specified, installed in severe service applications or not maintained correctly.
Devices are classified as Type A or B. Type A devices generally are mechanical and usually fail in a more predictable manner. Examples include valves, actuators, solenoids and relays. Type B devices are primarily intelligent and electronic — therefore, they can fail unpredictably. HFT requirements are increased for Type B devices to compensate.
Failures may be random or systematic. Systematic failures stem from design or manufacturing procedures or personnel competency — and can be reduced or eliminated. For certified devices, SC is determined by assessing the ability to control or avoid failures associated with the design and manufacturing process. A certificate will list the SC limits of a device for a specific HFT based on the assessment. Non-certified devices require the reduction of random and systematic failures through proven in use (prior use) data collection and analysis.
Overall device failure rate will include both random and systematic failures. Failure mode is either safe, dangerous or no effect. Failure rates are designated by λ, using subscripts to indicate safe (S) or dangerous (D), and detected (D) or undetected (U). For example, a safe/detected failure would be identified as λSD. Diagnostics can spot some dangerous failures, λDD. The goal of proof testing is to identify dangerous undetected failures, λDU, and repair them in a timely manner. Proof test coverage (CPT), neglecting diagnostics, is the fraction of λDU failures that the proof test can identify, usually expressed as a percentage: CPT = (λDU revealed during test)/(λDU total).
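The coverage ratio above can be sketched in a few lines of Python. The second function is not from this article: it is a commonly used simplified approximation for a single (1oo1) device, in which failures the proof test can reveal accumulate over the test interval while the remainder accumulate over the whole mission time. The function names, parameter names and example numbers are illustrative assumptions, not device data.

```python
def proof_test_coverage(du_revealed: float, du_total: float) -> float:
    """CPT = (lambda_DU revealed during test) / (lambda_DU total).

    Both rates must use the same unit (for example, failures per 1e9 hours).
    """
    if du_total <= 0:
        raise ValueError("total dangerous undetected failure rate must be positive")
    if not 0 <= du_revealed <= du_total:
        raise ValueError("revealed rate must lie between 0 and the total rate")
    return du_revealed / du_total


def pfd_avg_1oo1(lambda_du_per_hour: float, coverage: float,
                 test_interval_hours: float, mission_time_hours: float) -> float:
    """Simplified 1oo1 approximation (an assumption, not the article's formula).

    Failures the proof test can find are exposed for half the test interval on
    average; failures it cannot find are exposed for half the mission time.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    covered = coverage * lambda_du_per_hour * test_interval_hours / 2.0
    uncovered = (1.0 - coverage) * lambda_du_per_hour * mission_time_hours / 2.0
    return covered + uncovered


# Hypothetical device: 200 FIT dangerous undetected, of which the test reveals 150.
cpt = proof_test_coverage(150, 200)
print(f"CPT = {cpt:.0%}")

# Same device (2e-6 per hour assumed here), annual test, 10-year mission time.
print(f"PFDavg at 75% coverage:  {pfd_avg_1oo1(2e-6, cpt, 8760, 87600):.2e}")
print(f"PFDavg at 100% coverage: {pfd_avg_1oo1(2e-6, 1.0, 8760, 87600):.2e}")
```

Comparing the last two results shows why coverage matters: under these assumptions, the failures an imperfect test never finds dominate the average over a long mission time.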
Proof Test And Diagnostics
IEC 61511 defines low demand as a “mode of operation where the SIF is only performed on demand, in order to transfer the process into a specified safe state, and where the frequency of demands is no greater than one per year.” When the demand frequency exceeds twice the proof test frequency, a SIF should be treated as high demand and the benefits of proof testing are no longer realized. Demand rate is fixed based on the frequency of failures that could initiate a trip. As organizations seek to lengthen the time between turnarounds where offline proof tests can be performed, SIFs can shift from low-demand to high-demand mode. Extending a turnaround interval thus necessitates combining diagnostics, online proof testing and offline proof testing to maximize SIF reliability.
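A minimal sketch of the two criteria in this paragraph, assuming the second one compares the demand frequency with twice the proof test frequency (the reciprocal of the proof test interval). The function name and example values are hypothetical, and continuous mode is not modeled.

```python
def classify_demand_mode(demands_per_year: float,
                         proof_test_interval_years: float) -> str:
    """Return "low demand" or "high demand" for a SIF.

    Low demand requires both:
      - no more than one demand per year, and
      - a demand frequency no greater than twice the proof test frequency.
    """
    if demands_per_year < 0:
        raise ValueError("demand frequency cannot be negative")
    if proof_test_interval_years <= 0:
        raise ValueError("proof test interval must be positive")

    proof_tests_per_year = 1.0 / proof_test_interval_years
    if demands_per_year > 1.0:
        return "high demand"
    if demands_per_year > 2.0 * proof_tests_per_year:
        return "high demand"
    return "low demand"


# Hypothetical SIF with one demand every two years (0.5 per year).
for interval in (1, 4, 5):
    mode = classify_demand_mode(0.5, interval)
    print(f"proof test every {interval} yr -> {mode}")
```

With the demand rate fixed at 0.5 per year, the classification stays low demand for a 1-year or 4-year test interval and changes to high demand at 5 years, which illustrates how a longer turnaround interval alone can shift the mode.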
Automatic diagnostics continuously monitor the health of SIF components while SIF protection is in place. They enable identifying some failures immediately, allowing timely repair or replacement. The partial diagnostic credit (PDC) for automatic self-diagnostics depends on the ratio of the diagnostic and demand rates. For example, a ratio of 100× can provide 99% PDC while a ratio of 10× gives 95% PDC. In low-demand systems, repair capability limits diagnostic benefit. Administrative procedures must set a timeline (typically 24–72 hours) to remove the affected device from service, repair or replace, and return to service. Diagnostics most commonly are available for Type B devices such as transmitters; they may be an additional cost option that must be specified prior to purchase. Actuation of a device during normal operation also provides diagnostic value but isn’t considered a proof test. System design must include isolation and bypass capability to permit making repairs. Diagnostic coverage is set based on a combination of these factors.
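The two PDC figures quoted above can be turned into a simple lookup. This sketch assumes the quoted ratios act as minimum thresholds and deliberately does not interpolate between them or assign any credit below 10×, because the text gives no values there. The function name and the example rates are hypothetical.

```python
from typing import Optional


def partial_diagnostic_credit(diagnostic_rate: float,
                              demand_rate: float) -> Optional[float]:
    """Look up PDC from the ratio of diagnostic rate to demand rate.

    Both rates must use the same unit (for example, events per year).
    Returns None when the ratio is below the lowest ratio quoted in the text.
    """
    if diagnostic_rate <= 0 or demand_rate <= 0:
        raise ValueError("rates must be positive")

    ratio = diagnostic_rate / demand_rate
    if ratio >= 100:
        return 0.99  # quoted figure for a 100x ratio
    if ratio >= 10:
        return 0.95  # quoted figure for a 10x ratio (assumed to hold up to 100x)
    return None      # no figure quoted in the text for lower ratios


# Hypothetical cases, all with rates expressed per year.
print(partial_diagnostic_credit(8760, 1))  # hourly diagnostic, one demand per year
print(partial_diagnostic_credit(52, 1))    # weekly diagnostic, one demand per year
print(partial_diagnostic_credit(12, 2))    # monthly diagnostic, two demands per year
```

Treating the 10× to 100× range as 95% is the conservative reading of the two data points. Any credit claimed should still reflect the repair timeline and bypass capability described above.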
Online proof testing provides some diagnostic benefit. However, the test is performed at a lower frequency than diagnostics, and SIF protection is disabled during the test. An example is partial valve stroke testing (PVST), which is a useful tool where the process can tolerate partial valve stroking without initiating a trip. Typically, an online test will identify only a subset of the failures that a full stroke offline test can detect. Proof test coverage is determined based on the percentage of λDU failures the PVST can identify. Online proof testing may take place as often as practicable while a unit is in operation. As with diagnostics, system design must provide isolation and bypass capability to permit timely repairs.