Many plants rely on safety instrumented systems (SIS) to address hazards. Determining whether such instrumented protection is needed and, if so, the appropriate Safety Integrity Level (SIL) is crucial for achieving the required level of safety. A wide variety of methods are available for SIL determination. International standards IEC 61508 [1] and IEC 61511 [2,3] provide a selection. These are found within the "informative" sections of the standards and therefore aren't mandatory — you may choose one of those methods or a suitable alternative approach from elsewhere.
Safety Function Context
Figure 1. A safety function protects against a specific hazardous event.

However, certain issues and problems are common to the application of all methods. This article discusses some relevant issues, with a focus on the implications of SIL 3.

The Hazardous Event

The starting point within the process is identifying the potential hazardous event: what can happen should the safety function fail to operate correctly on demand (see Figure 1). Here, we're interested in assessing the benefit derived from the safety function. For SIL determination, the consequence can be regarded as the difference in outcome between the safety function working and not working. Normally, the consequence cost associated with the safety function working will be small compared with the cost of it not working. Occasionally, however, the safety function operating as intended can incur a significant cost, such as lost production or dumped product. The same is true of "spurious tripping." Such consequences must be considered when assessing the benefit from the safety function, and they must be reflected in the SIL. The difference between the function working and not working is its true benefit in risk reduction.

Safety Lifecycle [2,3]
Figure 2. Efforts extend from design to decommissioning.

During the hazard and risk assessment phase of the safety lifecycle (Figure 2), it's important to identify all significant hazardous events. For each of these, an assessment is made of how much risk reduction a safety instrumented function (SIF) must provide to achieve a target level of risk (Figure 3). The required risk reduction, or performance, of the SIF is expressed as a SIL. This assessment is often called SIL determination.

SIL determination results in defining the required performance, or "target SIL," for the SIF. Some methods of SIL determination also can define a target average probability of failure on demand, PFDavg, which represents the maximum value permitted within the range covered by the target SIL (Table 1). The principal methods described in Refs. 1-3 for assigning a target SIL or PFDavg to a SIF are:
• Safety Layer Matrix (SLM);
• Risk Graphs (RG);
• Layer of Protection Analysis (LOPA);
• Fault Tree Analysis (FTA); and
• Event Tree Analysis (ETA).

Each has its strengths. For instance, SLM is the simplest, while FTA and ETA are both highly flexible and therefore suit more complex situations. However, all methods, even the seemingly most straightforward ones, rely heavily on the competence and experience of the user. It's generally reckoned that SLM and RG are suitable for initial screening assessments. Any SIF with a target of SIL 2 or above would require reassessment using a more detailed and flexible method such as FTA.

Risk Reduction
Figure 3. Protection measures should decrease risk to below the target.

The standards provide some examples of the methods. These only demonstrate the approach; you shouldn't, for instance, simply use the risk graph shown in the IEC 61508 standard. In a real situation you must create a specific risk graph that reflects the target risk criteria for the site and application in question. This is often referred to as "calibration." Without such calibration, use of the risk graph is erroneous. Similar cautions apply to the SLM.

Key Issues

A number of issues need to be addressed during SIL determination. Let's focus on three that are easily overlooked by those new to the process.

The first is to identify and include all failures that could place a demand on a SIF, not just the obvious ones; some relate to normal operation and others to start-up and shutdown. The standards insist that we "ensure that the SIS safety requirements are achieved for all relevant modes of the process," including maintenance, process upset and emergency shutdown [2]. Without this, the SIL determination may seriously underestimate the frequency of demands on the instrumented protection and therefore indicate a lower SIL than is really required. This could leave the actual risk an order of magnitude higher than intended.

The second is acknowledging that people interacting with SIS can make mistakes. Not all tasks will have the desired outcome; perfection isn't attainable. The likelihood of human failure and its impact on risk must be factored into the calculations, both when determining the target SIL and when calculating the achieved SIL. This is particularly important for safety functions designed to SIL 2 and above, where, without careful consideration, the human component may totally dominate unreliability. For further information on human factors and risk assessment, see Ref. 4.

SIL Performance
Safety Integrity Level | Average Probability of Failure on Demand, PFDavg (low-demand mode)
SIL 1 | 0.1–0.01
SIL 2 | 0.01–0.001
SIL 3 | 0.001–0.0001
SIL 4 | 0.0001–0.00001
Table 1. Most process plants require a SIL no greater than 2.
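To connect demand frequency, the target PFDavg and the Table 1 bands, here is a minimal Python sketch. It assumes the simplest low-demand model, in which the hazardous-event frequency equals the demand frequency multiplied by the PFDavg of a single SIF with no other protection layers; a real LOPA or fault-tree study would credit other layers and use calibrated site risk criteria. The demand sources, their frequencies and the tolerable frequency are all hypothetical. The sketch illustrates the earlier point about including all demand sources: overlooking start-up, shutdown and external demands can understate the demand rate tenfold and pull the target down by one SIL.

```python
# Minimal sketch: target SIL from demand frequency.
# Assumption: a single SIF and no other protection layers, so
# hazardous-event frequency = demand frequency x PFDavg.

# Table 1 bands for low-demand mode: SIL -> (lower bound incl., upper bound excl.)
SIL_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}


def required_pfd(tolerable_freq: float, demand_freq: float) -> float:
    """Maximum PFDavg such that demand_freq * PFDavg <= tolerable_freq (both per year)."""
    return tolerable_freq / demand_freq


def target_sil(pfd_max: float) -> int:
    """Return the SIL whose Table 1 band contains pfd_max; 0 means no SIL is needed."""
    if pfd_max >= 1e-1:
        return 0  # less than a tenfold risk reduction is required
    for sil, (low, high) in SIL_BANDS.items():
        if low <= pfd_max < high:
            return sil
    raise ValueError("required PFDavg is beyond the SIL 4 range; revisit the design")


# Hypothetical demand sources (demands per year), summed as at the OR gate of a demand tree
demands = {
    "normal operation": 0.005,
    "start-up": 0.02,
    "shutdown": 0.01,
    "loss of utilities": 0.015,
}
TOLERABLE = 1e-4  # hypothetical tolerable hazardous-event frequency, per year

obvious_only = required_pfd(TOLERABLE, demands["normal operation"])
all_modes = required_pfd(TOLERABLE, sum(demands.values()))

print(f"normal operation only: PFDavg <= {obvious_only:.0e}, SIL {target_sil(obvious_only)}")
print(f"all operating modes:   PFDavg <= {all_modes:.0e}, SIL {target_sil(all_modes)}")
```

With these hypothetical numbers, counting only normal-operation demands puts the target in the SIL 1 band, while adding start-up, shutdown and loss of utilities raises the total demand rate tenfold and moves the target into the SIL 2 band.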
High Integrity SIF
Figure 4. The need for a SIL-3 safety function is rare at process plants.

Proper SIL Determination

Working out what the demand frequency may be is always going to be difficult. Clearly, the more frequent the demands, the more aware plant operators will be of how often they occur. It's important to include all potential sources of demand in any SIL determination. To do this, it's vital to have a systematic approach that covers normal operation, abnormal operation, start-up, shutdown and demands initiated from outside the plant (loss of services, power, etc.). Many of these may be infrequent but, when added together, they become significant. Demand trees are a good way of being systematic.

Estimating infrequent demands is difficult, especially when the interval between demands is more than about 10 years; there may be no member of staff who has worked at the plant for that length of time.

Three aspects of SIL determination deserve special mention: team competencies, alarms and personnel exposure.

Team competencies. Effective SIL determination requires input from many disciplines. It's certainly not something for the instrument engineer to do as a solo exercise! It can be done in a similar way to a hazard and operability study, via meetings with a leader (preferably a trained safety and reliability professional) and appropriate representatives of other relevant disciplines. These should include instrument and control engineers, process engineers and plant operations staff (preferably actual plant operators). It's essential to choose people for such meetings carefully, to ensure that all relevant disciplines are present and that the group will work well together. Sometimes the presence of more senior management can inhibit discussion of what really happens at the plant. Such meetings can work well for initial screening purposes and may provide sufficient detail to justify SIL-1 safety functions.
For higher SILs, more detail would be appropriate and, for this, it's better to appoint someone independent of the design team to carry out the assessment.

Alarms. SIL determinations often must consider potential risk reduction from operator response to alarms. However, there's a tendency to do this without sufficient thought! There's a need to ask questions such as: Will the operator be available to respond? There may be insufficient time to respond, there may be too many other alarms at the time, or it may be difficult for the operator to decide what to do. How do you know the operator will take the correct action when an alarm is raised? Is there a clear, well-defined, documented response for each critical alarm? Is the means of response still available to the operator? If an alarm has occurred because a control valve has stuck open, then an operator response of "close control valve" using the same valve isn't going to be effective! There's a need to think through the scenario, decide what the operator should do and then ensure that all operators are aware of the appropriate action.

Personnel exposure. When considering the potential consequences of failure of a trip, we often need to look at the proportion of time that the person most at risk may be in the vicinity of the part of the plant where he or she could be injured. Usually, for a high-hazard area, this will intentionally be quite small: less than 10% of the working day. However, before claiming benefit from this, it's important to consider whether the person is likely to be asked to go to the hazard area to investigate just when the incident may occur. In that case, the probability of the person being there is nearer to 100% than to 10%.

Are Results Right?

At process plants, most SIF won't require higher than SIL 1. For safety functions requiring SIL 2 and above, there are questions to address: Do you have the correct formula for your reliability calculation? Have you considered common cause failure?
Do you have a method for selecting appropriate values for common cause factors? How do you allow for physical blockages of connections to the plant process? Do you account for failures of power supplies, cabling and instrument manifold piping? Have you included contributions from human error in your calculation of PFDavg?

SIL-3 safety functions normally are very rare in the process industries. So, if you've identified one or more high integrity SIF with a target of SIL 3, perhaps similar to the function illustrated in Figure 4, this requires special attention. The first question to ask is whether the target SIL is correct. While SIL 3 eventually may be found to be right, it's always worth considering whether the assessment included all relevant factors. Furthermore, you should check whether the methodology was suitable: RG and SLM aren't considered appropriate for SIL 3, and even LOPA normally is limited to SIL 2. Reviewing the assessment with a well-constructed fault tree that includes the relevant additional factors can result in a SIF with a SIL-3 requirement being reassigned a target PFDavg in the SIL 1 range. This reassignment may reduce both capital and operating costs.

If the review indicates that a SIL-3 SIF is necessary, then there's a need to look very carefully at the hardware configuration and at human interactions with the safety function. Achieving SIL-3 performance and maintaining it for the lifetime of the function is by no means a straightforward task. SIL 3 impacts capital costs because it requires a high degree of duplication, with more than one sensor and more than one means of output (see Figure 4 again), to ensure that the function will continue to perform if one or more failures occur between the necessary periodic tests. This requirement for continued working in the face of one or more faults is described in the international standards as "hardware fault tolerance."
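The questions about common cause factors, and why ignoring dependency between duplicated channels makes the result optimistic, can be illustrated with a minimal Python sketch. It uses widely quoted simplified approximations for a single channel, PFDavg ≈ λDU·T/2, and for a 1oo2 pair with a beta-factor common cause share, PFDavg ≈ ((1 − β)·λDU·T)²/3 + β·λDU·T/2. These assume no diagnostic coverage, negligible repair and test time, and perfect proof tests; the dangerous undetected failure rate λDU, proof-test interval T and β value below are hypothetical.

```python
# Minimal sketch using common simplified PFDavg approximations.
# Assumptions: no diagnostics, negligible repair and test time, perfect proof tests,
# consistent units (failure rate per year, test interval in years).


def pfd_1oo1(lambda_du: float, test_interval: float) -> float:
    """Single channel: PFDavg ~ lambda_DU * T / 2."""
    return lambda_du * test_interval / 2


def pfd_1oo2(lambda_du: float, test_interval: float, beta: float) -> float:
    """Two redundant channels (1oo2) with a beta-factor common cause share."""
    independent = ((1 - beta) * lambda_du * test_interval) ** 2 / 3
    common_cause = beta * lambda_du * test_interval / 2
    return independent + common_cause


LAMBDA_DU = 0.01  # hypothetical dangerous undetected failures per year, per channel
T = 1.0           # hypothetical proof-test interval, years
BETA = 0.1        # hypothetical common cause fraction (10%)

single = pfd_1oo1(LAMBDA_DU, T)
no_ccf = pfd_1oo2(LAMBDA_DU, T, beta=0.0)   # ignores dependency: optimistic
with_ccf = pfd_1oo2(LAMBDA_DU, T, beta=BETA)
ccf_share = (BETA * LAMBDA_DU * T / 2) / with_ccf

print(f"1oo1:             {single:.2e}")
print(f"1oo2, beta = 0:   {no_ccf:.2e}")
print(f"1oo2, beta = 0.1: {with_ccf:.2e}")
print(f"common cause share of the 1oo2 result: {ccf_share:.0%}")
```

With these figures the 1oo2 result sits in the SIL 4 band when β is ignored but in the SIL 3 band once a 10% common cause share is included, and the common cause term then accounts for roughly 95% of the total. The choice of β therefore matters far more than the second channel's independent failures.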
Achieving the necessary PFDavg in the range 0.001 to 0.0001 (Table 1) means the SIF must be managed so that it can respond successfully to a demand for all but a few hours per year (no more than about 8.8 hours, that is, 0.001 × 8,760 hours), and this must include the time when the organization is unaware that the function isn't working.

SIL 3 incurs additional operating costs. Proof testing a SIL-3 function takes significantly longer than testing a SIL-1 function because the function is more complex and has more elements whose proper functioning must be tested and proved. Additionally, proof testing must be done more frequently: almost certainly at least once a year. Any safety programmable logic controller suitable for application at SIL 3 will rely very heavily on diagnostic features to detect internal faults as they develop, yet the end user normally can't test those diagnostics for full and correct functionality.

When calculating the PFDavg and demonstrating that SIL 3 is achieved, great care should be taken to ensure that: (a) the failure rates used are direct field-failure rates applicable to the situation; (b) an appropriate assessment of dependency is included, because otherwise the calculations will be grossly optimistic; and (c) the unavailability of the function during testing is accounted for.

It's also important to consider human interactions with the safety function. SIF require maintenance, calibration and testing, and all of these involve people. Humans aren't perfect in all they do. You can't just say to a technician, "When you work on this SIL-3 function, please be 100 times more careful than you are on the SIL-1 functions." People try to take care in what they do all the time; it's impossible to do the same calibration task on a SIL-3 function with 100 times more care than the same task on a SIL-1 function. Nevertheless, it's important to consider the likelihood of human failure and its impact on a SIL-3 safety function.
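A minimal Python sketch can show why the same human error weighs so much more heavily at SIL 3. It assumes, for illustration only, that a human error (for example, a miscalibration during a proof test) leaves the function unable to respond until the next test, so that its probability simply adds to the hardware PFDavg (a rare-event approximation). The hardware PFDavg values are hypothetical; the human-error probability of 0.003 is the example figure this article uses. The sketch checks only the PFDavg criterion of Table 1, not hardware fault tolerance, and it also converts a PFDavg into equivalent hours per year during which the function would fail to respond.

```python
# Minimal sketch: effect of a human-error contribution on PFDavg and SIL band.
# Assumption: a human error leaves the SIF failed until the next proof test,
# so its probability adds directly to the hardware PFDavg (rare-event approximation).

HOURS_PER_YEAR = 8760

# Upper PFDavg limit (exclusive) for each SIL, from Table 1 (low-demand mode)
UPPER_PFD = {4: 1e-4, 3: 1e-3, 2: 1e-2, 1: 1e-1}


def achieved_sil(pfd: float) -> int:
    """Highest SIL whose PFDavg limit is met (PFD criterion only); 0 = worse than SIL 1."""
    for sil in (4, 3, 2, 1):
        if pfd < UPPER_PFD[sil]:
            return sil
    return 0


def combined_pfd(pfd_hardware: float, p_human: float) -> float:
    """Add a human-error probability to the hardware PFDavg."""
    return pfd_hardware + p_human


def downtime_hours(pfd: float) -> float:
    """Equivalent hours per year during which the SIF would not respond to a demand."""
    return pfd * HOURS_PER_YEAR


P_HUMAN = 0.003  # example human-error probability quoted in the article

for label, pfd_hw in (("SIL-1 design", 0.02), ("SIL-3 design", 5e-4)):  # hypothetical
    total = combined_pfd(pfd_hw, P_HUMAN)
    print(f"{label}: hardware {pfd_hw:.1e} (SIL {achieved_sil(pfd_hw)}), "
          f"with human error {total:.1e} (SIL {achieved_sil(total)})")

print(f"PFDavg 0.001 corresponds to {downtime_hours(1e-3):.2f} hours per year")
```

Under these assumptions, the same 0.003 contribution leaves the SIL-1 function within its band but drops the SIL-3 design into the SIL 2 band, while a PFDavg of 0.001 corresponds to about 8.76 hours per year of inability to respond.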
A human-error probability (such as 0.003) that may have relatively little effect on the PFDavg of a SIL-1 function may make SIL 3 utterly unachievable. This means that the design of the tasks on a SIL-3 function, and of the conditions under which they are planned to be carried out, must differ significantly from what would be reasonable for a SIL-1 function. Redesigning tasks, assessing appropriate human-error probabilities and including them in the PFDavg calculation, while vital at SIL 3, aren't easy and require specialist skills.

A Safe Approach

IEC 61508 and IEC 61511 represent current good practice in the management of SIF at process plants across the world. It's important for those managing the operation of such plants to be aware not only of these standards and what they contain but also of the benefits that adopting them can bring in demonstrating suitable management of risks. Indeed, unless operators of high-hazard process plants adopt such standards, they are open to the criticism that they don't adhere to current good practice for management of instrumented protective measures and can't demonstrate that operating risks are being properly managed to an appropriate level.

SIL determination requires care. Any prospective SIL-3 SIF demands reassessment. Process plants rarely require SIF with such a high SIL. Those claiming the need for SIL-3 SIF face a major challenge in demonstrating that SIL-3 performance is achieved by the combination of hardware and human interactions, one that may not stand up well to close scrutiny by either company stakeholders or external regulatory authorities.

Dr. Alan G. King is a hazard and reliability specialist with ABB Engineering Services, Billingham, U.K.

References
1. "Functional Safety of Electrical, Electronic and Programmable Electronic Systems," IEC 61508, Intl. Electrotechnical Comm., Geneva, Switz. (1998, 2000).
2. "Functional Safety: Safety Instrumented Systems for the Process Industry Sector," IEC 61511, Intl. Electrotechnical Comm., Geneva, Switz. (2003).
3. "Functional Safety: Safety Instrumented Systems for the Process Industry Sector," ANSI/ISA-84.00.01-2004 (IEC 61511 MOD), Intl. Soc. of Automation, Research Triangle Park, N.C. (2004).
4. King, Alan G., "Inclusion of Human Failure in Risk Assessment," Proceedings, 12th Intl. Loss Prevent. Symp. (Edinburgh, U.K., May 2007), I.Chem.E. Symp. Ser. No. 153, Institution of Chem. Engineers, Rugby, U.K. (2007).
5. King, Alan G., "SIL Determination: Common Cause Concerns," presented at SIL Determination: Principles and Practical Experience seminar (London, Mar. 2006), Institution of Eng. and Tech., Stevenage, U.K. (2006).