In this episode, Trish & Traci prove that when a measure becomes a target, it ceases to be a good measure.
Traci: Welcome to this edition of "Process Safety with Trish & Traci," the podcast that aims to share insights from past incidents to help avoid future events. I'm Traci Purdum, senior digital editor with Chemical Processing, and as always, I'm joined by Trish Kerin, the director of the IChemE Safety Centre. Happy new year, Trish.
Trish: Happy new year, Traci. How are you doing?
Traci: It's a beautiful day here in Cleveland. How about you?
Trish: Good. Yeah. Doing well. So, we're in the middle of our summer now, so we've got quite a warm week ahead of us. So, getting out and spending a little bit of time in the sun and the warmth is always a nice thing to do. I find it good for the soul to just get out into the open every once in a while.
Traci: Isn't that the truth? Well, speaking of things to do good for the soul, I figured that kicking off the new year right, we could probably look at how to properly focus on process safety. There are many instances where people measure the wrong things and that really causes them to be blindsided by catastrophe. And I know you can rattle off several cases to illustrate this. What are a few incidents that come to mind?
Trish: So, I think probably the most prominent ones that really come to mind for people are things like Deepwater Horizon or Longford in Australia that we had over here, Texas City Refinery incident. These are all incidents where sites thought they were doing well with safety. They had really good personal safety stats at the time, and they thought everything was under control. They weren't hurting people. So, everything's great. But we now know that there were underlying process safety issues. There were latent hazards there that came to the surface and created significant catastrophic events for them. So, I think there are many, many examples that you could think of, but they're probably some of the most prominent ones that we do recall. And you know, the challenge is that it's easy to think that everything is going well if we're not getting feedback that there's a problem. And that's really the nub of the issue with process safety and even OHS. But we need to remember that the absence of an incident does not mean the presence of safety. If you think about it from a logical engineering perspective, the absence of one thing does not imply the presence of another. That doesn't make any sense. That's not logical. But for some reason in safety over the years, we've been conditioned that that's how we measure it. We actually need to take a step back and start measuring the presence of safety, not the absence of incident.
Traci: And, you know, I've been in factories where they mark their days of people being safe, no injuries, and I think that, you know, the personal injury rates aren't really helping where process safety is concerned. But why is that? Wouldn't that seem to be a logical, "Okay, we're keeping our people safe?" Shouldn't that fan out to a bigger picture?
Trish: There's a couple of reasons why that's not really the case. And the first one I'll tackle is that number on the board at the front of the factory, is it actually real or is it what we want it to be? If you work in a facility and every day you go to work and it says it's been 500 days since the last lost time injury, you're certainly not going to want to be the next one, which means you may potentially hide the consequences of an injury or the existence of the injury at all or very much downplay the severity of it because you don't want to be the one that ruins the record, especially when we have a lot of situations where people's bonus structures are based on that number. I don't want to be the person who means you don't get a bonus because that's going to harm our working relationship, and that could even lead to potential bullying issues as well. There are so many reasons why advertising so many hundred days or whatever it is since our last injury can be a really negative thing to do on a facility. It can actually do more harm than good. I'm not saying we shouldn't celebrate the successes. We should celebrate the successes but be really careful how we do them rather than just that blanket, it's been so many days and aren't we great? There are risks to that.
The other issue we've got though is that we need to remember that process safety is not the same as personal safety. The mechanism of causation as we call it is quite different. So, whilst we're worried about loss of control in both, that we're losing control in something, it's the potential loss of energy control that exists in process safety that is so much more significant. We're dealing with much greater forms of energy that we're trying to contain, whether it's product in a pot, whether it is pneumatic energy, whether it's potential energy of a dropped object onto a pipeline, for example, or something like that. So, the things that cause a process safety incident are often very different to the things that cause a personal safety incident. The consequences of these two incidents are also quite different. Now, people can be fatally injured in personal safety. That can happen. People can die, and they do die at work in far too great a number. But when we're talking process safety, we're generally not talking one person or even two. We're talking 10, 15, 100, 1,000. We're talking significant numbers of fatalities occurring in one particular incident. So, we're likely to see the consequences being far more severe. And because the causation of the two sorts of incidents is quite different, the means to prevent them are quite different as well. We need to focus more on the engineering and design of maintaining control of the system. If you think of process safety, it's about protecting the system from the person in some way so that the person can't cause an incident. In personal safety, we're more about protecting the person from the equipment, whether it's entanglement or burning or cutting of some sort, crushing injuries, those sorts of things.
We're actually looking at different parts of the systems. Now, can good process safety or good personal safety impact the other area? There's an argument to be said that if you've got good process safety processes in place, then you've got a high level of operating discipline and that can give you some advantages in personal safety. Having really good personal safety doesn't necessarily mean you've got an understanding at all though of what the hazards are on the process safety side of things. So, just having great personal safety will not translate to improvements in process safety. And just having good process safety actually doesn't mean you've automatically got personal safety covered. You might have a little bit of an advantage of that operational discipline, but it doesn't automatically have you covered. And so, the important thing to remember is you actually have to manage both in your organization. This is not an either/or game. You've got to manage both. And they both need to have the right amount of resources applied to them to make sure that we can deliver on safety. Not deliver on lack of incidents, deliver on achieving safety. We want the presence of safety here.
Traci: Now, when you want to manage, when you're trying to manage both, what are some of the lead indicators that the plant should be focusing on to help them achieve that?
Trish: There are a whole lot of lead indicators that you could look at both in the personal safety side and the process safety side, and some of them will have some overlap. For example, if you have a process where you monitor the close out of actions that have come out of incidents, so closing actions by their due date is actually an indication of how well the organization is managing its overall management systems. That tells you something about the presence of how the system is running, and that can lead to the presence of safety. Now, if you're looking at the process safety actions closed out, that's a process safety metric. If you're looking at the OHS actions closed out, that's an OHS metric. If you're looking at overall all of them, then it's a very broad metric across the whole system. You probably do want to break it down into different categories because it's possible that more actions are being closed out in one area than another because it can often be a little bit harder to close the process safety ones. Sometimes it requires significant plant alteration and that can be expensive. And so, that can maybe be pushed off to the side and postponed. You really need to understand and delve into some of the detail. There are all sorts of others around, you know, how effective are your permit to work systems in operation. And again, there's both personal and process safety elements in that, but we can look at other ones.
Take effective maintenance on safety-critical elements to schedule: are you actually doing the required maintenance so that if your safety-critical devices are needed to prevent an incident, you can have confidence they're going to work? That tells you something about the health of the process safety system. You might measure backlog on process safety items. There's a range of different things. In the Safety Centre, we actually developed a guidance document a few years ago now that identifies and defines a series of lead process safety metrics. [You can access that document here and here is a supplementary guide] Now, this is not the only document and it's certainly not the first document to be written on process safety metrics or indicators. The difference with this document is we actually looked at the previous excellent work that was done and we can go right back to the CCPS guidance that they've produced on measuring and monitoring in process safety. And then that turned into API 754, which most of the audience will be quite familiar with. There has also been an adoption by IOGP for the upstream businesses in that area as well. They deal with both the lag and the lead metrics. With the Safety Centre document, we said, actually, we're only going to focus on the lead and we're going to really define them rather than say that you need to define what works for you. We're actually going to give you some guidelines around that because pretty much...we did a little bit of research and every company we found was actually measuring, for example, maintenance backlog, whether your process safety maintenance was being done to schedule, but everybody was measuring it slightly differently and everybody called it something slightly different. We're trying to set some sort of consistency in that.
So, there's a range of different metrics that we do define across the whole realm of an organization to say, you know, you want to focus on these things in process safety perhaps, and particularly on the leading side.
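[Editor's note: As a rough illustration of the action close-out indicator Trish describes, here is a minimal Python sketch that computes the share of actions closed by their due date, broken down by category. The `Action` record, the category labels and the sample dates are all hypothetical, and this is only one possible definition of the metric, not the one given in the Safety Centre guidance.]

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Action:
    category: str            # hypothetical labels, e.g. "process_safety" or "ohs"
    due: date
    closed: Optional[date]   # None means the action is still open


def on_time_closeout_rate(actions, as_of):
    """Per category: share of actions due on or before `as_of` that were closed by their due date."""
    due_count = defaultdict(int)
    on_time = defaultdict(int)
    for a in actions:
        if a.due > as_of:
            continue  # not yet due, so it counts neither for nor against the rate
        due_count[a.category] += 1
        if a.closed is not None and a.closed <= a.due:
            on_time[a.category] += 1
    return {cat: on_time[cat] / n for cat, n in due_count.items()}


# Made-up sample data, for illustration only.
actions = [
    Action("process_safety", date(2021, 1, 10), date(2021, 1, 8)),
    Action("process_safety", date(2021, 1, 15), None),   # overdue and still open
    Action("ohs", date(2021, 1, 12), date(2021, 1, 12)),
    Action("ohs", date(2021, 1, 20), date(2021, 1, 18)),
    Action("ohs", date(2021, 3, 1), None),               # not yet due
]
rates = on_time_closeout_rate(actions, as_of=date(2021, 1, 31))
print(rates)  # {'process_safety': 0.5, 'ohs': 1.0}
```

[In this made-up data the combined rate across all four due actions would be 75%, which hides the fact that the process safety actions are the ones slipping. That is the reason for splitting the figure by category.]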
Traci: And to me, and tell me if I got this wrong, but it seems that lead indicators expose weak spots, where the lag indicators highlight things that are easy to see like the personal injury or whatnot. Am I grasping that properly?
Trish: Yeah. So, the lead indicators actually identify for you where you're starting to see deterioration of your systems. You can start to pick up the deterioration before the incident occurs. Now, having said that, the standard lead metrics or indicators that are out there right now, they are not predictive. The lag metrics are certainly not predictive. They're telling you history. They're telling you what has happened. That's important because we need to know where we've come from to see how we're improving, but the lag indicators are not predictive at all. They don't tell you what's going to happen tomorrow just because it happened yesterday. The lead indicators tell you there is a weakness starting to appear in something. That's also not predictive, but what it does give you a chance to do is to intervene and potentially change the future. So, if you see that one of your systems or processes is starting to degrade and not working as well as it should, you can actually intervene and fix it before the incident happens. So, they're not predictive, but they can prevent incidents into the future. And that's a really important part. But one of the big things to remember when we're talking about measuring safety and we're talking about metrics or indicators is we need to be very careful that we don't get caught focusing on trying to make the numbers look good. And there's a law that I often quote to people and it is, "When a measure becomes a target, it ceases to be a good measure." And this is Goodhart's law. Now, this is about the idea that if you're so focused on the target, you actually end up managing the number, not managing the system. We have to be very careful about that in metrics. It can be so easy to manage the target.
This is where we start to...you know, in a lag sense, we start to debate the volume of that spill. So, it really wasn't a Tier 1, maybe it was a Tier 2. It's amazing how many miraculous calculations always come out as Tier 2s and not Tier 1s. You know, if we look at it from the OHS perspective, there's the number of injuries that probably should be registered as lost time, but are recorded as restricted duty or medical treatment only because we're too busy managing the target rather than saying, you know, "Someone got hurt and they couldn't go to work. Let's not debate whether it's lost time, let's actually fix the issue. Let's manage the system, not the target." And so, Goodhart sort of is always in the back of my mind reminding me, "When a measure becomes a target, it ceases to be a good measure." We need to really keep that in mind when we're dealing with metrics as well.
Traci: You know, managing the target brings to mind that you tune it out. You focus so narrowly on something that you tune everything else out, almost like alarm systems. And a prime example, when a car alarm is going off, no one bats an eye anymore because it's just a din in the background. Can you talk a little bit about that sort of false sense of security in safety measures?
Trish: Yeah. So, alarms are a fascinating area and there are some standards around, you know, the number of alarms that an operator can manage in any given shift period. And it's important that we don't overwhelm people with the number of alarms that actually exist. And there's all sorts of examples of where things have gone terribly wrong because of this. An incident that I've spoken about a few times was that theme park ride issue in Queensland in Australia. And so, in that incident, the operators actually didn't have alarms to tell them what was going on. They had no indicators to tell them what was going on, except some things that were loosely related but not direct. They didn't have the right indicators. But from a human-factors perspective, which is where we often start to talk about alarms, the operators on that ride had to do 36 individual checks every minute, 36 checks a minute, every minute. The workload, the overwhelming drain on their capability at that point in time was massive. How many alarms are your operators dealing with in their control rooms on a regular basis? There's a couple of other issues with alarms. One of them is this idea that when we do have a lot of alarms that come up, we do start to become unaware of them. It's not deliberate ignoring, but we actually just start to not even register them. An example here from a process safety perspective is the DuPont Belle, West Virginia, incident that happened back in 2010 in the methyl chloride system. This was investigated by the CSB and what they found was the alarm system on the bursting disc to say that it had burst was so unreliable that the operators just didn't pay any attention to the alarm because it was always false.
What happened in the background was the system was replaced and it was actually replaced with quite a reliable alarm, but the operators saw the alarms as always false. So, they didn't notice that there was a problem because that alarm's always there and it's always false. But in this case, it wasn't false. It was real. And it was actually reliable at that point in time. There are many sorts of examples of different things like that. The other challenge with alarms is that when an incident does occur, you get what's called alarm flood. All of a sudden everything starts to alarm and you then expect the operator to be able to respond and cope. There are so many alarms going off, most of which are not important enough at that particular crisis point, but somewhere in that sea of alarms, there might be a really important alarm. Can your operator actually respond and find that or is there just too much going on for them? So, they are some of the challenges that we have with alarms, and that's why alarm rationalization is really important. Again, there's a lot of great documentation out there. There are some standards. And in the Safety Centre, we're actually drafting guidance with a working group of our companies looking at alarm rationalization at the moment, which we hope to release in a few months' time.
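[Editor's note: To make the alarm flood idea concrete, here is a minimal Python sketch that flags moments when the number of alarms in a trailing time window exceeds a limit. The function name and the sample timestamps are hypothetical. The default limit of more than 10 alarms in 10 minutes is an assumption based on figures commonly quoted in alarm management guidance, so check the standard your site follows before relying on it.]

```python
from collections import deque


def flood_times(alarm_times, window_s=600, threshold=10):
    """Return the alarm timestamps (in seconds) at which the number of alarms
    in the trailing `window_s` seconds exceeds `threshold`.

    Both defaults are assumptions for illustration, not values from the episode.
    """
    recent = deque()   # alarms that fall inside the current trailing window
    flagged = []
    for t in sorted(alarm_times):
        recent.append(t)
        # Drop alarms that are a full window or more older than the current one.
        while recent[0] <= t - window_s:
            recent.popleft()
        if len(recent) > threshold:
            flagged.append(t)
    return flagged


# Made-up log: three isolated alarms, then a burst of 12 alarms one second apart.
quiet = [0, 900, 1800]
burst = [3600 + i for i in range(12)]
flood = flood_times(quiet + burst)
print(flood)  # [3610, 3611]
```

[The sketch only counts alarms. It says nothing about which alarm in the flood is the important one, which is the problem alarm rationalization is meant to address.]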
Traci: Now, Trish, if you walked into a facility today, what would your advice be on where they place their focus to help them ensure process safety?
Trish: There's so many parts to this answer. You need to really focus on what are your hazards? You need to understand what your hazards are. If you don't know what they are, you can't possibly manage them. So, the risk assessment and identification processes are so important to understand what the issue is you actually need to worry about. Then you've got to make sure you've got systems and processes in place to deal with that. You've got to have people that are knowledgeable and have the competency to be able to operate it. You've got to have the engineering standards to be able to maintain and design and operate your systems. You've got to have assurance systems so that you can check all of these things. You've got to have an understanding of the human-factors elements. I mentioned the alarms as one part of it but human factors is an enormous field. How operable is your facility in the first place? And then lastly, the organizational culture, and that comes down to leadership. And so, these areas, there's a lot of them, and that's the problem. There's no one area to focus on in process safety and go, "So, if we just do that, we're going to fix everything." You know, the chain's only as strong as its weakest link. We've got to have oversight over all of the elements of it, but you can only monitor what matters. You can only focus on the key, critical things. You need to understand what they are. And when you then finally understand what those key, critical ones are, you can actually monitor and manage them adequately. But that requires everybody to be engaged, and that's your safety culture and your organizational culture. And that's also all about leadership.
So, if I had to pick sort of one area that I was really going to focus on, I'd start with leadership because everything else is driven by what the leaders say and do. You know, the old saying of, "What interests my boss, fascinates me." It's actually true. I'm not going to spend my time working in a whole area if it means absolutely nothing to the person that has oversight and control of my future career. So, if we can get the leadership right and the leadership focused in those areas, then that's what we need to do and that's why leadership in process safety is one of the key areas that the Safety Centre does an enormous amount of work in. We run training programs for senior executives and boards of directors to try and get it right at that level so that we can actually see it permeate down through the organization. [You can access Process Safety Leadership and Culture here.]
Traci: Trish, do you have anything to add? You know, I like to leave these open-ended ones in case I miss a point that you need to make to help us put this all together.
Trish: I think my closing thoughts on this one are that we need to sometimes remind ourselves why it is we do process safety, why it is that it's important, because sometimes it can get lost in the everyday activities that we have to do. But at the end of it all, the reason we have to focus on process safety is because our life depends on it. And the lives of our co-workers and the impact on our communities depend on how well we manage process safety in our facilities. People die when we don't get this right. You know, this is not an unfortunate mishap. People don't get to go home. And that has to be the overriding thing that we need to focus on, that people get to go home at the end of their shifts. And if we can all just make a small improvement in that, then people will get to go home. Fewer people will die at work. And so, that to me is what I like to remind myself of occasionally. Why I'm here is actually so people don't die. It's really that simple. That's why I do what I do. That's why I try and encourage other people to do it because I don't want to see people dying, certainly not on my watch, but in general, that's not what I'm interested in. I want to see people safely and productively operate their facilities because we need companies to make products and to do things, but we also need them to do it safely.
Traci: Well, it's obvious that you always are striving to help us focus our safety culture and to get our leadership right where process safety is concerned. So, I appreciate your time again on this topic. And unfortunate events happen all over the world and we will be here to discuss and learn from them. On behalf of Trish, I'm Traci, and this is "Process Safety with Trish & Traci."
Trish: Stay safe.