Troubleshooting is tedious and time-consuming. Fortunately, it is a skill anyone can learn and enhance by understanding a few basic principles. Here, we will focus on those most useful for a pilot plant.
Haste always makes a troubleshooter less effective. When a problem arises, people get agitated, so taking some time to assess the situation is far from easy. Operators want you to start doing something immediately. A more effective approach is to take a moment before you touch anything to carefully examine the system. Note the position of all the key valves. Look at the pressures and temperatures. Check what is and is not operating. Often, this will reveal the problem very quickly and clearly. I once solved a problem that had defied the combined efforts of a much more senior group by noticing a feed valve had been inadvertently closed.
It’s often necessary to ask the operator to give you a moment to think. An alternative is to patiently wait for the operator to explain what’s happening, partially listening while also pondering the problem. One way I have found to address an operator’s concern and still function effectively is to think out loud. This shows you are contemplating the issue while keeping the person involved. However, it does not always work because some operators insist on arguing or commenting on each point or trying to get you to stop and move in the direction they believe best. A polite but pointed comment that you are not as familiar with the problem as the operator and need a few moments to come up to speed often helps. Sometimes you might have to resort to declaring: “Could you please stop and let me think for a minute.”
Write down the existing conditions and equipment status, including the positions of all key valves, before you do anything. Jot down notes as you make changes and ensure these notes are clear and legible enough to understand what you did, when you did it, what abbreviations mean, etc. This may sound like wasteful added work but, after doing five or six things, it is very easy to be uncertain if, e.g., the last test was with the valve on manual or automatic, nitrogen was or was not flowing, the initial pressure on the tank was 10 psig, etc. I can’t tell you how many times I have had to use this information to straighten out a confused operator (or myself).
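If you prefer to keep these notes electronically, the record-keeping described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `LogEntry` and `TroubleshootingLog` and the equipment items are hypothetical, and a paper notebook does the same job. The point is that each entry stores the full equipment status along with the time and the action, so you can later see exactly what differed between two tests.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class LogEntry:
    timestamp: datetime
    action: str             # what was done, written out in full words
    state: Dict[str, str]   # complete equipment status after the action


@dataclass
class TroubleshootingLog:
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, action: str, state: Dict[str, str],
               when: Optional[datetime] = None) -> LogEntry:
        # Copy the state so later edits to the caller's dict cannot alter the record.
        entry = LogEntry(when or datetime.now(), action, dict(state))
        self.entries.append(entry)
        return entry

    def changes(self, index: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Return {item: (before, after)} for entry `index` versus the previous entry."""
        before = self.entries[index - 1].state if index > 0 else {}
        after = self.entries[index].state
        return {
            item: (before.get(item), after.get(item))
            for item in sorted(set(before) | set(after))
            if before.get(item) != after.get(item)
        }


# Hypothetical example: baseline first, then one change at a time.
log = TroubleshootingLog()
baseline = {
    "feed valve": "open",
    "controller mode": "automatic",
    "nitrogen": "flowing",
    "tank pressure": "10 psig",
}
log.record("Initial conditions, nothing touched yet", baseline)
log.record("Put controller on manual", {**baseline, "controller mode": "manual"})

print(log.changes(1))
```

Recording the whole status each time, rather than only the item you changed, is a deliberate choice in this sketch: it answers the "was nitrogen flowing during that test?" question without relying on memory.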
Don’t blindly accept everything the operator or person discovering the problem says. People will leave out key items because they think they are unrelated, are embarrassed to tell you something they did, are reluctant to admit they have no idea what the pressure gauge was reading or if the feed valve was on, or simply don’t remember correctly, e.g., thinking they put a valve on manual when, in fact, they didn’t. Occasionally, even in the best organizations, operators may sense they created the problem and, thus, won’t give you all the background for fear of making their culpability apparent. After wasting an hour finding out the operator simply forgot to set something properly or did something in the wrong sequence, it is very hard not to let some annoyance show. However, I suggest pointing out the error in a way that doesn’t blame the operator: “It’s like the cooling pump wasn’t on when you started. Is there any chance that might have happened?” A reply of something vague like “I guess that is possible” often allows saving face and solving the problem.
Be suspicious of the person who has “solved” the problem and just wants you to fix something. I’ve wasted far too much time getting a pump changed out, a controller pulled for repair, and a regulator replaced only to find the pump wasn’t working because of an unsatisfied interlock, the controller because of an incorrectly set pressure balance, and the regulator because the gas cylinder was empty. I often try to address these “solutions” by asking how the solver can be positive this is the problem. While such questions often are not received well, good operators rarely mind explaining their thought processes, and weaker operators frequently can be helped to understand that, perhaps, they are not 100% certain. In either case, this allows a more-grounded discussion to start; the time spent usually is well worth the time saved on an incorrect solution. Similarity to a past problem can badly mislead even experienced operators. They may not recognize the current erratic flow (±50%) differs a lot from the previous erratic flow (±10%) caused by a pump seal leak.
Train yourself to start from the beginning or end of a system and work your way to the other end. This often seems counterintuitive because most people want to start in the middle. They feel they easily can identify if the issue is behind or ahead of their starting point; sadly, they are wrong. Starting at one end or the other of the system avoids false moves and, ultimately, saves time. Is the breaker on or does the equipment have power? Either is an unambiguous starting point for further checking. That a particular component is not energized may be much less so. Moreover, complex systems often have numerous paths, making the ability to figure out which direction to go suspect at best.
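The end-to-end approach can be pictured as walking an ordered list of checks and stopping at the first one that fails. The short Python sketch below is only an illustration; the check names and the `observed` readings are hypothetical, and in practice each check is something you verify at the unit, not in software.

```python
from typing import Callable, List, Optional, Tuple

# A check is a name plus a function that returns True when that point is OK.
Check = Tuple[str, Callable[[], bool]]


def first_failure(chain: List[Check]) -> Optional[str]:
    """Walk the checks in order and return the name of the first one that fails."""
    for name, is_ok in chain:
        if not is_ok():
            return name
    return None  # every check passed


# Hypothetical observations written down at the unit.
observed = {
    "breaker on": True,
    "panel has power": True,
    "interlock satisfied": False,
    "pump running": False,
}

# Order matters: this list runs from the power source to the equipment.
power_path: List[Check] = [
    (name, (lambda n=name: observed[n]))
    for name in ["breaker on", "panel has power", "interlock satisfied", "pump running"]
]

culprit = first_failure(power_path)
print(culprit)
```

Everything upstream of the reported item has been confirmed good, which is exactly what starting in the middle cannot guarantee.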
Never assume the most complex or troublesome component is the culprit. We all tend to think the flow has stopped because that finicky valve has hung up again or that difficult-to-access filter clogged once more. That may be the case but the cause just as likely is something entirely different like a closed feed valve. Problem-prone components exert a magnetic pull on troubleshooting efforts that often hinders a more logical and, ultimately, faster approach, as I personally can attest. Learn to avoid the pull.
Recognize that everyone has a “hunch” about where the problem may reside. While these hunches sometimes are correct based on past experience or simply a good intuitive sense, I have seen hours wasted checking an issue that a quick analysis would show could not explain the problem. Making matters worse, if you don’t trust a component, you tend to keep circling back to it, often needlessly. This is where the next step helps.
Develop A Plan
Do this before you do anything else. Too often, troubleshooters rush to perform tests that won’t prove anything. Taking the time to draw a simple flow chart to show where the results of each step will lead is invaluable and well worth the effort to develop. It frequently will make clear that a certain path will reveal very little or that you must do an additional step first to ensure you can reach a valid conclusion later. Sometimes you get lost in the process and forget what the step can or cannot show. Talking through the steps with a knowledgeable individual often helps you to avoid faulty logic or skipping over another, more viable path.
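A flow chart of this kind also can be written down as a small table and checked before you run a single test. The Python sketch below is a minimal illustration under my own assumptions: the steps, results, and the `CONCLUSION:` prefix are hypothetical, loosely based on a "no flow" problem. It flags two weaknesses discussed above: a step whose results all lead to the same place (so the test proves nothing) and a next step the plan refers to but never defines.

```python
from typing import Dict, List

# Each step maps every possible result to the next step or to a conclusion.
Plan = Dict[str, Dict[str, str]]

plan: Plan = {
    "Is the feed valve open?": {
        "no": "CONCLUSION: open the feed valve and retest",
        "yes": "Read the tank pressure",
    },
    "Read the tank pressure": {
        # Both results lead to the same next step, so this reading decides nothing.
        "normal": "Is the pump interlock satisfied?",
        "low": "Is the pump interlock satisfied?",
    },
    "Is the pump interlock satisfied?": {
        "no": "CONCLUSION: clear the interlock and retest",
        "yes": "Bench-test the pump",  # referenced here but not yet planned
    },
}


def uninformative_steps(plan: Plan) -> List[str]:
    """Steps whose results all lead to the same place."""
    return [step for step, results in plan.items()
            if len(set(results.values())) < 2]


def undefined_next_steps(plan: Plan) -> List[str]:
    """Next steps that are referenced but never defined in the plan."""
    targets = {t for results in plan.values() for t in results.values()}
    return sorted(t for t in targets
                  if not t.startswith("CONCLUSION:") and t not in plan)


print(uninformative_steps(plan))
print(undefined_next_steps(plan))
```

A step flagged as uninformative is not necessarily useless, but it deserves a second look: either a result should lead somewhere different, or the step can be dropped.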
For particularly troublesome problems, review with someone what you have seen and done. That person might ask you an eye-opening question or challenge an unfounded assumption. If no one is available, writing down a summary sometimes can work. Many times, the effort to organize your thoughts and explain the problem will help you solve it or at least identify the next steps.
When you have done everything you can think of and nothing has fixed the problem, resist the urge to do it all over again. Instead, take a step back and try to identify what you have not looked at. While it is possible you incorrectly checked something, you are more likely just not focusing on the right area. I often have found the problem by simply asking: “What have we not looked at so far?” The key you lost always is in the last place you look.
Doubt everything you are told no matter how much you trust and respect the person. All too often the tank is empty, the cylinder not connected, the power not on, or there’s some other incredibly obvious issue that someone failed to note. I have stood near too many valves I swear I opened and too many switches I swear I closed to not recognize how easy it is to get confused or fail to scrutinize something closely enough.
Look carefully at the system and pay attention to any nagging points. Why do you keep staring at that pump? Why is your attention always drawn to that feed system? Often your subconscious is noting something is not right and, if you look at it a bit longer, you will realize the issue.
Never do more than one thing at a time. Keep a close eye on everyone involved so that you are not starting something while someone else has stopped it. Having someone double-check all five valves are in the right position or double-checking that you correctly traced that feed line to its source often saves incredible amounts of time and effort.
Make sure you understand the status of everything involved before the problem started to develop. Once, after pushing an operator to remember if anything was different right before a heater stopped working, the person finally remembered that an electrician had borrowed a ladder a few minutes before the problem arose; it turned out the electrician had locked out the wrong circuit by mistake, cutting all the power to our panel.
Conversely, be willing to let go of something that you cannot logically link to the problem. The lights might have flickered in the control room just before the problem started but, if the distributed control system is working properly, that may well have nothing to do with the problem.
Always remember the proposed issue cannot violate basic physics and chemistry. Closing a valve can’t increase the flow. Of course, you must be very diligent in making sure you have all the right information. If the valve was reverse acting, then perhaps you are opening, not closing it.
Get Proper Documentation
Never trust any drawings without verifying their correctness. This isn’t the first thing I do every time; however, I always try to confirm how the equipment is piped and wired. It is too easy for a drawing to be wrong. Good management-of-change systems have reduced but far from eliminated this problem. Some operators resist this step, loudly arguing they know their units; cajole them into letting you check anyway. My favorite tactic is to agree with them but note I always would have a lingering doubt until I verified the installation.
Always gather all available documentation on intermittent problems. Insist that people write down observable facts at the time the problem arises, not afterwards when mistakes are too easy to make. Stress to operators that it is better to admit they forgot to note a valve output than to just put in a “normal” value. Make sure the operators know to stick to verifiable facts, not opinions. “The seal started leaking sometime just before the problem” is valid if the drip pan always is dry and the operator saw it a few hours before. It is not so valid if the drip pan usually contains liquid and the operator’s opinion is that the amount of liquid was more than usual. I once spent a week trying to find why a 30-psig rupture disk kept bursting when the system never had more than 5 psig on it. Only after reviewing the documentation did we discover the disk burst whenever someone turned on a gas chromatograph. As ridiculous as that seemed, it allowed us to find the incorrect line that fed 60-psig helium into our system.
Always ask the people involved about anything odd they notice, particularly anything that recurs when the problem appears. I finally pinned down why a gas monitor intermittently shut down a unit despite there being no releases when an operator casually noted it always seemed to happen when someone was cleaning another reactor. (The monitor was much more sensitive to the cleaning solution.)
When you appear to have fixed the problem, try to find a way to prove you are right. If you can show the problem appears when a regulator is set too low but disappears when it is set properly, you usually can be sure you found the true culprit. However, often ancillary things you do during troubleshooting fix the problem and you don’t realize it. I once swapped out a metering pump twice before grasping the issue was vapor build-up, not the pump. Each time we changed the pump, we carefully bled the lines, fixing the problem until vapor built up again. Obviously, proving the fix isn’t practical in every situation but, if, for example, the replaced component looks fine, a fast offline test may raise suspicions that you have not found the real problem and, so, may not have fixed it.
A larger troubleshooting team is not always better. Involving numerous people in the process often simply slows it down (everyone needs to have his or her say), creates the potential for confusion (too many people doing different things at once), and rarely solves the problem faster. A better approach usually is to step aside when you appear to have hit a dead end and let someone else have a try. A fresh set of eyes or a new approach, without you breathing down the person’s neck, often can solve the problem.
Stepping away from the problem for a time sometimes also can help. I am always amazed at how many problems are solved faster the next morning than the previous evening.
Sometimes, problems fix themselves. (Actually they don’t, but you somehow fixed them without realizing it.) Sometimes, you don’t understand why doing a particular thing solves the issue. Both cases are rare and worthy of further investigation. To this day, I share such stories in the hope someone smarter than me will explain why the problem disappeared. Be careful in assuming the problem truly has been fixed or gone away in these cases. Often, such situations are hiding a very subtle or intermittent problem.
Even if you fastidiously follow all these guidelines, troubleshooting will take time and effort and, often, be incredibly challenging. However, on average, it should go a bit faster and easier.