Common Mistakes in Writing Failure Modes
As we mentor RCM facilitators, we try to help them develop strong techniques so that, as they facilitate the RCM Blitz™ process, they can work with any RCM team to build a solid list of failure modes that are actually occurring on your equipment or are likely to occur at some point. One way to get better at developing good failure modes is to learn to recognize a poorly written one. Below is a list of what we believe are the common traps that lead to poor failure modes.
Writing Failure Modes at a level too high to make sound decisions
Using a centrifugal pump as an example, what would happen if I decided to make my RCM analysis go faster by writing the failure mode – The cooling water pump fails? What task should I implement to mitigate this failure mode? Dropping a step closer to the actual failure mode, I could also write – The cooling water pump bearing fails.
Looking at that failure mode, are we any closer to understanding why the bearing failed, and could we develop a sound maintenance task as a result? We might elect to perform vibration analysis to detect that the bearing is in the process of failing. The question is, do we want to use vibration analysis to inform us that someone forgot to lubricate the pump bearing?
Writing or identifying failure modes at a level that is too high to define the correct maintenance task typically results in maintaining the status quo: a run-to-failure maintenance strategy that is focused on making your maintainers faster at replacing components.
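The three levels of detail in the cooling water pump example can be pictured as a small record, sketched below in Python. The class name, its fields, and the `is_actionable` rule are hypothetical illustrations, not part of the RCM Blitz™ method. The check is deliberately crude: it only asks whether a part and a cause have been named, which is the minimum a team needs before it can discuss a task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureMode:
    """Hypothetical record for one failure mode statement."""
    component: str
    part: Optional[str] = None
    cause: Optional[str] = None

    def describe(self) -> str:
        # Build the statement from whatever level of detail is available.
        subject = self.component if self.part is None else f"{self.component} {self.part}"
        text = f"{subject} fails"
        if self.cause is not None:
            text += f" due to {self.cause}"
        return text

    def is_actionable(self) -> bool:
        # Assumption for this sketch: a maintenance task can only be
        # discussed once both the part and the cause are named.
        return self.part is not None and self.cause is not None


levels = [
    FailureMode("Cooling water pump"),
    FailureMode("Cooling water pump", part="bearing"),
    FailureMode("Cooling water pump", part="bearing", cause="lack of lubrication"),
]

for fm in levels:
    status = "specific enough to discuss a task" if fm.is_actionable() else "too high level"
    print(f"{fm.describe()}: {status}")
```

Only the third statement names a cause, so it is the only one that points toward a task that prevents the failure rather than one that merely detects a bearing already failing.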
Combining/Grouping Failure Modes
The second most common mistake is combining or grouping failure modes. Consider the failure mode – Cooling water pump bearing fails due to lack of lubrication. What would happen if I made the decision to write the failure mode as – Cooling water pump bearing fails due to improper lubrication?
Having decided to group the individual lubrication failure modes, can I now expect the team to come up with a sound task? How many individual failure modes are now grouped into this one statement?
Improper lubrication would include the incorrect type, the incorrect amount, the incorrect frequency, as well as contamination of the lubricant. That is four failure modes already, and it could easily be five or more grouped into one statement.
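As a rough illustration, the grouped statement can be expanded back into the individual failure modes named above. The function and variable names in this Python sketch are hypothetical, and no tasks are assigned because choosing them is the RCM team's job.

```python
from typing import List

# The grouped statement from the text.
GROUPED = "Cooling water pump bearing fails due to improper lubrication"

# The causes the text lists under "improper lubrication".
IMPROPER_LUBRICATION_CAUSES = [
    "incorrect lubricant type",
    "incorrect lubricant amount",
    "incorrect lubrication frequency",
    "lubricant contamination",
]


def split_grouped(subject: str, causes: List[str]) -> List[str]:
    """Write one failure mode statement per cause instead of one grouped statement."""
    return [f"{subject} fails due to {cause}" for cause in causes]


individual = split_grouped("Cooling water pump bearing", IMPROPER_LUBRICATION_CAUSES)

print(f"Grouped: {GROUPED}")
for statement in individual:
    print(f"  - {statement}")
```

Each line in the expanded list can now be discussed on its own, and each is likely to lead to a different task.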
Using Failure Modes Lists
While failure modes lists can be helpful and can speed up the RCM process, the lists often create more problems than they solve. The overall objective of an RCM analysis should be more than listing the known failure modes of the components that make up the system you’re analyzing. The discussion and discovery of the likely failure modes in your plant is an educational tool for your facilitators and team members. It opens discussion to build an understanding of the failures that have occurred, and could occur, at your plant. In visiting manufacturing plants around the world, we find that those with the worst equipment reliability have the highest levels of reactive maintenance. As we begin to perform RCM analyses at these sites, we see one glaring problem: their reactive maintenance culture has turned their maintenance personnel into component replacers instead of equipment maintainers. As a result, when asked to begin failure mode identification, they rarely know or understand the specific causes of failures.
The problem with failure modes lists:
- Most lists are not complete. I believed that of every list until I saw one with 168 failure modes for a ball valve. So while most are incomplete, I have yet to find one that will address YOUR failure modes.
- They dumb down the learning process of failure mode identification.
- They slow the learning/certification process for RCM facilitators.
- They often result in discussion/consideration of failure modes that are highly unlikely at your plant.
- The failure effects will never be correct because they do not address your process.
- The tasks connected to these lists are almost always overdone and unrealistic.
Overuse of the “Black Box” failure mode
The term “Black Box” in RCM comes from airline industry flight data recorders. In the world of RCM, we use the term to describe the chunking of several failure modes into a two-part failure mode, “The component fails.”
The excuse to black box typically comes into play for two reasons:
- We don’t know how the component works.
- Regardless of the cause, the failure effect is identical for all its failure modes.
I offer the sound advice that if we don’t know how it works, now is a good time to learn. The second excuse is normally used as a team pushes to complete a given RCM analysis. As the week goes on, there is a tendency to rush the process. At this point, I urge teams to list and discuss the failure modes and tasks, because we often miss significant improvement opportunities when pressed to finish.
Muhammad Mulyadi says
Hey, this article is great. I am in the midst of analyzing some failures where an electrical system failed because the wire connection was loose. Upon further investigation, the cable lug used to terminate the connection was found broken. I was wondering if I should classify this as a mechanical or electrical failure mode, as the lug itself is part of an electrical system but fundamentally (and materially) it is mechanical. It would be great if I could be guided on how I should look at this. I am just starting my failure analysis and RCA journey.