One of the curses of Reliability Centered Maintenance and Root Cause Analysis is that a great facilitator makes the process look easy. So easy, in fact, that after watching a good facilitator lead one event we’d like to believe we could jump right up and facilitate the next one with no training or assistance. In mentoring RCM Blitz™ facilitators during the certification process, we work to perfect two dozen skills with the hope of turning out great facilitators. The skill we typically spend the most time working on is writing failure modes.
While there are many places where new RCM facilitators struggle when it comes to facilitating a thorough and useful RCM analysis, most errors in the process start at the failure mode level. Writing good failure modes requires an expert level of understanding of hundreds of different components: what is the component intended to do (Function), and what are the ways that this component can functionally fail (Failure Modes)?
Over the past 20 years I have tried a couple of different ways to teach how to write good failure modes. In performing hundreds of RCM Blitz™ analyses with different facilitators and practitioners for companies around the world, we have come to understand that good failure modes should be written in three parts:
- Part
- Problem
- Specific Cause of Failure
As an example: Cooling water pump bearing (Part) seizes (Problem) due to lack of lubrication (Specific Cause of Failure)
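To make the three-part structure concrete, here is a minimal Python sketch. The `FailureMode` class and its field names are illustrative assumptions, not part of the RCM Blitz™ methodology or any tool; the only thing taken from the article is the part / problem / specific cause breakdown and the example wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """Hypothetical record for a three-part failure mode."""
    part: str     # where the failure begins
    problem: str  # the undesired condition that results
    cause: str    # the specific cause of failure

    def statement(self) -> str:
        # Join the three parts into one failure mode statement.
        return f"{self.part} {self.problem} due to {self.cause}"


bearing_seizure = FailureMode(
    part="Cooling water pump bearing",
    problem="seizes",
    cause="lack of lubrication",
)
print(bearing_seizure.statement())
```

Keeping the three parts as separate fields, rather than one free-text sentence, makes it harder to write a failure mode that quietly leaves one of them out.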
The part is the location or source of where the failure mode begins. Looking at the cooling water pump listed above, a rookie facilitator might be tempted to say the cooling water pump failed. While this is true, where did the failure begin? It began when the bearing was not lubricated.
The problem portion of the three-part failure mode is the undesired condition that results from the specific cause of failure. If we neglect to lubricate the cooling water pump bearing, it will vibrate, heat up and eventually seize. While the bearing has been failing for some time, it is when it seizes that we have a problem.
The third part of a good failure mode is the specific cause of failure. As we write each failure mode we should recognize that the purpose of RCM is to develop a task that will be applicable and effective at mitigating the cause. If we don’t get the specific cause written at the correct level, the team will never select or develop a good mitigating task. Again, with the end in mind: if we miss the specific cause, the outcome of the analysis will surely fail to eliminate or mitigate the failure mode.
So, what exactly is a specific cause of failure? This is where experience in Root Cause Analysis or Cause Mapping becomes extremely valuable. Failure Modes are all about understanding the relationship between cause and effect. The trick is to learn to discuss each failure mode at a level where a sound maintenance task can mitigate or eliminate the failure mode. To understand this let’s go back to the cooling water pump.
Cooling Tower Pump Fails – Some would consider this a failure mode; I would not. It contains only two pieces of a three-part failure mode: the part (the pump, at a high level) and the problem. How would one mitigate this failure? Is there a maintenance task to detect, reduce or eliminate this failure mode? Would this task be applicable and effective in detecting, eliminating or mitigating this failure mode? Being honest, this failure mode is nearly useless. The only way to deal with this failure mode is to replace the pump.
Cooling Tower Pump Bearing Fails – Again, there are just two parts here, and not enough information to make a sound task decision. Some would say that we could perform vibration analysis and detect the bearing failure. While in most cases this might be true, without knowing the specific cause we cannot be sure. In many cases, there are specific causes of failure where vibration analysis is clearly not the best task for mitigating the failure mode. As an example, I don’t want to use vibration analysis to tell me that we have not lubricated a bearing.
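The two incomplete examples above can be caught with a simple completeness check. The sketch below is only an illustration; the function name `missing_parts` and the labels it returns are assumptions made for this example.

```python
from typing import List, Optional


def missing_parts(part: Optional[str],
                  problem: Optional[str],
                  cause: Optional[str]) -> List[str]:
    """Return the names of the three parts that are absent or blank."""
    fields = {"part": part, "problem": problem, "specific cause": cause}
    return [name for name, value in fields.items()
            if not (value and value.strip())]


# "Cooling Tower Pump Bearing Fails" names a part and a problem, but no cause.
print(missing_parts("Cooling tower pump bearing", "fails", None))

# A complete three-part failure mode has nothing missing.
print(missing_parts("Cooling water pump bearing", "seizes", "lack of lubrication"))
```

A check like this only confirms that something was written for each part. Whether the cause is specific enough to support a task decision still takes the judgment of the facilitator and the team.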
Cooling tower pump bearing seizes due to improper lubrication – While we have three parts here, how do I deal with this specific cause of failure? What does improper lubrication mean? There could be several specific causes buried within this one failure mode. For instance, improper lubrication could mean too much lubrication, not enough lubrication, the incorrect type of lubrication, or lubrication at the incorrect interval. It is extremely important to remember that we need to have the specific cause written at a level where we know the maintenance task will be both applicable and effective in eliminating the failure mode. Each of the separate lubrication causes listed above would result in a different mitigating task. Combine the causes and we risk missing a failure mode and a task.
Remember, the failure modes we identify are the key to developing our complete maintenance strategy, and the most important failure modes to identify are those that result from the context and environment in which we operate our equipment.
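As a rough illustration of splitting one umbrella cause into separate failure modes, the sketch below writes one statement per specific cause. The helper name `split_failure_mode` is hypothetical; the four causes are the ones listed above, and no mitigating tasks are assigned because the article does not name them.

```python
from typing import List


def split_failure_mode(part: str, problem: str,
                       specific_causes: List[str]) -> List[str]:
    """Write one failure mode statement per specific cause."""
    return [f"{part} {problem} due to {cause}" for cause in specific_causes]


# The specific causes hidden inside "improper lubrication".
lubrication_causes = [
    "too much lubrication",
    "not enough lubrication",
    "the incorrect type of lubrication",
    "lubrication at the incorrect interval",
]

modes = split_failure_mode("Cooling tower pump bearing", "seizes",
                           lubrication_causes)
for mode in modes:
    print(mode)
```

Each resulting statement can then be taken through the task decision on its own, so no cause is lost inside a combined entry.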
More information on writing good failure modes can be found in my book Reliability Centered Maintenance using RCM Blitz™.
In closing, I would love to hear your comments regarding failure modes. Did this article help you to better understand the process of identifying good failure modes? Have you ever completely missed a failure mode that resulted in the failure reoccurring several times before it was properly identified?
Doug Plucknette, Principal & World-Wide RCM Discipline Leader at Allied Reliability Group, is a Reliability Engineering Consultant and Published Author of “Reliability Centered Maintenance using RCM Blitz™” and “Clean, Green and Reliable”. Having created the RCM Blitz™ Methodology, he has been an RCM Practitioner and Trainer for over 20 years. Doug resides in Spencerport, NY and can be reached at firstname.lastname@example.org
Thiago Lima says
Dear Reliability Engineer,
This article was one of the best I’ve ever read about the RCM process.
In addition, I would also like to thank you for giving us this powerful and useful tool to help others who struggle.
It will always be a pleasure to read articles like this. Furthermore, I would like to buy this book in the near future.
Thank you very much!
My best regards,
Thiago from Brazil.
An interesting post discussing equipment reliability, failure modes and the RCM process. When we talk about equipment failure, it’s very important to figure out the failure pattern and then follow steps to avoid equipment failure and downtime and to increase reliability. We need to implement the right lubrication program and regular maintenance of equipment for maximum reliability. Quality filtration systems, an effective lubrication plan, and best oil handling practices are what we need in the facility lube room! Thanks for sharing these guidelines and this valuable book!