There has been an ongoing debate for decades as to whether or not the use of pre-existing logic for conducting Root Cause Analyses helps or hinders the analysis results. Does the use of such pre-existing logic expand the thinking of the team members or does it lead the team to pre-determined conclusions and away from other conclusions not considered in the pre-existing logic? We will explore the fine line between these opposing views and see if there is a middle ground for consensus.
This article’s intent is not to debate the definition of “root cause analysis” because if it did, it would go on indefinitely! For those readers who have participated in such discussions on various online RCA forums, you know what I mean. However, I think most of us can agree that, no matter how you define RCA, undesirable outcomes are the result of multiple cause-and-effect relationships that line up over time. No matter what tool you use to express these cause-and-effect relationships (e.g., logic tree, fault tree, why tree, causal factors tree, factor tree, fishbone diagram), we can nonetheless agree that these relationships must exist for the undesirable outcome to surface.
With a lack of a standardized definition of Root Cause Analysis comes the ambiguity of terms related to RCA itself. What is a Root Cause? Again, the answer to this question suffers the same fate in the public domain as Root Cause Analysis and is not the focal point of this article.
Let us begin with the concept that flawed systems oftentimes adversely impact human decision-making. The systems in question are the information systems we rely on to help us make better decisions. Such systems include, but are not limited to, our training systems, purchasing practices, procedures and policies. For example, I may have decided to use too much lubricant for a pump in my area, causing it to fail prematurely. The basis of my decision is that I am an operator who has recently been given the additional task of lubricating the equipment that I operate. This additional responsibility comes as budgets are cut and the company does not replace mechanics who retire. These responsibilities are shifted to operations without training operators in proper lubrication practices.
In this scenario, we have the following:
The point we are trying to get across is not the comprehensiveness of the example at each level, but a simple understanding of how our systems affect our decision-making and consequently cause a physical (observable) effect to emerge. It does not matter if the RCA approach you use labels causes differently (e.g., approximate causes, near root causes), as long as the labels represent the cause-and-effect relationships in the manner represented above.
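The lubrication scenario can be written out as a simple chain running from the observable event back to the organizational systems behind it. The following is a minimal sketch, not part of any RCA tool: the `Cause` class, the `describe` helper and the level assigned to each item are assumptions made for illustration, using the Physical, Human and Latent labels this article applies to cause types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    level: str        # "Event", "Physical", "Human" or "Latent" (labels assumed for this sketch)
    description: str


# Ordered from the observable outcome back toward the organizational systems.
# The level given to each item is this sketch's reading of the scenario.
lubrication_chain = [
    Cause("Event", "Pump fails prematurely"),
    Cause("Physical", "Too much lubricant applied to the pump"),
    Cause("Human", "Operator decides how much lubricant to apply"),
    Cause("Latent", "Lubrication duties shifted to operators without training"),
    Cause("Latent", "Retiring mechanics not replaced as budgets are cut"),
]


def describe(chain):
    """Return one line per cause, indented by its depth in the chain."""
    return [f"{'  ' * depth}{c.level}: {c.description}" for depth, c in enumerate(chain)]


for line in describe(lubrication_chain):
    print(line)
```

Reading the printed chain from top to bottom moves backward in time, from the physical effect to the decision and then to the systems that shaped the decision.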
As far as RCA goes, the above was a summary of methodology to gain a consensus on how failure occurs in a sequential series of parallel paths resulting in a common undesirable outcome (the Event). Now let’s move on to the role of templates in RCA.
Whereas before we were discussing methodology, now we will move to focus on content. An RCA process is essentially a framework of cause-and-effect relationships built around a set of methodology rules. This framework however has no content in the beginning. The burden falls upon the lead analysts and their team to develop hypotheses and validate whether they are true or not. This knowledge from the team members will be extracted based on their respective experience in the field.
The greatest learning that will come from any successful RCA effort will be the learning that takes place during team meetings. By having to continually ask how something could happen, we must explore in our own minds how it could happen from a cause-and-effect standpoint. For instance, if it is found that a bearing has failed, the ensuing question would be, “How could a bearing fail?” Most maintenance people and engineers have been around bearings for their entire careers. They know them inside and out and replace them daily. This seems like a very easy question, and likely answers would include things like:
- Improper installation
- Wrong bearing
- Defective bearing
- Over lubricated
- Under lubricated
- Wrong lubricant
There are many more potential paths to failure, but you get the idea. When dealing with RCA, we teach people to view the vertical cause-and-effect tree as a timeline. If we know a bearing has failed, can we visualize it in our minds and move back a small increment in time to see what could have just happened to cause that bearing to fail? Most would agree that when looking at it this way, there are really only four (4) plausible ways in which a bearing can fail:
Any of the other possibilities listed above would eventually cause one or more of the above failure patterns to surface on the bearing. So if it was proven that “fatigue” was the culprit in this case and there was no evidence of the other failure patterns, we would mark the others as not true and continue following “fatigue” down the tree. The next natural question would then be, “How could we have had fatigue of the bearing?” And the questioning goes on the same.
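The practice of marking hypotheses as true or not true and following only the proven branch can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the PROACT® data model: the `Hypothesis` class and `open_questions` function are hypothetical names, and the three failure patterns other than fatigue are deliberately left as placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str
    verified: Optional[bool] = None   # None = not yet tested; True/False = proven by evidence
    children: List["Hypothesis"] = field(default_factory=list)


def open_questions(node):
    """Return the next questions the team should ask below this node."""
    if node.verified is False:
        return []                                   # proven not true: stop following this branch
    if node.verified is None:
        return [f"Verify with evidence: {node.text}"]
    if not node.children:
        # Proven true with nothing beneath it yet: ask the next "How could?" question.
        return [f"How could we have had {node.text.lower()}?"]
    questions = []
    for child in node.children:
        questions.extend(open_questions(child))
    return questions


# "Fatigue" comes from the example in the text; the other patterns are placeholders.
bearing = Hypothesis("Bearing failure", True, [
    Hypothesis("Fatigue of the bearing", True),
    Hypothesis("Placeholder failure pattern B", False),
    Hypothesis("Placeholder failure pattern C", False),
    Hypothesis("Placeholder failure pattern D", False),
])

print(open_questions(bearing))
```

Hypotheses proven not true stay in the tree as a record of what was explored; they are simply not pursued any further.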
By extracting this knowledge from the team members, we are constructing a knowledge or experience tree. The team members are learning because their minds are being exercised as to which hypothesis is the cause and which is the effect. It is not always as simple as we would like it to be but having to think through it is definitely the greatest learning opportunity.
In the end, when the analysis is complete and recommendations are implemented, we will eventually be able to measure the effectiveness of our analysis by its impact on the bottom line. Something had to get better, such as decreased injuries, increased production, or decreased cost or frequency.
If we have a successful RCA now based on the knowledge and experience of our team members, how can we leverage that logic for the benefit of the corporation?
Corporate Memory/Leveraging Knowledge and Experience
Most of us can remember the “re-engineering” era of the late 1980s and early 1990s. Unfortunately, re-engineering became associated with census reduction and efficiency measures. This certainly was the era of the golden handshake, where people were incentivized to accept early retirement packages so that the census could be reduced. This seems logical in concept but was very poorly applied. Corporations started to indiscriminately offer these retirement packages, hoping that a certain number of people would take them. I remember one Fortune 100 company at the time that estimated 6,000 people would take the package, and 12,000 actually did! Imagine the chaos this caused in that company as many of the new retirees were hired back as contractors at greater rates.
Who tended to take these early retirement packages? Those that knew they could get a healthy severance and another job quickly are the ones that bailed…in other words those with the most experience! When you have a mass exodus of talent in a corporation what danger does that pose? The danger posed is the loss of “corporate memory”. The knowledge and experience of the best problem solvers just left the corporation and took their internal laptops (their brains) with them. Therefore all of those people that knew how to solve the specific problems of their workplace are gone and the problems are now the responsibility of those left behind. This scenario was and is real today and represents a significant safety risk to the corporation and also millions of dollars in potential production losses and unnecessary costs.
How can we combat this real-world scenario? We can do so by capturing the successful logic of expert problem solvers using the RCA methodologies and tools described earlier. This is rarely done, and when it is attempted, the manner in which the logic is collected is inconsistent with the methodology and tools being applied.
Reliability Center, Inc. (RCI) has been developing such logic over the past two decades. The end result is a series of successful logic trees which we will now call PROACT® Logic Tree Knowledge Management Templates. These hundreds of templates have been developed using the logic of expert analysts in the field. They represent the actual logic used to solve equipment, process and human related failures over that period.
These templates are structured in such a fashion that they can only be used with the search and navigational tools used in our PROACT® RCA Software and our partner’s product, PROACT® for GE APM (formerly Meridium).
Imagine being in an RCA team meeting and getting stuck on a hypothesis where you have exhausted the team’s experience and seek to see what others might have suggested when they faced the exact situation in a prior analysis. Imagine doing this real-time and using key words to call upon the logic used by others at that point in the logic tree. Think of the efficiencies that this brings to the table to expedite the analysis while actually making it more comprehensive and accurate.
The Potential Pitfalls of Using Logic Templates
As stated earlier, the greatest learning that can occur from RCA is from the questioning process that goes on during a team meeting. The constant striving or effort to understand the order in which factors occur, cause-and-effect, is the critical learning point in the analysis. Templates, when not properly used, can reduce the effectiveness of this learning opportunity.
The key to optimizing the value of the templates is to use them as supplemental knowledge to that of the team members. If the templates are used as the primary knowledge for the analysis, then there is a potential for the learning process to be sacrificed. I call this potential situation “doing RCA like paint-by-the-numbers”. This is when the templates are used as a pick list of options and the intent is to quickly finish “a” logic tree that, on the surface, will impress the people we present it to. That does not mean it is right; it just looks good.
Most of the time analysts would be tempted to use this pick list approach when they are under time pressure (and aren’t most of us under time pressure?…hence the real temptation). Anytime we are under time pressure to do anything, we will seek a way to take shortcuts. In RCA, those shortcuts come at the expense of the qualification, verification and validation of our hypotheses. If our goal is to complete an analysis quickly, we will rush to construct a logic tree and, chances are, we will not properly prove that our hypotheses are correct using satisfactory verification methods. When faced with either having a metallurgist look at a failed part or taking the opinion of a mechanic who has not been trained in metallurgy, we may opt to go with the path of least resistance and take hearsay over science to get the tree done!
No one in RCA can regulate the manner in which their RCA methodology will be applied; the best we can do is recommend proper practices for success. In the end, the responsibility for doing what is right falls on the lead analysts and their teams.
This is how RCI believes previous knowledge and experience should properly be used within an RCA process.
Templates are No Panacea
When treating templates as supplemental knowledge to an investigation, we should always be cognizant that all of the possibilities will never be included in whatever listing we produce. What is listed is just past experience: what people have encountered before in similar situations. This does not mean that other possibilities do not exist. We all come from unique working environments with unique variables at play (e.g., processes, procedures, regulatory environments, cultures). Templates should NOT be viewed as all-inclusive, and we should continually press the boundaries of our team’s experience to look for unique possibilities that could have occurred, always building upon our template database and creating more comprehensive templates as a result.
Types of Templates – Explanatory versus Exploratory Trees
Templates can come in two forms, explanatory trees and exploratory trees. We will briefly describe and discuss both.
Explanatory Tree Templates – these are templates that are based on actual past analyses (case studies) where the team identifies only what was found to be true.
For a simplistic example let’s say that we have a chronic pump failure. An RCA team has been put together to analyze this pump failure. They collect their data, construct their logic tree and prove their hypotheses accordingly. They include the following logic tree as part of their final report to explain their findings. Only the causal factors specific to this failure event are included in the logic tree.
Exploratory Templates – Exploratory templates are used when an RCA team is meeting and they want to look not only at what was found in a previous analysis (explanatory tree) but also at what was explored. This would include the logic found to be true in the explanatory tree plus what was explored and found NOT to be true. This is essentially the difference between answering the question “Why?” and answering the question “How Could?”
When asking “Why?” something has occurred, the question connotes that we want a single answer based on someone’s opinion. The 5-Whys approach is a good example of this, as the resultant logic is linear. This would be fine if failure always happened linearly. However, we all know that failure most often occurs when parallel paths of failure occur at the same time and couple together to cause the undesirable outcome.
When asking “How Could” something have occurred, this forces the RCA team to consider all of the possibilities instead of only the obvious. The advantage here is that something that was found “not true” on a previous analysis may prove to “be true” in our current analysis. Therefore we are exploring all possibilities instead of viewing only limited ones.
Compare the explanatory tree in Figure 1.4 with an exploratory tree of the same event in Figure 1.5. Essentially an explanatory tree is embedded within an exploratory tree (see path-to-failure in red).
As you can tell in the exploratory tree of the same event, all the hypotheses with an “X” were hypotheses that were indeed explored, but with proper evidence were proven to be “not true”.
So in a nutshell, an explanatory tree only shows what was true and an exploratory tree shows everything that was explored.
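The relationship between the two tree types can be expressed as a pruning step: removing every hypothesis proven not true from an exploratory tree leaves the explanatory tree embedded within it. The sketch below assumes a hypothetical `Node` class and uses generic hypothesis names rather than the contents of the figures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    is_true: bool                      # outcome of verification for this hypothesis
    children: List["Node"] = field(default_factory=list)


def to_explanatory(node):
    """Return a copy of the tree that keeps only hypotheses proven true."""
    if not node.is_true:
        return None                    # hypotheses marked "X" are dropped
    kept = [c for c in (to_explanatory(child) for child in node.children) if c is not None]
    return Node(node.text, True, kept)


def count(node):
    """Count the nodes in a tree, including the top event."""
    return 1 + sum(count(child) for child in node.children)


# Generic exploratory tree; the hypothesis names are placeholders.
exploratory = Node("Chronic pump failure", True, [
    Node("Hypothesis A", True, [
        Node("Hypothesis A1", True),
        Node("Hypothesis A2", False),
    ]),
    Node("Hypothesis B", False, [
        Node("Hypothesis B1", False),
    ]),
])

explanatory = to_explanatory(exploratory)
print(count(exploratory), count(explanatory))
```

Storing the exploratory tree and deriving the explanatory view on demand keeps the "not true" hypotheses available for a later analysis in which they may prove true.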
Logical Hand Offs When Using Templates
Life would be nice if we could explain everything with one piece of paper! However, as we all know, life tends to be complex and viewed through multiple prisms. Therefore it is not prudent to think we can have a single logic tree that can explain away all of the failures we face. Can you imagine trying to navigate a single logic tree with thousands of hypotheses? The visual itself would be an immediate deterrent to moving forward.
The larger the logic tree the harder it is to digest from a thought processing standpoint. As a result, much thought has gone into developing our templates into “manageable chunks”. These “chunks” can be thought of as branches if you will since we are using the tree analogy.
Instead of trying to have one tree that includes all the possibilities of how bearings, gears, fans, circuit boards, motors, seals, etc. can fail, it makes sense for each component to have its own templates. Since there are hundreds of ways these components can fail, it makes even more sense for each of these sub-failure modes to have its own template as well. This is the reason that there may be multiple templates for equipment that has multiple modes of failure.
If you will remember our earlier discussion about cause types, we mentioned the terms Physical, Human and Latent root causes. We discussed that this is the sequential pattern in which failure occurs. When developing templates and keeping the “manageable chunks” concept in mind, we have to dissect the tree vertically as well.
Most templates are unique when it comes to understanding how different failure modes can occur physically. For this reason, most of the Electrical and Mechanical templates in the PROACT® Logic Tree Knowledge Management series describe the physical mechanisms of failure that could cause those events to occur.
The intent is that at the end of these templates, by drilling deeper, we will start to explore potential errors in judgment or decision making. We defined these earlier as Human Roots.
This is a key “hand off” point in the templates because we have moved from the physical world to the human aspects of the failure. At this point, the Human templates come into play and they provide a series of possibilities about why people make the decisions they do that result in physical failure. Remember, the Human Roots are actual decision points. The Latent Roots are the reasoning for the decisions and are embedded in organizational systems.
Expanding on the sample template shown in Figure 1.6 [Pipe Thinning], you can see that one of the logic legs ends in “Inadequate Material Selection” (see Figure 1.7). This is a decision point because someone made a decision to select a certain material for a certain reason. Why did they do that?
The templates were designed to be navigated so that the analyst first uses key words to get down to the Physical Roots. When analysts reach decision points (Human Roots), they then draw on the Human Templates, which explore the various reasons that people could be influenced to make an improper decision at the time they make it.
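The hand off from a physical template to the Human Templates can be pictured as a lookup that fires only when a logic leg ends in a decision point. This is a rough sketch and not how the PROACT® software is implemented: the dictionary layout, the `hand_off` function and every entry other than “Pipe Thinning” and “Inadequate Material Selection” are hypothetical.

```python
# Physical template: each leg records whether it ends in a decision point (Human Root).
pipe_thinning = {
    "event": "Pipe Thinning",
    "legs": [
        {"hypothesis": "Inadequate Material Selection", "decision_point": True},
        {"hypothesis": "Placeholder physical mechanism", "decision_point": False},
    ],
}

# Human template: hypothetical possibilities about why a decision was made.
HUMAN_TEMPLATE = [
    "Training did not cover this decision",
    "Procedure was unclear or not followed",
    "Purchasing practice constrained the choice",
]


def hand_off(leg, human_template=HUMAN_TEMPLATE):
    """Return (decision, possibility) pairs to explore, or [] if no hand off applies."""
    if not leg["decision_point"]:
        return []                      # still in the physical world: stay in this template
    return [(leg["hypothesis"], possibility) for possibility in human_template]


for leg in pipe_thinning["legs"]:
    for decision, possibility in hand_off(leg):
        print(f"{decision}: how could? {possibility}")
```

Each pair returned is still only a hypothesis; it must be proven true or not true with evidence like any other node in the tree.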
If we were to do a key word search on “procedure” or “procedure non-compliance”, the search might yield some of the results shown in Figure 1.8. Here we are exploring some of the reasons that procedures are not complied with, and we will depend on evidence to determine whether these possibilities are true or not.
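A key word search of this kind can be approximated with a case-insensitive match over template titles and hypotheses. The sketch below is only an approximation under stated assumptions: the `TEMPLATES` list, its hypotheses and the `search` function are hypothetical and do not reproduce the contents of Figure 1.8 or the actual navigation tools.

```python
# Hypothetical template library; titles and hypotheses are illustrative only.
TEMPLATES = [
    {
        "title": "Procedure non-compliance",
        "hypotheses": [
            "Procedure not available at the job site",
            "Procedure out of date",
        ],
    },
    {
        "title": "Pipe Thinning",
        "hypotheses": ["Inadequate Material Selection"],
    },
]


def search(templates, query):
    """Return titles of templates whose title or hypotheses contain the query text."""
    needle = query.lower()
    matches = []
    for template in templates:
        searchable = [template["title"]] + template["hypotheses"]
        if any(needle in text.lower() for text in searchable):
            matches.append(template["title"])
    return matches


print(search(TEMPLATES, "procedure"))
print(search(TEMPLATES, "procedure non-compliance"))
```

A plain substring match is the simplest possible choice; a production tool would more likely use indexed key words so that related terms also surface.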
It is critical to the success of using past experience and knowledge that the information easily be put at the fingertips of those that can use it. We have designed our navigation tools to locate the desired logic in the simplest manner possible.