Most people associate the traditional application of RCA with some type of undesirable outcome that has exceeded a set threshold or trigger. It is at this point that a formal RCA process is set into motion. Because of its serious nature, it will likely have the attention and support of leadership. It may also have the attention of external stakeholders like OEMs, insurance companies and regulators. As a career ‘RCA’ professional, I must say that in my 35+ years in this space, this traditional perception has unfortunately been the norm and not the exception. This is unfortunate because it suppresses the methodology’s potential, and the capability of applying a more holistic approach to preventing such undesirable outcomes in the first place. In this article, my intent is to provide a different perspective on how to view ‘RCA’ in a much broader and more meaningful context, and to couple its use with an effective Defect Elimination (DE) strategy. While most are familiar with hindsight-based RCA, we will explore the emergence of foresight approaches as well. We will also contrast these perspectives in an effort to truly understand WHY we even do RCA, and whether it is worthwhile.
WHEN DO WE TYPICALLY CONDUCT FORMAL ROOT CAUSE ANALYSES (RCA)?
As mentioned in the abstract, a formal RCA is typically commissioned when a defined corporate trigger has been exceeded, such as:
1. OSHA Recordable Event
2. SIF (Serious Injury/Fatality)
3. Equipment Damage in Excess of $XXXXX in a Single Event
4. Production Losses in Excess of $XXXXX Over a Determined Time Period
5. Regulatory Violation
6. High-Priority Customer Complaints
There are more, but these are the ‘usuals’. When one of the above undesirable outcomes occurs, we will likely be doing a formal RCA whether we like it or not. These are the types of events that the ‘suits’ show up for and that automatically have leadership’s attention (often because there may be some liability involved).
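To make the idea of a corporate trigger concrete, the list above can be sketched as a simple check. This is only an illustration: the `Event` fields, the function name `triggered_reasons`, and the dollar limits in the example are assumptions made up for this sketch, since every organization sets its own thresholds.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One undesirable outcome, described by the fields the trigger list refers to."""
    osha_recordable: bool = False
    serious_injury_or_fatality: bool = False
    equipment_damage_usd: float = 0.0
    production_loss_usd: float = 0.0  # accumulated over the site's chosen time period
    regulatory_violation: bool = False
    high_priority_complaint: bool = False


def triggered_reasons(event, damage_limit_usd, loss_limit_usd):
    """Return every trigger this event has hit (an empty list means no formal RCA is forced)."""
    reasons = []
    if event.osha_recordable:
        reasons.append("OSHA recordable event")
    if event.serious_injury_or_fatality:
        reasons.append("SIF")
    if event.equipment_damage_usd > damage_limit_usd:
        reasons.append("equipment damage over limit")
    if event.production_loss_usd > loss_limit_usd:
        reasons.append("production losses over limit")
    if event.regulatory_violation:
        reasons.append("regulatory violation")
    if event.high_priority_complaint:
        reasons.append("high-priority customer complaint")
    return reasons


# Illustrative numbers only; real limits are defined by each organization.
example_event = Event(equipment_damage_usd=12_000, production_loss_usd=90_000)
print(triggered_reasons(example_event, damage_limit_usd=50_000, loss_limit_usd=75_000))
# ['production losses over limit']
```

Note that an event returning an empty list has not hit a trigger, yet it may still carry the same underlying causes. That point becomes important later in this article.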
WHAT IS THE PURPOSE OF CONDUCTING A FORMAL ROOT CAUSE ANALYSIS (RCA)?
While this is a seemingly easy question to answer, I’d bet that I would get as many answers as people I asked. Here is what I usually hear when I ask this question:
1. To Prevent a Future Recurrence
2. To Make the Workplace Safer
3. To Be Compliant
4. To Make the Customer Happy
These are all admirable, and all valid, but they are also all desired outcomes of an RCA rather than its purpose. Follow along with me on this logic sequence and see if we come to a deeper understanding of the purpose of an effective RCA.
1. Desired outcomes will only come from implementing effective solutions.
2. Effective solutions will only come from properly identified Corrective Actions (CA).
3. CAs will only come from properly identified and validated ‘root’ causes.
So, following this logic, the purpose of conducting a formal RCA is to uncover the true, actionable root causes. We must remember that ‘causes’ and ‘solutions’ are two different things. Properly validated root causes should be non-negotiable. Oftentimes we do not like what we find in RCAs, but that doesn’t give us the right to change the facts to what we want them to be. Only those who can look in the mirror and acknowledge (publicly) that their actions as leadership contributed to an undesirable outcome will be able to quickly progress their organizations. Denying the facts to protect ego/position is a sure way to stagnate progress and run the risk of recurrence.
Solutions, on the other hand, are negotiable. There can be numerous ways to correct the identified root cause(s) within certain budgetary parameters.
This opens up another can of worms: what are ‘root causes’? The answer to this question could be a book by itself, but for now, I will just state that most RCA providers on the market today do not like the term ‘Root Cause Analysis’. It’s misleading in that it connotes there is a single root cause, and often insinuates that failure is linear. Both paradigms misrepresent what ‘RCA’ is intended to mean. However, as an industry we are stuck with the adopted term, because that is what analysts these days type into tools like Google when they search for such training, consulting, and software.
I won’t speak for others, and there is no single, universally accepted definition for ‘RCA’ or ‘Root Cause’, so I will express what they mean to me in concept. This will provide proper context for the remainder of this article.
WHAT ARE ROOT CAUSES? THE DIFFERENCE BETWEEN ROOT CAUSES AND SHALLOW CAUSES
According to our PROACT® RCA Approach (developed in the 1970s), we identify three levels of depth for root causes. These root levels are linked sequentially by cause-and-effect relationships, which makes them interdependent and brings the social sciences together with the physical sciences. They are summarized below:
1. Physical Root Causes: The tangible aspects of the failure. Traditionally these are the physics of failure when components fail. Physical failure is observable; we can see it (unlike human reasoning, which we cannot see).
a. Examples: mechanical fatigue of a shaft, flawed metallurgy of a component from an OEM, brittle failure of a fastener, poor quality of product produced
2. Human Root Causes: The act of making a decision, a choice to do or not do something. Errors of omission or commission.
a. Examples: decision to increase process temperature/pressure/flow, decision to purchase from the least expensive bidder, decision to skip a PM to do emergency work, misaligned a pump at installation, decision to take a short cut to increase production
3. Latent or Systemic Root Causes: The flawed management/organizational systems that decision-makers relied on to make an appropriate decision at the time.
a. Examples: an obsolete startup procedure that was followed, misplaced purchasing incentives that result in buying less-than-adequate components, P&IDs that are not kept up to date, management that ignores ‘short cuts’ and only disciplines for them when something goes wrong
The links in this paper will take you into more depth on these specific, involved issues, but for this paper, we are exploring a macro view, versus a micro view (in the weeds).
What if we didn’t conduct a true root cause analysis, and we actually conducted a shallow cause analysis? Figure 1 illustrates this concept based on our discussion about ‘root’ levels. The earlier we stop drilling down towards latency, the shallower the analysis results will be.
As a side note to Figure 1:
We tend to judge others by the outcomes of their decisions…yet we often prefer to be judged ourselves by the intent of our decisions. Make no bones about it, true RCA is all about uncovering the intent (and reasoning) behind the decision at play! It does not focus on who made the poor decision that resulted in a bad outcome.
Let’s make up a quick and easy case to express some key points along these lines.
Quick Description: A paper mill experienced 6 hours of unexpected downtime due to a critical fan failure. During the analysis, a metallurgical review determined that a critical bearing had failed from fatigue.
1. If we conclude the analysis with a fatigue failure of the bearing (physical root) due to high vibration, and just replace it, have we done an effective ‘root’ cause analysis? No. We still have not determined why the bearing fatigued due to excessive vibration.
2. If we continue drilling down our analysis, we find that the mechanic who did the recent install, made the decision (human root) to align the equipment in a certain manner (which was inappropriate). If we discipline this mechanic for the misalignment, have we done an effective ‘root’ cause analysis? No. We still have not determined why the mechanic did not align the equipment properly.
3. Now let’s say we delved further into understanding why the mechanic was aligning the way he was, and found that 1) we never trained him when he assumed the duties after a retirement took place, 2) he was using outdated tools to conduct the alignment, and 3) we did not know that someone was in a position for which they were not qualified. Now, have we done an effective ‘root’ cause analysis? We’re a lot closer because of the depth we went to in understanding how our systems impact our people’s everyday decision-making. Figure 2 graphically shows the cause-and-effect logic associated with the mock case we just discussed (screenshots courtesy of www.easyrca.com). FYI: in some of the nodes I use the acronym ‘LTA’, which is an abbreviation for ‘Less Than Adequate’.
In scenario #1, if we replaced the bearing, would the problem have gone away? NO
In scenario #2, if we disciplined the mechanic for the misalignment, would the problem have gone away? NO
In scenario #3, had we trained the mechanic, provided him the proper tools, and ensured he was qualified to align in the future, would the problem have likely gone away? YES
This was just a quick mental exercise to demonstrate the purpose of ‘breadth and depth’ in an RCA versus a shallow cause analysis. Figure 3 graphically expresses this ‘root’ concept.
Above the surface is what we can see (observable). The true roots are beneath the surface. So, if we simply remove the weed from the surface and the roots remain, the weeds will resurface. This is often what we see with near misses. Since we didn’t suffer a bad outcome, often we don’t feel we need to do a formal RCA. However, the roots that led to that near miss (or good catch) are still in the system and will likely rise again in the future.
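The difference between a shallow cause analysis and a true root cause analysis can also be sketched as a small data structure. The sketch below is not the PROACT® or EasyRCA implementation; the `Cause` class and the helper names are hypothetical, and the node text is paraphrased from the mock fan case above. Each cause carries one of the three root levels, and an analysis counts as shallow if no branch ever reaches a latent cause.

```python
from dataclasses import dataclass, field

# Depth order follows the three root levels described earlier.
LEVELS = ["physical", "human", "latent"]


@dataclass
class Cause:
    description: str
    level: str  # one of LEVELS
    children: list = field(default_factory=list)  # deeper answers to "why did this happen?"


def deepest_level(cause):
    """Return the deepest root level reached at or beneath this cause."""
    best = LEVELS.index(cause.level)
    for child in cause.children:
        best = max(best, LEVELS.index(deepest_level(child)))
    return LEVELS[best]


def is_shallow(cause):
    """An analysis that never reaches a latent (systemic) cause is a shallow cause analysis."""
    return deepest_level(cause) != "latent"


# Scenario 1: stop at the fatigued bearing and replace it.
scenario_1 = Cause("Bearing fatigue from high vibration", "physical")

# Scenario 3: keep asking why until the flawed systems appear.
scenario_3 = Cause("Bearing fatigue from high vibration", "physical", [
    Cause("Mechanic misaligned the equipment at installation", "human", [
        Cause("No alignment training after a retirement", "latent"),
        Cause("Outdated alignment tools", "latent"),
        Cause("Unqualified person in the position went unnoticed", "latent"),
    ]),
])

print(is_shallow(scenario_1))  # True
print(is_shallow(scenario_3))  # False
```

A check like this only confirms how deep the tree goes. It says nothing about whether each cause was validated with evidence, which remains the analyst's job.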
THE JOURNEY TO IDENTIFY LATENT ROOT CAUSES: HINDSIGHT VERSUS LEVERAGING FORESIGHT
Up until this point, we have logically walked through how we can properly use an effective RCA tool to uncover flawed organizational systems. To me, this is the purpose of the RCA: to identify flawed systems that influence our people’s decision-making behaviors. If we provide them better systems, support, and guidance, they will make better decisions.
On the path we took to get to this point, an undesirable event had to hit a rather serious trigger before we acted. But when we look at this from 30,000 feet, by the time this trigger is hit, isn’t it too late? We’ve already suffered the bad consequences. We are using RCA only as a reconstruction tool (hindsight) to track down the latent root causes.
Why should we have to wait for such a trigger to find these latent root causes (flawed management/organizational systems)?
What if we designed our Defect Elimination (DE) efforts to supplement our RCA initiatives, and form an overall Incident Management System (see Figure 4)? What if we unleashed the creativity of our workforce by proactively engaging them in learning sessions where they identify these flawed systems? I’ve never been in an organization where those closest to the work couldn’t identify those flawed systems. They are also most anxious to let management know, in the hopes that…this time, they will be listened to and not just heard (appeased). Figure 4 is just a simplistic, conceptual expression that would certainly need to be refined for each implementation.
In my experience, what I usually hear about during these learning sessions are flawed systems such as:
1. Training related issues
2. Communication related issues (vertically and horizontally/internal and external)
3. Obsolete/non-existent documentation
4. Purchasing changes in vendors/suppliers without knowledge of operations
5. Condoning of short cuts (when they work) and discipline for the same short cuts (when they don’t work) – hypocrisy!
6. Use of outdated technologies
7. Lack of skilled labor (hiring warm bodies)
I’d say this covers 80% of the major system flaws I’ve come across. If we look at this list and we don’t seek these flaws out proactively, then we can expect to come across them reactively…via our RCAs. An effective DE process will efficiently, effectively, and economically minimize the need to conduct formal RCAs. The ROIs will literally be unbelievable (to the point you will have to water them down for people to believe you).
HOW DO WE PROACTIVELY ENGAGE OUR WORKFORCE?
In practice, it is not hard to identify these latent root causes proactively, because if we simply ask our front lines, they already know where these hidden treasures are located, and they are often in plain sight. They have to deal with these barriers to success every day. We simply must earn their trust, listen intently (not just hear) and absolutely provide a feedback loop from this learning process, where they receive updates about how their suggestions are actually being implemented (complete with timelines and assigned responsibilities).
Engaging our workforce to prevent our failures is the greatest defense against experiencing them! Contrary to popular belief, it’s not typically that expensive at all. Rarely are capital monies involved, because fixing ‘systems’ is not capital intensive, but it does require some human resources and a bit of patience and sweat equity.
While I will lay out the basics of such an engagement process, a tried-and-true Defect Elimination model can be found at The Manufacturing Game, and by one of their founders, Michelle Ledet Henley. They recognized this engagement opportunity in the early 1980s (while at DuPont) and created a fun, educational and validated board game to cultivate such engagement. Once you play ‘the game’, you will never forget what you learned from it.
I also very much like The Manufacturing Game’s (TMG) definition of ‘Defect’, which they’ve been using for 25 years and which I think is appropriate to share here:
“Anything that erodes value, reduces production, compromises health, safety or environmental performance or creates waste”.
Let’s first look at some key principles of developing and implementing an effective DE system. It must:
1. be an honest and genuine effort by leadership to ask for help from their workforce, to solve problems in the best interest of the entire organization. It cannot be viewed as another ‘flavor of the month’ fad that management has come up with to appease calls for involvement.
2. be easy to execute (not an administrative burden, on an already overburdened organization)
3. require giving the latitude/time/trust to those making the suggestions, to implement their ideas, themselves
4. absolutely involve a direct feedback loop to those making the suggestions, about the improvements they have yielded for the organization
5. provide celebration and recognition for the individuals/teams and their realized successes
These are just key principles, which leaves a lot of leeway for various ways to execute. I will summarize what this means to me.
1. STEP 1 – ATTENDEE SELECTION: Personally, I prefer to seek volunteers, rather than force individuals into such activities. We seek self-motivated individuals who want to make a difference and are less interested in the politics involved. Typically, these volunteers will also be credible influencers from the field, which will help make the effort legitimate once they start getting recognition for their successes. Ideally, we’d like to see cross-functional teams of 3-5 members.
2. STEP 2 – IDENTIFY DEFECTS: Ask the teams to identify the defects that prevent them from doing the best job they can do. This will be easy for them!
3. STEP 3 – PRIORITIZE DEFECTS: Ask the teams to prioritize their identified defects based on a simple Impact/Effort Priority Matrix. See an example of such a simplistic prioritization tool in Figure 5 below. Start with the low-hanging fruit: the high-impact, low-effort opportunities.
4. STEP 4 – ANALYZE DEFECTS: Simple, available analysis tools like 5-Whys, Fishbone Diagrams and Logic Trees (or a hybrid as seen in Figure 6), along with post-its and kraft paper, are all that is required. This DE approach does not seek the rigor of a formal, evidence-based, triggered RCA approach for serious events. The creativity and innovation of our workforce is amazing if we just genuinely engage them and recognize their experience and expertise.
5. STEP 5 – DEVELOP, SUBMIT & EXECUTE CORRECTIVE ACTIONS: When developing corrective actions, the submitter must either have the control to implement them ASAP or, at a minimum, have the influence to get them done (with the full support of their leadership). As these learning sessions are conducted, all the ideas will be submitted, at a minimum, using a simple paper-based form as shown in Figure 7. Again, this is a simple generic form that would be customized to a specific site.
6. STEP 6 – ORGANICALLY GROW A CORRECTIVE ACTION KNOWLEDGE BASE: As mentioned earlier, our macro view for this effort is to aggregate all of this creativity and innovation, and store it in a shared knowledge base for all to learn from. Figure 8 exhibits a basic, practical form for collecting this information from our DE efforts. Again, this generic form would need to be modified for specific purposes. This will aid in providing support as well as tracking for bottom-line effectiveness. Certainly, as the effort gains traction, we would seek to automate this process using practical technologies on the market. With the ROIs generated from such an effort, this should be a self-funding effort.
7. STEP 7 – EXPLOIT SUCCESSES: Now we want to leverage our successes by sharing them with our entire facility, as well as any sister facilities we may have within our corporation. We need to make this knowledge base accessible to those who can benefit from it. This can instantly replicate successes from one area to another, simply because we knew such a solution existed. This will dramatically reduce the amount of potential RCA re-work, when we try to solve the same issue over and over again…simply because we didn’t know someone else had already solved it. This is a legitimate effort to institutionalize knowledge. Think about this as your baby-boomers exit with all that problem-solving knowledge in their heads. This knowledge needs to be collected and transferred before their retirements.
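For sites that eventually move beyond paper forms, the prioritization and knowledge base steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the 1-to-5 rating scale, the cutoff of 3, the quadrant labels, and the names `Defect`, `quadrant`, `prioritize` and `search` are hypothetical and are not taken from the figures or from any particular product. The sample defects are made up as well.

```python
from dataclasses import dataclass

# Work order of the four quadrants: high impact and low effort comes first.
QUADRANT_ORDER = ["do first", "plan", "fill in", "defer"]


@dataclass
class Defect:
    description: str
    impact: int  # team rating from 1 (low) to 5 (high), an assumed scale
    effort: int  # team rating from 1 (low) to 5 (high), an assumed scale
    corrective_action: str = ""
    area: str = ""


def quadrant(defect, cutoff=3):
    """Place a defect in one of four impact/effort quadrants (the cutoff is an assumption)."""
    high_impact = defect.impact >= cutoff
    low_effort = defect.effort < cutoff
    if high_impact and low_effort:
        return "do first"
    if high_impact:
        return "plan"
    if low_effort:
        return "fill in"
    return "defer"


def prioritize(defects):
    """Sort by quadrant, then by higher impact, then by lower effort."""
    return sorted(defects, key=lambda d: (QUADRANT_ORDER.index(quadrant(d)), -d.impact, d.effort))


def search(knowledge_base, keyword):
    """Find recorded defects whose description mentions the keyword (case-insensitive)."""
    return [d for d in knowledge_base if keyword.lower() in d.description.lower()]


# Made-up entries standing in for what a learning session might produce.
knowledge_base = [
    Defect("Faded labels on spare parts bins", impact=2, effort=1),
    Defect("Outdated alignment tools", impact=5, effort=4),
    Defect("Obsolete startup procedure", impact=4, effort=2),
]

for d in prioritize(knowledge_base):
    print(quadrant(d), "-", d.description)
print([d.description for d in search(knowledge_base, "alignment")])
```

The keyword search is the simplest possible stand-in for the sharing step: another area or sister facility can check whether someone has already solved the same defect before starting over.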
DE AND RCA: A SOLID MARRIAGE
I hope I have positioned both Root Cause Analysis (RCA) and Defect Elimination (DE) as two very valuable tools that complement each other, and do not contradict each other (or replace each other).
In its traditional application, RCA will always be available to thoroughly analyze triggered events. The more sophisticated industries will use the same tool for non-triggered events such as:
1. high frequency/low impact chronic failures (resulting in high annual costs),
2. near misses with potentially high severities, and
3. unacceptable risks identified from credible risk assessments like FMEAs and FMECAs.
Properly executed RCAs will certainly identify flawed organizational systems. However, they will likely do so via reactive means. This is where we can seek the assistance of a simple DE system that can help us proactively identify the same flawed systems without suffering the consequences of a bad outcome.
In the graphical expression of Figure 9, we see that a traditional, triggered RCA will follow very disciplined, evidence-based steps. Cause-and-effect relationships will be explored from level to level. Those hypotheses that prove to be true will be drilled down deeper, until they uncover the physical, human, and latent root causes.
As we have discussed, the Latent (Systemic) root causes are where the gold is. These are the factors that collectively triggered the undesirable outcome to occur. The point here is that we had to go through a lot of effort and pain to retroactively identify these flawed latent systems.
If we embrace the DE concept, and seek to identify these flawed systems proactively, we don’t need to suffer the consequences to act. Using the DE concept, in a non-failed state, we are using the brainpower of our workforce to identify these system flaws. This is a much more efficient and economical approach. However, the greatest benefit will be cultural, as we build trust with our workforce and demonstrate that they truly possess the power when it comes to realizing the corporate vision. The front lines know exactly where those flawed management systems are because they have to work around them on a daily basis!
Conceptually, the better we are at DE, the less need we will have for reactive RCAs due to triggered events. This is because we are seeking out these flawed systems before they contribute to a bad outcome.
The proper balance of these two tools can add extreme value to any holistic Reliability Engineering system, as well as engage the creativity of the workforce in solving the company’s problems!
As I always do, I like to end with a reminder of our common paradigm to defeat:
“We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”
About the author: Bob Latino is an internationally recognized author, trainer, software developer, lecturer, and practitioner of best practices in the field of Reliability Engineering and specifically in Root Cause Analysis & Investigation Management.
Bob has been facilitating RCA & FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® RCA Methodology. Mr. Latino is co-author of numerous books, seminars and workshops on FMEA, RCA & Reliability, as well as co-designer of the PROACT® Investigation Management System.
Recent industrial books by Bob Latino
Lubrication Degradation: Getting into the Root Causes. Mathura, Sanya and Latino, Robert. December 2021, c. 147 pp., ISBN 978-1-032-17157-9, Taylor & Francis (co-author).
Root Cause Analysis: Improving Performance for Bottom Line Results. 5th Ed., June 2019, c. 331 pp., ISBN-13: 978-1-138-33245-4, Taylor & Francis (co-author).
The PROACT Quick Reference Guide. September 2020, c. 92 pp., ISBN-13: 978-0367517380, Taylor & Francis.