How Much Money is Wasted on ‘Root Cause Analysis (RCA)’ and Why?

Properly conducted RCAs consume time and resources, so when we are not getting the expected ROI from our efforts, we have to ask why we are wasting so much money doing the same thing over and over again. This article focuses on the commercial and process aspects of RCA that can detract from its effectiveness on the corporate bottom line and affect the safety of our co-workers.

TYPICAL RCA SCENARIO

We experience an unexpected shutdown that lasts 6 hours. Our threshold to commission a formal RCA (i.e., our Trigger) is 4 hours, so the condition has been met. An RCA team is quickly put together amidst the chaos of the outage, and oftentimes the person most familiar with the process involved is appointed the RCA team leader.

Under such conditions there is an attempt to gather data/evidence, but oftentimes the efforts are not as comprehensive as we’d like them to be. This is due to time pressures to safely secure the area and restart production, the lack of cooperation from the parties that control the data to share it at that time, time pressures to conclude the RCA, and the fact that there may be no requirement to provide such comprehensive validation (i.e., evidence) of our conclusions. The RCA team meets for a week (on and off, as that is not their primary job), and that ‘week’ timeframe is being very generous. Then they prepare their final presentation to leadership, seeking approval of their recommended corrective actions.

Production is eventually started up, the RCA is presented and finalized, and corrective actions are approved for implementation.

Two weeks later, the same failure occurs again, and the plant manager is not happy.

While I made up this scenario, it is based on my three (3) decades in this RCA space, and it’s not too far from reality in my experience. Let’s look at this one scenario and see what we can glean from it.

VIEWING RCA AS RE-WORK

In our scenario above, we would end up having to do another RCA because the failure recurred. This is akin to ‘re-work’. With RCAs we sometimes tend not to view it that way, and it’s deemed just a cost of doing business. But it’s not…IT’S RE-WORK. If the failure didn’t happen again, we wouldn’t be analyzing it again.

What does that re-work really cost the organization?

For the sake of example, let’s use the following Assumptions. For a reality check, just replace my Assumptions with your own numbers and see what you come up with.

[Table: Assumptions made for the costs of downtime hours, labor and materials. This is a legend for use in another table.]

In our case let’s assume the following resources and costs were applied to conducting the RCA:

[Table: Mini-Opportunity Analysis showing how to calculate the cost of re-work when doing an RCA over again. Essentially Frequency/Yr x Impact/Occurrence = Total Annual Loss.]

This does NOT include ancillary costs such as the time of people in the storeroom/warehouse, purchasing, expediting parts, use of external Subject Matter Experts (SMEs), executive time for presentations, cost of implementing RCA corrective actions, RCA training & software, customer complaints, and the time of the RCA team members to meet and conduct the RCA. Essentially, the costs in the table are those of responding to the failure (not solving it). The numbers above are therefore safe, conservative and very defensible.

TIP: This is important because when trying to make such a business case for re-work (or RCA in general, to be honest), expect that people will try to discredit the integrity of your numbers. Make sure your numbers come from credible sources (like your accounting department).

So, in our very simple case, for only one (1) recurrence, the re-work will cost us nearly $65,000 USD on average. Imagine if this were a chronic failure that happened 4x/year! This is our business case for making sure we do RCA properly and prevent the risk of recurrence.
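As a rough worked example of that 4x/year case, here is the same Frequency/Yr x Impact/Occurrence formula applied to it. It assumes every recurrence costs about the same $65,000, which is my simplifying assumption; real recurrences would vary in cost.

```latex
\[
\text{Total Annual Loss} = \text{Frequency/yr} \times \text{Impact/occurrence}
\approx 4 \times \$65{,}000 = \$260{,}000
\]
```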

WHY CHRONIC FAILURES ARE MORE COSTLY THAN SPORADIC FAILURES

Chronic failures are much easier to quantify in terms of ROIs. Think about why this is the case. If we have a sporadic/acute failure that happens once every 5 years, then logically we would have to wait 5 years to see if it happens again. In other words, we’d have to wait that long to take credit for it…that’s not happening. We likely would not even be in the same position five (5) years later.

Contrast this with chronic failures, the ones that happen so often (every shift, for instance) that we don’t even record them in our tracking systems. This is often because it may take longer to enter the event into the system than it does to make the quick fix. These failures are hidden in plain sight and often absorbed into the ‘cost of doing business’ paradigm. It’s not a failure anymore, it’s just my turn to fix it as part of my daily routine.

These are our greatest opportunities though!! Their ROI is easier to calculate because they happen so often. They are actually accommodated in our budgets as a slush fund under something like ‘General’ or ‘Routine’. They even get a cost-of-living increase every year!!

Let’s take an example and consider a simple chronic event such as conveyor belts that trip in a mining operation. Individually, each trip may take 15 minutes to locate and reset. This 15-min period requires the attention of a person, which at a typical standard rate ($40/hr with benefits included) results in a labor cost per event of $10 (0.25 hr x $40/hr labor rate).

Because the event simply requires a person to find and reset the tripped conveyor system, generally no additional parts costs are involved. However, the 15-min delay causes a production loss upstream in the processing area, valued at $5000/hr. Fifteen minutes of lost production is therefore worth $1250/occurrence (0.25 hr x $5000/hr production loss). So, each 15-min occurrence is now worth $1260 ($10 labor + $1250 lost production). Still considered a relatively low impact, right?

Now consider that on this particular conveying system we experience 40 such stoppages a week, or 2080 for the year (40/week x 52 weeks). Now we are looking at an annual impact to the bottom line of $2,620,800 ($1260/occurrence x 2080 occurrences). The line item in an Opportunity Analysis may look like this.

[Table: Line item Opportunity Analysis showing the annual costs of conveyor roller failures being over $2.5M USD per year.]
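For readers who want to plug in their own numbers, the conveyor arithmetic can be sketched in a few lines of Python. The dollar figures and frequencies are the ones quoted in the text; the function and variable names are my own illustrative choices, not part of any RCA tool.

```python
# Sketch of the conveyor-trip line item, using the figures quoted in the text.
# Function and variable names are illustrative assumptions.

def cost_per_occurrence(duration_hr, labor_rate, production_loss_rate, parts_cost=0.0):
    """Labor + lost production + parts for a single event, in dollars."""
    return duration_hr * labor_rate + duration_hr * production_loss_rate + parts_cost

def total_annual_loss(frequency_per_year, impact_per_occurrence):
    """Frequency/yr x Impact/occurrence = Total Annual Loss."""
    return frequency_per_year * impact_per_occurrence

# 15 minutes per trip, $40/hr labor, $5000/hr production loss, no parts
impact = cost_per_occurrence(duration_hr=0.25, labor_rate=40, production_loss_rate=5000)
frequency = 40 * 52  # 40 stoppages a week over 52 weeks
annual_loss = total_annual_loss(frequency, impact)

print(f"Impact per occurrence: ${impact:,.0f}")
print(f"Occurrences per year:  {frequency:,}")
print(f"Total annual loss:     ${annual_loss:,.0f}")
```

Swapping in your own duration, rates and weekly frequency gives a quick reality check against the $2,620,800 figure above.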

This is why our chronic failures tend to be way more costly than our sporadic failures. Since individual occurrences tend not to hit an ‘RCA trigger’, there is no requirement to analyze them. We just get good at continually fixing them…faster. Food for thought my friends!
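The trigger point can be made concrete with a small sketch. It compares the 4-hour trigger from the scenario against a single 15-minute conveyor trip, and then against the hours those trips add up to over a year. The function name is hypothetical, and treating the trigger as "meets or exceeds" is my assumption.

```python
# Sketch: a per-event RCA trigger versus a cumulative annual view.
# The 4-hour trigger and the conveyor figures come from the text;
# the function name and the ">=" comparison are illustrative assumptions.

RCA_TRIGGER_HR = 4.0  # downtime per event that commissions a formal RCA

def hits_trigger(downtime_hr, threshold_hr=RCA_TRIGGER_HR):
    """True when a single event meets or exceeds the RCA trigger."""
    return downtime_hr >= threshold_hr

sporadic_event_hr = 6.0            # the unexpected shutdown in the scenario
chronic_event_hr = 0.25            # one 15-minute conveyor trip
chronic_events_per_year = 40 * 52  # 40 trips a week

# Total hours lost per year to trips that never individually hit the trigger
cumulative_chronic_hr = chronic_event_hr * chronic_events_per_year

print(hits_trigger(sporadic_event_hr))   # the 6-hour shutdown
print(hits_trigger(chronic_event_hr))    # a single conveyor trip
print(cumulative_chronic_hr)             # annual hours lost to trips
```

The 6-hour shutdown commissions an RCA and the single trip does not, yet the trips add up to 520 hours a year without ever hitting the trigger.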

SO WHY IS THERE A NEED FOR RCA RE-WORK IN THE FIRST PLACE (NO MATTER WHAT THE RCA IS ON)?

Above we tried to make the point of what re-work costs. Now let’s discuss why we have to redo an RCA at all. Essentially, we are doing a mini-RCA on ‘WHY RCA EFFORTS FAIL’ (more resources to come on this topic).

Isn’t it frustrating to conduct an RCA and then have the failure happen again? In my experience, when this happens, there is often an immediate rush to blame ‘RCA’ (the entire acronym and field) as not being value-added, rather than to consider that perhaps the way in which we conduct RCA may be lacking.

“This kind of binary thinking may lead to throwing out the baby with the bathwater, and wholesale rejection of traditional approaches, e.g. when adepts of the ‘new view’ reject RCA (root cause analysis), or the use of BowTies. Instead of dismissing and preaching to “stop using”, a better approach may be teaching about the limitations of approaches. Imperfect tools can be very useful and are often perfectly usable within the proper context (Hale, 2014; Townsend, 2014).”

Carsten Busch, Brave New World: Can Positive Developments in Safety Science and Practice also have Negative Sides?

There is a stigma of sorts that surrounds the acronym ‘RCA’ and, to me, it renders the term useless. This is because there is no universally accepted definition, so however anyone is solving problems, they will call their approach ‘RCA’. This can range from a 5-Whys graphic on a bar room napkin to a comprehensive, evidence-based investigation of a serious event. Whether it’s brainstorming, troubleshooting, the 5-Whys, the Fishbone Diagram, a BowTie approach or a Causal-Factor Logic Tree, they are often treated as equals…and that simply is not an accurate comparison.

First off, each of these approaches has its place. They wouldn’t have been around for so long if some of their users were not getting a benefit. So, when properly applied, each of these tools can add value. However, the key words in that sentence are ‘when properly applied’.

Tip: An analysis is only as good as the analyst! You can have the fanciest technology/tools in the world, but if you don’t know how to use them, they are rendered useless. The tools are inanimate objects. Their users need the creativity, innovation, and skill to make the tools reach their potential.

Think of artisans and true craftspeople who have very specific tools of their craft and can produce masterpieces with them, while the general population would not know how to use those tools to produce such masterpieces. I bet you can think of a hundred similar analogies where the skill of the user is what makes the tool produce ‘masterpieces’.

The same goes for RCA: how well we apply it will be the difference between success and failure. Of course, it is up to the analyst to know the tools in their toolbox, and which is best to apply under certain conditions.

WHAT ARE THE POTENTIAL CONTRIBUTING FACTORS/ROOT CAUSES AS TO ‘WHY RCA EFFORTS FAIL’?

Here is a basic listing of what I see in the field that prevents true, holistic RCAs from providing the expected value:

1. RCA Process Related Issues

a. RCA Methodology Less Than Adequate (LTA)

i. Lacks comprehensiveness (tends to be linear in thinking)

ii. Lacks depth for magnitude of the event (stops at broken parts or blaming someone)

iii. Lacks flexibility to apply to many different types of undesirable outcomes (not versatile enough to work for any undesirable outcome)

iv. Lacks evidence-based capabilities (allows hearsay to fly as fact)

v. Too hard a process to follow in a practical manner (process perceived as too complex and complicated)

vi. Too many steps to follow (process perceived as too time consuming, too many steps)

vii. Too difficult to track effectiveness of the RCA (not easy enough to prove if it’s working or not on the bottom-line [effectiveness])

2. RCA Training LTA

a. Training quality LTA

i. Vendor instructors LTA for industry they are teaching in

ii. Vendor instructors inexperienced in RCA method they are teaching

iii. Facility instructors inexperienced in RCA method they are teaching

b. Student quality LTA

i. Students did not volunteer, but were volunteered to participate

ii. Students’ skill sets mismatched for analytical type work

c. RCA Implementation LTA (trained students did not implement properly)

i. Management support systems not in place (no systems/guidance to follow and no oversight to assist)

ii. Analysts too busy to do proper analysis (time pressured/short cuts taken on RCA process)

iii. Too much time lapsed from the training until they were actually applying their new learning in the field

3. RCA Champion Related

a. Executive RCA performance criteria not communicated effectively to RCA Champion

b. Champion did not allocate extra time for analysts to do RCA in the field (they’re too busy being reactive, no time provided to be proactive)

c. RCA recommendations not implemented in a timely manner (or at all)

d. RCA recommendations implemented but not effective (wrong corrective actions)

e. Champion does not help field analysts remove barriers (like getting inter-departmental cooperation)

f. Champion does not provide analysts engineering resources to validate hypotheses (like providing access to a metallurgist to analyze failed parts)

g. Champion does not have time to mentor RCA analysts

4. RCA Executive Expectations Related Issues

a. No RCA expectations set from leadership

i. RCA viewed as a low priority overall

b. RCA expectations set, but not communicated effectively

i. No RCA Champion designated to oversee process, OR

ii. Designated Champion LTA

1. Champion not supported by management, so they are not motivated

2. Selected Champion’s skill sets not a match for the position

c. RCA expectations viewed as unrealistic by Champions

i. Champions and analysts not involved in setting expectations

For those who like to get deep in the weeds about ‘Why RCA Efforts Do Not Meet Expectations’, I invite you to watch this video (~ 15 min) where I did an RCA with a class on ‘Why RCA Fails’.

I also invite you to read an article entitled, ‘Root Cause Analysis vs Shallow Cause Analysis: What’s the Difference?’ for a deeper understanding of where I am coming from. I’d like to hear your feedback on the good, the bad and the ugly of that paper.

KEY TAKEAWAYS/IN CONCLUSION

1. RCA re-work is astronomically expensive, and we shouldn’t put up with it. RCA re-work should be a key metric we track when measuring the effectiveness of our current RCA effort.

2. Chronic failures are significantly more expensive than sporadic failures when viewed from a Total Annual Loss (TAL) perspective.

3. Chronic failures will yield a much quicker and greater return (ROI) if they are the focus of an RCA strategy (akin to Defect Elimination strategies).

4. There are many reasons why ‘RCAs’ may not be effective. As the quote earlier puts it, don’t ‘throw the baby (RCA) out with the bathwater’ and just blame RCA in general. I find that in most such cases, it is not the RCA methodology that failed, it is the execution that failed. If execution is the problem, it doesn’t matter which RCA approach you pick…it will suffer the same fate!

There are ways to quickly assess the effectiveness of your RCA initiative and I’d enjoy discussing those ways with you. I hope you found value in this content and that the concepts struck a chord with what you see in the field. If you’d like to discuss anything RCA related, just drop me a line via my LinkedIn profile or blatino@prelical.com. If you’re interested in training on these practical approaches, please check out our website at https://prelical.com/services.