Accendo Reliability

Your Reliability Engineering Professional Development Site

by Mike Sondalini

 How to Make RCFA a Successful Business Improvement Strategy 

Many companies adopt root cause failure analysis (RCFA) and then drop it. They use it for a while and get no benefit. RCFA is often ineffective when used to solve individual problems. But when used to find systemic causes of problems and improve business systems, it provides grand payback for the effort.

Keywords: root cause failure analysis, business process improvement 

 At an international enterprise asset management and maintenance conference in 2008 the speaker asked the 240 delegates assembled to raise their hands if they had Root Cause Failure Analysis (RCFA) training. In the audience 220 hands went up. The speaker then asked those people whose companies still used RCFA to leave their hands up. Every hand went down. 

What is wrong with Root Cause Failure Analysis? By the evidence from the impromptu sampling at the conference it seems that companies do not consider it worth using. Yet world leaders in industry like DuPont Chemicals, General Electric, Toyota, and other notable businesses, credit part of their operating success to using RCFA. If RCFA made such an important difference to these companies then there is nothing seriously wrong with the methodology itself. It is the reasons why most companies do not get the improvements they want from RCFA that need to be investigated, not the method.

 The Purpose of Root Cause Failure Analysis 



 Figure 1 – Purpose and Use of RCFA

The diagram in Figure 1 shows when and why RCFA is used. It is intended to address and solve any failure, both specific failures and systemic business failures. Figure 2 identifies the place for RCA in incident and problem management processes. Root cause analysis is applied in both proactive and reactive situations to identify and address trouble. (By the way, Root Cause Analysis (RCA) and RCFA are the same method. The 'F' implies equipment failure, while RCA encompasses all failures. But the methodology is identical.)

Figure 2 – RCFA Should be Entrenched in Business Improvement Processes

Industrial Accident Triangle and Equipment Failure Triangle 

In 1931 H.W. Heinrich developed the accident triangle after analysing industrial accident data and forever changed the world of safety management. Figure 3 is the updated safety pyramid, following work in 1969 by Frank E. Bird Jr., the Director of Engineering Services for the Insurance Company of North America [1]. The triangle tantalisingly implies a relationship between the number of incidents and the number of serious injuries, the implication being that reducing the large base of incidents will reduce the number of serious injuries.

Over the intervening years companies proactively focused on reducing the number of hazards that could lead to incidents. There was value in the approach and great reductions in industrial safety incidents occurred, but serious incidents did not fall in the same proportion. Evidence has accumulated that the accident triangle's implication of a direct causal connection between the number of hazards and the possibility of serious accidents is not accurate. Though an association exists, it seems that incidents and serious accidents have different causes [2]. Nonetheless, useful safety improvement definitely occurs when the chance of danger is reduced, and companies continue working to prevent hazards.

Figure 3 – Heinrich Accident Triangle

Figure 4 – Ledet Equipment Failure Triangle

Figure 4, from Winston Ledet of The Manufacturing Game, shows a failure triangle for industrial equipment. It also tantalisingly hints at a relationship between the number of defects in plant and equipment and the likelihood of a serious operational failure. The model is particularly appealing to those of us who have worked with industrial equipment maintenance, as evidence from failures seen over the years with our own eyes supports the failure triangle model. The strength of causality from defect through to serious failure is not known to the Author. 

The apparent similarity between industrial safety accidents and industrial equipment failures is also enticing. In both cases a wide base of uncontrolled risk eventually leads to a disaster. Whether applying to man or machine, the message in both triangles is the same: small problems left neglected provide opportunity for trouble to arise later. The triangles provide a supporting premise for the tongue-in-cheek Murphy's Law, “Anything that can go wrong will go wrong” [3], by recognising there are many opportunities available for failure to initiate. The similarity between the two triangles also raises the possibility that the principles used to reduce safety accidents also apply to reducing equipment failures.

For both industrial accidents and equipment failures, RCA is used to investigate noteworthy disasters with the aim of pinpointing their cause (or causes). Once the causes and effects in the event tree are identified, changes are made to prevent loss incidents recurring. Solutions are chosen singly or in combination from the hierarchy of control: engineering the hazard out, segregating the event so it cannot produce a disaster, developing improved procedures and training, and/or providing additional personal protection.

Why Companies Give Up on RCFA

The problem for RCA when investigating individual equipment failures is apparent from the failure triangle: a huge number of causes could have participated in the final loss event. Figure 5 highlights that the 20,000 defects in the failure triangle base could have arisen anywhere during the equipment life-cycle, and few of them can be controlled in the operations phase of life. Finding exactly the defect(s) that started an incident, and identifying all combinations of cause-effect events progressing to the final disaster, is a task in which mistakes are easily and unwittingly made. Even if a root cause defect is removed, 19,999 defects remain to cause unending problems. It is terribly demoralizing to contemplate.
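The arithmetic behind that discouragement can be put as a toy calculation. The sketch below is only an illustration: the 20,000 figure comes from the failure triangle above, while the function names and the assumption that one system-level fix removes 50 defects at a time are hypothetical and not taken from any plant data.

```python
# Toy comparison of defect removal rates. Only TOTAL_DEFECTS comes from
# the article; every other number here is a hypothetical assumption.
TOTAL_DEFECTS = 20_000


def remaining_after_single_fixes(total: int, rcas_completed: int) -> int:
    """Defects left when each RCA removes exactly one defect."""
    return max(total - rcas_completed, 0)


def remaining_after_systemic_fixes(
    total: int, rcas_completed: int, defects_per_system_fix: int
) -> int:
    """Defects left when each RCA fixes a business process that was
    responsible for several similar defects (assumed count per fix)."""
    return max(total - rcas_completed * defects_per_system_fix, 0)


# One RCA used to fix one defect leaves 19,999, as the text says.
print(remaining_after_single_fixes(TOTAL_DEFECTS, 1))  # 19999

# Ten RCAs, each assumed to remove 50 similar defects business-wide.
print(remaining_after_systemic_fixes(TOTAL_DEFECTS, 10, 50))  # 19500
```

The point of the sketch is the ratio, not the absolute numbers: whatever the true count of defects per system fix, the same investigative effort removes more of the base of the triangle when the learning is applied beyond the single failure.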

Figure 5 – Defects Can Arise Throughout the Life-Cycle.

The vast number of possible cause-effect paths, and the near impossibility of preventing new problems being created by numerous others throughout the life-cycle, eventually makes companies give up on RCA. They try RCA, but the amount of work required, the slow progress and the mountain of remaining problems dishearten people, and they start putting their time into finding other solutions.

RCA is time and resource hungry, and users are easily fooled by coincidence, misunderstandings and personal bias. This makes RCA a poor method for a business to use to solve individual equipment failures. What RCA does do well is quickly identify business system failures. The process of using RCA works badly for solving individual problems, but it works brilliantly for showing up black holes in business processes.

An example will help explain the dilemma of solving single problems with RCA and show its great worth in detecting business system black holes and procedural failures. Figure 6 is a drawing of the valving to a gas analyser that controlled product quality in a petrochemical facility. During project work Valve 1 was shut instead of Valve 2 and the analyser was accidentally isolated. Invalid measurements disrupted production for three hours until the problem was discovered. The RCA that followed consisted of a meeting of busy Operations personnel. They traced the cause to the person who shut the valve not knowing which valve to shut. To address the problem they distributed a ruling that “Only the Plant Operator is allowed to shut valves.” In reality nothing changed, because the existing rule was already that only plant operators were allowed to operate valves. The chance of the event repeating remained as large as ever.

Fortunately the incident was used as an exercise in an RCA training course at their site and it was re-examined in greater detail by a mixed team of cross-functional experts. For this small problem twenty-two (22) causes over the life-cycle were found to have played a role in the failure.

Figure 6 – Drawing of Isolation Valves at Gas Analyser 

It was established that the person who shut the wrong valve was drawn into a trap set up ten years earlier by the people who installed the analyser. When the analyser was installed it should have been connected directly to the main header with dedicated piping. But the side branch with Valve 1 provided easy isolation without stopping production. Hence the quick fix was to leave Valve 1 in the line and fit a tee to the analyser with a new valve, Valve 2, installed next to Valve 1 for isolation. Ten years later Valve 1 was mistakenly shut, which brought the operation to a halt and destroyed three hours' worth of costly production.

The RCA found 21 contributing causes of the failure and one main cause – the sample point being connected to the wrong place. No one will fix 22 causes of a problem one-by-one. It is impossible to do so during the operational phase of the life-cycle. Like all of us, the RCA team elected the simple option in the circumstances – tie an engraved tag to each valve explaining its purpose. They did not fix the root cause, but they probably stopped a repeat of the failure because the event path was broken by the addition of new information at a decision point. If the only outcome of this RCA was two new valve tags to protect against the wrong closure of two valves it would be seen as wasteful effort with little benefit. 

Fixing one problem is the least valuable use of RCA. You get maximum protection for the business by taking every RCA solution company-wide. If engraved information tags were fitted on all valves throughout the operation the chance of wrongly closing any valves would be greatly diminished. Now one RCA improves the entire business. One business-wide change removes dozens of future failures, maybe hundreds, and will deliver savings and improved safety for the life of the plant. 
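As a minimal sketch of what taking one RCA solution company-wide might look like in practice, the example below pushes a single corrective action onto every asset of the same type in an asset register. The `Asset` class, the `apply_company_wide` function and all of the asset tags are hypothetical illustrations, not part of any particular CMMS or of the site described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """One entry in a hypothetical asset register."""
    tag: str
    asset_type: str
    actions: List[str] = field(default_factory=list)


def apply_company_wide(register: List[Asset], asset_type: str, action: str) -> List[str]:
    """Attach an RCA corrective action to every asset of the given type.

    Returns the tags of the assets that were updated, so the work can be
    planned and tracked. Assets that already carry the action are skipped.
    """
    updated = []
    for asset in register:
        if asset.asset_type == asset_type and action not in asset.actions:
            asset.actions.append(action)
            updated.append(asset.tag)
    return updated


# Hypothetical register: the two analyser valves plus other plant items.
register = [
    Asset("V-1", "valve"),
    Asset("V-2", "valve"),
    Asset("V-107", "valve"),
    Asset("P-12", "pump"),
]

ACTION = "Fit engraved tag stating the valve's purpose"
updated = apply_company_wide(register, "valve", ACTION)
print(updated)  # ['V-1', 'V-2', 'V-107']
```

The RCA itself only looked at two valves, but the action list it generates covers every valve in the register. The pump is left untouched because the finding does not apply to it.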

You get the full power of RCA when you improve your business processes with what you learn from each single failure. If we use RCA to solve one cause-effect path we may be lucky and fix one problem for the moment. Even if successful it still leaves all the other possible causes of the failure untouched, and so our problems continue. If instead we fix the business processes that allowed the risk to arise, we fix the problem and we reduce the possibility throughout the business of similar circumstances occurring. Now one RCA investigation improves the entire business forever. This approach to RCA is the most valuable. Use each failure to solve the systemic problems that allowed it. It will not take many RCAs before you see marked improvement in your operation's performance. Figure 7 points you to the most effective way to use RCA. Propagate the learning by fixing the business systems shown up by the RCA to be failure-causing.

Figure 7 – The Power in RCA When You Systematise the Learning in Each Incident

We need to flavour RCA with a new purpose if we are to use it effectively in industry. We must refocus our aim for RCA to one of business-wide improvement and not single problem-solving. A problem-solving focus will keep you immersed in problems forever, to the point that people give up on RCA because it does not stop problems. If instead RCA is used to fix business processes you get rapid success for the effort because you improve your business systems. The improvements identified by each RCA will flow throughout your business and touch all parts of it. The success rapidly accumulates into higher equipment reliability and greater operating profits.

Best regards to you, 

Mike Sondalini 

[1] Geller, E. Scott, 'Psychology of Safety Handbook', Edition 2, CRC Press, 2001

[2] John Booth Davies, John Davies, Alastair Ross, Brendan Wallace, Linda Wright, 'Safety Management', Taylor and Francis, 2003

[3] http://en.wikipedia.org/wiki/Murphy’s_law

About Mike Sondalini

In engineering and maintenance since 1974, Mike’s career extends across original equipment manufacturing, beverage processing and packaging, steel fabrication, chemical processing and manufacturing, quality management, project management, enterprise asset management, plant and equipment maintenance, and maintenance training. His specialty is helping companies build highly effective operational risk management processes, develop enterprise asset management systems for ultra-high reliable assets, and instil the precision maintenance skills needed for world class equipment reliability.
