Accendo Reliability

Your Reliability Engineering Professional Development Site


by Mike Sondalini

 RCFA and 5-Whys Tips for Successful Use 


When you do a Root Cause Failure Analysis (RCFA) or a 5-Why, there is no promise that you will actually find the true root cause and fix your problem. Investigating the cause of a failure is fraught with traps, such as wrong assumptions, insufficient evidence, misinterpreted evidence, misunderstanding, personal bias and second-guessing. There are issues you need to be aware of that affect the RCFA and 5-Why methods, and there are good practices you can adopt to improve your chance of a successful analysis when they are applied to equipment failures.

Keywords: root cause failure analysis, 5-Why analysis 

The life of a failure incident starts at some time and some place in the past. Other than ‘Acts of God’, industrial accidents and equipment failures are not accidents; they are caused either by human-initiated events (lifeless objects do not make choices or action decisions) or by natural physics and bioscience, like corrosion and decay. Studies of safety incidents find that they happen because a series of circumstances and occurrences across time merge to culminate in the final failure [1]. There is never just one cause of a failure. It is almost a lie to call an investigation into a failure a Root Cause Failure Analysis; it is more truthful to call it a Random Causes Failure Analysis. Figure 1 points out the great difficulty of ever finding the root cause(s) of any incident.

Figure 1 – Failure Causes Can Start Anywhere

We know that we humans are imperfect. We are limited by the capabilities and capacities of our body and brain designs [2]. Our muscles tire, we need sleep, our language talents vary, and we differ in mathematical abilities, as we do in dozens of other attributes and skills. A downside of our humanness is that we make errors. (The many upsides include our amazing creativity and innovation.) We can make mistakes at any time. Figure 2 [3] lists typical human error rates across a range of activities. It shows how frequently our frailties start failures and disasters; it tells an interesting story of what it means to be human. Human error is unavoidable; it is impossible to stop. But that does not mean it must lead to failure.

Figure 2 – Human Error Varies According to the Task Complexity and Situational Stress

Note the list of task types in the table under the ‘Complicated, non-routine task’ heading. That is where most engineering and maintenance work activities sit; they are complicated technical tasks not done often. Their human error rates are massive – at least one error in every ten opportunities to make an error – and it gets worse when stress is added. Human error is the single biggest reason that companies have poor plant and equipment reliability [4]. Your plant and equipment are fine; they are failed by poor business processes that allow humans to break them. Machines fail because company managers do not foresee the effects of human error and human factors and do not protect the company from our inbuilt limitations, thus ensuring that failure and disaster will eventually occur.

We make matters far worse by designing our machines and business processes to be easily failed by human error. We build them as series configurations of parts and tasks, and consequently introduce the problem shown in Figure 3 countless times in our machines and across our companies. Fortunately, the human error rate table also advises us exactly what to do. Note how the sigma quality improves as a task becomes simpler and the work is less complicated. You reduce human error by making a job’s design simple (then simpler), by removing complication, by removing uncertainty, by directing decisions, and by removing causes of physical and mental stress. Everything that you can do to reduce human factor problems will let people do better quality work.

Figure 3 – The Danger of Series Arrangement Designs 

As the number of parts in a machine increases, so does the chance of failure, because the series arrangements grow longer and more parts become available to fail – there are more things to go wrong. Similarly, when business processes have many tasks you provide many opportunities for failure to occur from human error. You will have a constant stream of disasters arriving simply because the probability of failure from countless opportunities is so heavily weighted against you. These never-ending problems eventually burn people out, all because of the stress and fatigue caused by poorly designed series processes throughout our companies and machinery.
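
The arithmetic behind the series-arrangement problem can be sketched in a few lines of Python. This is only an illustration, assuming that every part or task in the chain must succeed and that failures are independent of each other. The function name `series_reliability` and the reliability values are made up for the example and are not taken from the figures.

```python
from math import prod

def series_reliability(reliabilities):
    """Chance that every item in a series chain works (independence assumed)."""
    return prod(reliabilities)

# Illustrative values only: a machine built from n parts, each 99% reliable.
for n in (1, 10, 50, 100):
    print(n, "parts:", round(series_reliability([0.99] * n), 3))

# A ten-step job where every step carries a 1-in-10 chance of human error,
# the order of magnitude quoted above for complicated, non-routine tasks.
error_free_job = series_reliability([0.9] * 10)
print("error-free ten-step job:", round(error_free_job, 3))
```

Under these assumptions the ten-step job is completed without any error only about 35% of the time, which is why shortening the chain or simplifying each step has such a large effect.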

When failures happen, as they inevitably must if people are involved, it is difficult to identify the true cause(s) because many contributing errors will have occurred across the life-cycle of the failed item. In Figure 4 the pump-set fault tree shows that a centrifugal pump can be failed by 553 possible causes. If you did an RCFA on a pump-set breakdown you would have to consider which of the 553 causes occurred to the pump under investigation. Most businesses could never provide the time necessary to conduct that RCFA. Instead, to reduce the length of the RCFA, we seek the obvious causes and factors and discard those events considered impossible or too remote. This means that, because of process complexity, many RCFAs inevitably come up with the wrong cause and fix the wrong issue, even though we may be convinced that we have found the problem.

Figure 4 – What Caused the Pump Set Failure if there are 553 Ways to Fail a Pump Set?

Use a Consistent and Comprehensive RCFA Process 

We can reduce the number of failed RCFAs if we have a robust RCFA process that every investigative team religiously follows and if we have irrefutable evidence from the failure incident. Figure 5 makes the point that it is the evidence from failed parts that makes clear which of the many possible and diverging paths to the equipment failure caused the incident. If there is no indisputable evidence from a failure incident, then stop the RCFA immediately. Don’t let people waste their time debating opinions that can never be proven and possibly go on to cause pointless grief to others. 

Every company that uses RCFA needs a documented process for how its teams run RCFAs. The procedure will detail how evidence is collected and protected, how team members are selected, the responsibilities of the facilitator, and the investigative tools and analysis methods to use, with examples of best-practice usage. It will provide pro-forma documents, forms and agendas, and contain criteria to track and monitor the progress of the RCFA. It will also clearly indicate what expenditures the team is allowed in its efforts to find the truth, and give guidance on other issues affecting the success of the RCFA.

Figure 5 – Only Indisputable Evidence is Acceptable in an RCFA

Use well-respected investigative and analysis methods when doing an RCFA. There are many Total Quality Control and Six Sigma techniques that can be applied to analyse events and historic data. Figure 6 indicates some of the common ones that are easy to use.

Most importantly, the RCFA must force the team to look far wider for contributing causes than human behaviour normally encourages. We all make assumptions based on what we think we know and believe what our limited human senses ‘tell’ us. This is an important reason why a documented RCFA procedure must be followed – to ensure the team does not fall into the trap of taking a blinkered view from the start. The serial nature of our machinery and business process designs means there will be numerous life-cycle factors to consider, some stretching back to conception.

Tools to expand perspectives and de-blinker the minds of RCFA team members include flow charting the intended design and its behaviour, like that shown in Figure 7 for an overflowing tank, and using fishbone diagrams to identify possible influences from key factors such as measurement, method, machinery, people, materials and environment. The team must apply these tools at the start if a robust and comprehensive investigation is to have any chance of occurring.

When the evidence from the plant and equipment is confusing, or the failure mechanisms involved are poorly understood, it may prove beneficial to conduct a Failure Mode and Effects Analysis (FMEA) on the individual parts involved in or affected by the failure, to understand deeply the underlying Physics of Failure effects and consequences (i.e. the forces, loads and stresses acting on parts and their effects). Questions about the physical and scientific mechanisms involved with the failure will naturally arise during the FMEA. These questions can then be answered using the evidence available coupled with sound engineering reasoning and materials testing.

Figure 6 – Contents and Coverage of the RCFA Process
Figure 7 – Start with a Flow Chart of the Failed Process Design to See Risks and Complexity
Figure 8 – Cause-and-effect Diagram Construction with Failure-Sequence Phases

Start from Certain Facts when Building a Cause and Effect Tree 

RCFA has the crazy intention of identifying all possible failure paths and then using the evidence from the incident to pinpoint the path that caused the failure. The complexity of business processes and unidentifiable influences across life-cycles make this a difficult requirement to meet on even simple failures, and virtually impossible on disasters. Imagine trying to identify all 553 ways the pump set in Figure 4 could fail. It would be a huge amount of work that people could never do well. Then you would need solid evidence at every step in the cause-effect tree to isolate the true failure cause(s) out of the 553 possibilities.

Knowing that the design of our machines and businesses easily leads the RCFA investigation astray, the cause-effect diagram that the team constructs needs to have a structure that ‘forces’ them to work from known, indisputable evidence back to what may have occurred at the root(s) of the incident.

Figure 8 recommends that the first phase of an RCFA or 5-Why consider only scientific facts from the evidence to start the cause-effect tree. For example, in Figure 11, the cause-effect tree for the roof collapse from vehicle impact shown in Figure 10 starts from the scientific explanation – the roof fell because cement between the column and foundation sheared, not because the trailer hit the roof. A team may never get to the real root cause, but starting with the scientific causes-and-effects means the RCFA can always come up with solutions to stop or lessen the consequences of a failure. In this case the use of brick columns with cement joints meant there was no resistance to the tilting caused by the roof moving under the impact. Knowing that, the team can at least propose better choices of construction materials and structural designs that will be more robust in such situations.

Figure 9 – Proving the Actual Failure-Sequence of an Event
Figure 10 – The Roof Collapsed because the Columns Fell, Not because the Trailer Hit the Roof
Figure 11 – Start with the Scientific Sequence of Events

If an indisputable scientific explanation cannot be found, the RCFA team should consider stopping, because they have only speculation and opinion to work with, which is likely to send the investigation astray so that it never finds the whole truth. Once indisputable physics explains the science of a failure, we then try to identify the sequence of physical actions that created the opportunity for failure. Solid evidence is necessary to confirm our suppositions. The next phase of the fault tree is to find which business systems failed to stop the cascading events. Lastly, we come to latency: the inner beliefs, values and norms of the people and organisations involved across the life-cycle of the incident. You may need to go back decades to understand the views and attitudes of people and company culture.

The actual failure path(s) needs to be proven true. That is only possible if there is unquestionable evidence for each cause-effect step, which becomes less likely to exist as the fault tree ‘grows’ towards its roots. The ‘incident actions’ and ‘latent causes’ phases, where people need to tell the absolute truth about themselves and others, are often short of tangible proof. 
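
The rule that a failure path counts only when every cause-effect step has evidence can be sketched as a small tree walk. This is a minimal sketch, not the structure used in the figures: the `Cause` class, its field names, the phase labels and the evidence strings are all assumptions made for illustration, loosely themed on the roof collapse example. The unproven ‘systems’ cause is invented.

```python
from dataclasses import dataclass, field

@dataclass
class Cause:
    """One cause-effect step; all field names here are illustrative."""
    description: str
    phase: str                 # "science", "actions", "systems" or "latent"
    evidence: str = ""         # empty string means no indisputable evidence
    causes: list = field(default_factory=list)  # deeper contributing causes

def proven_paths(node, path=()):
    """Return each chain from the top event that has evidence at every step."""
    if not node.evidence:
        return []              # an unproven step ends this branch
    path = path + (node.description,)
    deeper = [p for child in node.causes for p in proven_paths(child, path)]
    return deeper or [path]    # keep the longest proven chain on each branch

# Hypothetical tree: evidence entries are placeholders, not real findings.
roof = Cause("Roof collapsed", "science", "site photographs", [
    Cause("Cement joint at column base sheared", "science", "fractured joint", [
        Cause("Trailer struck the roof", "actions", "impact marks", [
            Cause("No height-clearance control", "systems"),  # invented, unproven
        ]),
    ]),
])

for chain in proven_paths(roof):
    print(" <- ".join(chain))
```

In this sketch the proven chain stops at the trailer impact, because the deeper ‘systems’ cause has no evidence attached. That mirrors the point above: tangible proof tends to run out as the tree grows towards its roots.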

Using 5-Why Methodology Rightly 

The 5-Why methodology is well structured for confirming a failure path once a cause-and-effect tree is drawn. It is a poor method for identifying the cause-and-effect tree. It is doubtful that simply by asking ‘why’ five times you can find the root cause of an incident with a high degree of certainty. ‘5-Why’ is just a tag to name the method; it may take three, seven, or ten ‘whys’ to get to what may be a speculative root. Just because you can answer a ‘why’ question does not prove the answer is right. This is the great trap with using 5-Why: people think they will unearth the full truth with the methodology. As soon as a fault tree splits into contributing causes, the 5-Why method fails as a robust, stand-alone analysis tool. But when used to confirm the failure path from the presence of real evidence, as shown in Figure 9, the method is universally useful.

If 5-Why is used, you need to include a means to test each cause-and-effect step and prove the answer to the ‘why’ question with facts. This is the purpose of the 3W2H set of additional questions – With what, When, Where, How, and How much – that need to be used in combination with the 5-Why method.
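
One way to picture the pairing of 5-Why with 3W2H is a record in which each ‘why’ answer is accepted only when all five supporting questions have a factual answer. The sketch below is an assumption about how such a record might be laid out; it is not the form shown in the figures. The names `WhyStep` and `unproven_steps` and the example data are invented for illustration.

```python
from dataclasses import dataclass

# The five supporting questions named in the text.
QUESTIONS_3W2H = ("with_what", "when", "where", "how", "how_much")

@dataclass
class WhyStep:
    why: str       # the 'why' question asked
    answer: str    # the proposed cause
    facts: dict    # evidence-backed 3W2H answers, keyed by QUESTIONS_3W2H

def unproven_steps(steps):
    """List (step number, missing questions) for each step lacking facts."""
    gaps = []
    for number, step in enumerate(steps, start=1):
        missing = [q for q in QUESTIONS_3W2H if not step.facts.get(q)]
        if missing:
            gaps.append((number, missing))
    return gaps

# Invented example data, loosely themed on a late delivery.
steps = [
    WhyStep(
        why="Why was the delivery late?",
        answer="The truck left the despatch dock a day late",
        facts={"with_what": "despatch log", "when": "log timestamp",
               "where": "despatch dock", "how": "driver statement",
               "how_much": "24 hours late"},
    ),
    WhyStep(
        why="Why did the truck leave late?",
        answer="The order was picked late",
        facts={"with_what": "picking sheet"},  # other questions unanswered
    ),
]

for number, missing in unproven_steps(steps):
    print(f"Why {number} is not yet proven; missing: {', '.join(missing)}")
```

Here the second ‘why’ is flagged because only one of the five questions has a factual answer, so the chain should not be extended beyond it until the gaps are closed.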

Figure 12 – Why-Tree of a Despatch Process Failure
Figure 13 – Seeking Understanding of Incident Latency Drivers

Figures 12 and 13 show a simple cause-and-effect tree that runs from the physical evidence to the latent causes of an incident.

Figure 14 – A 5-Why Record Form Must Show Sure Cause-Effect Evidence

Figure 14 uses a 5-Why Table to confirm the failure path with factual evidence. The failure was a late delivery to a client who invoked a $25,000 penalty clause. The RCFA team was charged with understanding what happened and why, and to prevent the problem in future. 5-Why was used to confirm the fault tree; not to develop it. 

RCFA Does Not Solve Problems 

Companies expect RCFA to solve their problems, but that is an impossible expectation. The output of every RCFA or 5-Why is a report. They only produce paper. They do not solve or stop the actual failure. Future failures can only be stopped or lessened by implementing the changes recommended by the RCFA or 5-Why. You must take the ideas from the investigation and do them in the real world. The written recommendations start the improvement process, but to cause them to happen they need a separate project that the organisation funds and implements. 

The function of RCFA and 5-Why is to come up with answers; it does not include implementing them. RCFA stops once the report is presented. After the report is delivered, other business processes must take the recommendations to completion. Otherwise, there will be plenty of RCFA reports produced by teams, but nothing will change to improve the organisation. Doing the RCFA is the easy 20% of improving a business process. The hard yards come after the report.

Figure 15 – Implement RCFA Outcomes using Change Management and Project Methodology

The process that a company uses to implement RCFA recommendations needs to be identified in the RCFA Procedure document so everyone knows what will happen to the RCFA output. The RCFA recommendations need to be taken into a project management and change management process that covers the requirements shown in Figure 15.

RCFA and 5-Why methodology can help improve organisations if people care to know the truth and then act appropriately to resolve the ‘human element’ issues and remove the ‘black-holes’ in their business processes that draw their people into certain failure. 

Mike Sondalini 



[1] Hopkins, Andrew, ‘Safety, Culture and Risk – the organisational causes of disasters’, Foreword by James Reason, CCH Australia, 2005

[2] Gladwell, Malcolm, ‘Blink, the power of thinking without thinking’, Back Bay Books, 2005

[3] Smith, David J., ‘Reliability, Maintainability and Risk’, Appendix 6, Seventh Edition, Elsevier – Butterworth Heinemann

[4] Barringer, H. Paul, P.E., ‘Use Crow-AMSAA Reliability Growth Plots To Forecast Future System Failures’, Barringer and Associates, Humble, TX, USA, www.barringer1.com


About Mike Sondalini

In engineering and maintenance since 1974, Mike’s career extends across original equipment manufacturing, beverage processing and packaging, steel fabrication, chemical processing and manufacturing, quality management, project management, enterprise asset management, plant and equipment maintenance, and maintenance training. His specialty is helping companies build highly effective operational risk management processes, develop enterprise asset management systems for ultra-high reliable assets, and instil the precision maintenance skills needed for world class equipment reliability.
