
Root Cause Analysis (RCA) is the step that turns failure into learning.
Corrective and preventive actions are only effective if they address the real causes of failure, not just the symptoms. RCA provides the structured approach needed to understand why the system allowed the failure to occur in the first place.
RCA is often associated with specific tools such as fishbone diagrams, the “5 Whys”, or fault trees. Those tools can be useful, but they are not the purpose of RCA.
At its core, RCA is about understanding why a failure occurred and identifying changes that will meaningfully reduce the likelihood of it happening again.
Going beyond the obvious
A common failure of RCA is stopping too early. In practice, investigations often conclude with causes such as:
- “Component failure”
- “Human error”
- “Procedure not followed”
These may describe what happened, but they rarely explain why the system allowed it to happen.
Effective root cause analysis looks beyond the immediate event and considers the wider system, including:
- Design assumptions and margins
- Interfaces between systems, teams, or suppliers
- Operating context and workload
- Maintenance practices, training, and information
- Organisational pressures and incentives
Structured approaches such as cause-and-effect diagrams (Ishikawa), the 5 Whys, or the 8D problem-solving method can help teams explore these interactions in a systematic way.
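As a rough illustration of the "stopping too early" problem, a 5 Whys chain can be sketched as a small data structure that flags when the last answer is only a label for what happened. This is a minimal sketch, not a standard tool: the `WhyChain` class, the `SHALLOW_CAUSES` set and the pump example are all hypothetical, and a real investigation needs engineering judgement rather than a string match.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: these labels mirror the example conclusions quoted earlier.
# They describe what happened, not why the system allowed it.
SHALLOW_CAUSES = {"component failure", "human error", "procedure not followed"}


@dataclass
class WhyChain:
    """A 5 Whys chain: a problem statement plus successive answers to 'why?'."""

    problem: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> "WhyChain":
        """Record the answer to the next 'why?' and return the chain."""
        self.answers.append(answer)
        return self

    def deepest_cause(self) -> str:
        """Return the last recorded answer, or the problem if none exist."""
        return self.answers[-1] if self.answers else self.problem

    def stopped_too_early(self) -> bool:
        """True if the chain currently ends on a label, not a system condition."""
        return self.deepest_cause().strip().lower() in SHALLOW_CAUSES


# Hypothetical example: an investigation that first stops at "Human error".
chain = WhyChain("Pump seal leaked in service")
chain.ask_why("Human error")
print(chain.stopped_too_early())  # True

# Asking why again surfaces a condition the organisation can act on.
chain.ask_why("Torque values were missing from the maintenance work instruction")
print(chain.stopped_too_early())  # False
```

The check is deliberately crude. Its only purpose is to show that a conclusion such as "Human error" is a prompt for another "why?", not an endpoint.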
Identifying actionable causes
From a reliability engineering perspective, the goal of RCA is not simply to find a root cause.
It is to identify actionable causes: conditions that, if addressed, will reduce risk in a meaningful way.
This is also where RCA and CAPA intersect. A well-conducted RCA informs effective corrective and preventive actions. A weak RCA leads to superficial fixes, repeated failures and growing frustration.
Importantly, RCA is not about assigning blame. When investigations focus on individuals rather than system conditions, learning is limited and trust is eroded, making future issues harder, not easier, to surface.
Learning that improves the system
The CRE Body of Knowledge treats RCA as a foundation for learning and improvement, not a compliance exercise. Good RCA doesn’t guarantee failures won’t recur, but poor RCA almost guarantees they will.
Understanding failure is only part of the challenge, however. Reliability engineering must also consider how systems recover when failures inevitably occur.
Next up…
Reliability Bites #16: Maintainability and availability – designing for recovery, not just avoidance.