Accendo Reliability

Your Reliability Engineering Professional Development Site

by Fred Schenkelberg

Failure Analysis: The Key to Learning From Failure

Why do so many avoid failure?

In product development or plant asset management, we are surrounded by people who steadfastly do not want to know about or talk about failures.

Failure does happen. Let’s not ignore this simple fact.

The blame game

Unlike a murder mystery, failure analysis is not a game of whodunit.

The knee-jerk response of blaming someone rarely solves the problem or creates a reliability-minded workplace.

If the routine is to blame someone when a failure is revealed, fewer people will reveal failures.

If it is clear we do not want to talk about failures in a civilized manner, well, we’ll just not talk about failures.

Failures will still occur.

In a blame-centric organization, most of the people who could understand and solve problems simply turn away and avoid ‘seeing’ failures.

When friends and colleagues are vilified in order to ‘solve problems’, it’s not safe to recognize failures.

Root cause analysis

This is one step in the failure analysis process, yet it is critical to get right.

The basic idea is to understand the circumstances and events leading to failure at a fundamental (molecular, physics, chemistry, material property) level.

We should be able to reproduce the issue at will and turn off or avoid the failure at will. Only then do we understand the root cause.

Techniques like “5 Whys” provide a framework to help ensure we understand the cause of failure.

Equipment ranging from magnifying lenses to scanning electron microscopes helps us ‘see’ the physical and chemical clues.
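As a minimal sketch of the test described above (reproduce at will, turn off at will), a “5 Whys” inquiry can be kept as a small record. The class name `WhyChain`, its fields, and the sample problem text are all hypothetical and used only for illustration; this is one possible way to track the inquiry, not a standard tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyChain:
    """Hypothetical record of a '5 Whys' inquiry for one failure."""
    problem: str
    answers: List[str] = field(default_factory=list)
    can_reproduce: bool = False  # can we trigger the failure at will?
    can_turn_off: bool = False   # can we avoid the failure at will?

    def ask_why(self, answer: str) -> None:
        """Record the answer to the next 'why?' question."""
        self.answers.append(answer)

    def root_cause_understood(self) -> bool:
        """True only when a cause is recorded and the failure can be
        both reproduced and avoided at will."""
        return bool(self.answers) and self.can_reproduce and self.can_turn_off


# Illustrative values only, not data from a real investigation.
chain = WhyChain(problem="Material defect found in a part")
chain.ask_why("An additive in the material was unstable")
chain.can_reproduce = True
print(chain.root_cause_understood())  # False: we cannot yet turn the failure off
```

The point of the sketch is that a plausible-sounding answer alone does not count as a root cause; both switches have to be demonstrated.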

The failure analysis process

The 8 disciplines (8D) approach is a common failure analysis (FA) process. There are many variations, yet the pattern tends to remain the same.

Upon initial recognition of a failure, gather information, symptoms, and circumstances.

And, if needed, implement any emergency response required (e.g., fire, first aid, chemical spill containment).

Form a team. This can be just a couple of people or a formal multidisciplinary team, depending on the magnitude of the failure and associated consequences.

Describe the problem: what is and is not known.

The more detail and facts here the better.

Immediate response and containment. Isolate the batch, stop shipments of suspect products, etc.

Limit the occurrence of additional failures if at all possible. If there is an immediate workaround or patch, use that to mitigate and avoid failures.

This is not the solution, just a stopgap action.

Root cause analysis

This is the sleuthing part: not deciding who to blame, but determining what actually happened at a fundamental level.

One piece of advice: do not send suspect components to suppliers or vendors for FA work.

It takes too long and rarely results in a meaningful root cause analysis (RCA). Instead, use internal or contracted FA labs.

Sure, it may cost more to get the analysis, yet the results will be quicker and clearer.

Take corrective action only once armed with a fundamental understanding of the root cause.

This may include a design, material, or process change.

Test the solution and verify that it actually works.

Monitor as long as necessary to validate that the solution provides a fundamental resolution.

Based on what the team learned, what can we as an organization learn to avoid similar issues in the future?

This is often the most difficult step. Step back from the immediate problem and review the processes in design and production that created a situation where the failure occurred.

This is not the step to add more controls and checks; rather, it is the step to assess the process and improve our ability to make better decisions in the future.

For example, if the root cause for a material defect is the use of an unstable additive, then simply adding that additive to a ‘do not use’ list is shortsighted.

Instead, what part of the process should have revealed the faulty material choice? Why was the stability question not asked earlier in the process?

Was it a lack of resources, or the team’s focus on time to market?

What system structure kept us from identifying the issue earlier?

Learn from the failure: not only how to resolve the immediate issue, but also how to avoid making similar mistakes in the future.
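The sequence of steps described above can be sketched as a simple ordered checklist. The step names below paraphrase this article rather than any formal 8D standard, and the `FailureAnalysis` class and its methods are hypothetical names chosen for illustration. The sketch assumes a team wants to prevent skipping ahead, for example jumping to corrective action before the root cause analysis is recorded.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    """Steps paraphrased from the article, in the order described."""
    GATHER_INFORMATION = 1
    FORM_TEAM = 2
    DESCRIBE_PROBLEM = 3
    CONTAIN = 4
    ROOT_CAUSE_ANALYSIS = 5
    CORRECTIVE_ACTION = 6
    VERIFY_SOLUTION = 7
    ORGANIZATIONAL_LEARNING = 8


class FailureAnalysis:
    """Hypothetical tracker that records notes and enforces step order."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.notes = {}  # maps each completed Step to its note

    def next_step(self) -> Optional[Step]:
        """Return the first step not yet completed, or None when done."""
        for step in Step:
            if step not in self.notes:
                return step
        return None

    def complete(self, step: Step, note: str) -> None:
        """Record a step; refuse to skip ahead of the expected one."""
        expected = self.next_step()
        if step is not expected:
            raise ValueError(f"Expected {expected}, got {step}")
        self.notes[step] = note


# Illustrative values only.
fa = FailureAnalysis("Material defect")
fa.complete(Step.GATHER_INFORMATION, "Symptoms and circumstances recorded")
print(fa.next_step().name)  # FORM_TEAM
```

Calling `fa.complete(Step.CORRECTIVE_ACTION, ...)` at this point would raise `ValueError`, which mirrors the advice to act only once the root cause is understood. A real process may need to loop back to earlier steps, which this strict ordering does not model.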

Summary

Every organization has stories about failures, especially organizations that ‘do not talk about failures’.

Failures happen, and when they do we can learn and improve our organization.

So, what are your failure stories?

Share one in the comments or send me a note directly.

I’ll gather the best failure ‘horror stories’, sanitize them to avoid deriding any organization, and post them on Halloween (Oct 31st).


Comments

  1. Gene Danneman says

    October 22, 2015 at 1:14 PM

    Failure Reporting, Analysis, and Corrective Action System (FRACAS) is an excellent tool to manage failure mitigation.
    https://en.wikipedia.org/wiki/Failure_reporting,_analysis,_and_corrective_action_system

    • Fred Schenkelberg says

      October 22, 2015 at 1:23 PM

      Hi Gene, FRACAS certainly is a good framework to manage failures (if not used as a blame approach). Thanks for the comment and link. Cheers, Fred

Article by Fred Schenkelberg
in the Musings series
