Why do so many avoid failure?
In product development or plant asset management, we are surrounded by people who steadfastly do not want to know about or talk about failures.
Failure does happen. Let’s not ignore this simple fact.
The blame game
Unlike a murder mystery, failure analysis is not a game of whodunit.
The knee-jerk response to blame someone rarely solves the problem or creates a reliability-minded workplace.
If the routine is to blame someone when a failure is revealed, fewer people will reveal failures.
If it is clear we do not want to talk about failures in a civilized manner, well, we’ll just not talk about failures.
Failures will still occur.
In a blame-centric organization, most of the people who could understand and solve problems simply turn away and avoid ‘seeing’ failures.
When friends and colleagues are vilified in order to ‘solve problems’, it’s not safe to recognize failures.
Root cause analysis
This is only one step in the failure analysis process, yet it is critical to get right.
The basic idea is to understand the fundamental (molecular, physics, chemistry, material property) level of the circumstances and events leading to failure.
When we can reproduce the issue at will, and turn off or avoid the failure at will, then we understand the root cause.
Techniques like “5 Whys” provide a framework to help ensure we understand the cause of failure.
Equipment ranging from a magnifying lens to a scanning electron microscope helps us ‘see’ the physical and chemical clues.
The failure analysis process
The 8 disciplines (8D) approach is a common failure analysis (FA) process. There are many variations, yet the pattern tends to remain the same.
Upon initial recognition of a failure, gather information, symptoms, and circumstances.
If needed, implement any emergency response required (e.g., fire, first aid, chemical spill containment).
Form a team. This can be just a couple of people or a formal multidisciplinary team, depending on the magnitude of the failure and associated consequences.
Describe the problem: what is and is not known.
The more detail and facts here the better.
Immediate response and containment. Isolate the batch, stop shipments of suspect products, etc.
Limit the occurrence of additional failures if at all possible. If there is an immediate workaround or patch, use that to mitigate and avoid failures.
This is not the solution, just a stopgap action.
Root cause analysis
This is the sleuthing part: not finding who to blame, but determining what actually happened at a fundamental level.
One piece of advice: do not send suspect components to suppliers or vendors for FA work.
It takes too long and rarely results in a meaningful RCA. Instead, use internal or contracted FA labs.
Sure, the analysis may cost more, yet the results will be quicker and clearer.
Corrective action. Act only once armed with a fundamental understanding of the root cause.
This may include a design, material or process change.
Test the solution and verify that it actually works.
Monitor as long as necessary to validate that the solution provides a fundamental resolution.
Based on what the team learned, what can we as an organization learn to avoid similar issues in the future?
This is often the most difficult step. Step back from the immediate problem and review the processes in design and production that created a situation where the failure occurred.
This is not the step to add more controls and checks; rather, it is the step to assess the process and improve our ability to make better decisions in the future.
For example, if the root cause for a material defect is the use of an unstable additive, then simply concluding that we add that additive to a ‘do not use’ list is shortsighted.
Instead, what part of the process should have revealed the faulty material choice? Why was the stability question not asked earlier in the process?
Was it a lack of resources, or the team’s focus on time to market?
What system structure kept us from identifying the issue earlier?
Learn from the failure: not only how to resolve the immediate issue, but also how to avoid making similar mistakes in the future.
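For teams that track failures in software, the gating between the steps above can be sketched in a few lines. This is only an illustrative sketch, not part of any 8D standard or FRACAS tool: the class and field names (`FailureReport`, `RootCause`, and so on) are hypothetical. It shows two ideas from the process: a root cause counts as understood only when the failure can be both reproduced and turned off at will, and a corrective action is accepted only after that point. Note that the record has no field for who is to blame.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RootCause:
    """Hypothetical record of a root cause finding."""
    description: str
    can_reproduce_at_will: bool = False
    can_turn_off_at_will: bool = False

    def is_understood(self) -> bool:
        # Understood only when the failure can be both reproduced
        # and turned off (or avoided) at will.
        return self.can_reproduce_at_will and self.can_turn_off_at_will


@dataclass
class FailureReport:
    """Hypothetical failure record; deliberately has no 'blame' field."""
    symptoms: str
    containment_actions: List[str] = field(default_factory=list)
    root_cause: Optional[RootCause] = None
    corrective_action: Optional[str] = None
    verified: bool = False

    def set_corrective_action(self, action: str) -> None:
        # Containment is allowed at any time; corrective action is not.
        if self.root_cause is None or not self.root_cause.is_understood():
            raise ValueError("root cause not yet understood; containment only")
        self.corrective_action = action

    def can_close(self) -> bool:
        # Close only after the corrective action is tested and verified.
        return self.corrective_action is not None and self.verified


if __name__ == "__main__":
    report = FailureReport(symptoms="material defect found in one batch")
    report.containment_actions.append("isolate the batch, stop shipments")
    report.root_cause = RootCause(
        "unstable additive", can_reproduce_at_will=True
    )
    try:
        report.set_corrective_action("change the material")
    except ValueError as err:
        # Rejected: the failure cannot yet be turned off at will.
        print(err)
```

The final step, learning as an organization, is left out on purpose. It is a review of the design and production process, not a field to fill in.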
Summary
Every organization has stories about failures, especially organizations that ‘do not talk about failures’.
Failures happen, and when they do we can learn and improve our organization.
So, what are your failure stories?
Share one in the comments or send me a note directly.
I’ll gather the best stories, sanitize them to avoid deriding any organization, and post the best failure ‘horror stories’ on Halloween (Oct 31st).
Gene Danneman says
Failure Reporting and Corrective Action System (FRACAS) is an excellent tool to manage failure mitigation.
https://en.wikipedia.org/wiki/Failure_reporting,_analysis,_and_corrective_action_system
Fred Schenkelberg says
Hi Gene, FRACAS certainly is a good framework to manage failures (if not a blame approach), thanks for the comment and link. cheers, Fred