When facing yet another field issue with a high price tag, my Chief Technical Officer asked me, “How do we get more predictive so we can identify and prevent these failures from occurring in the future?” Similarly, I had a friend who was trying to optimize a key customer feature of a future product. He ran robustness experiments covering more than 40 noise and control factors that his team had brainstormed. And yet, when field trials started, the device had several failures of unknown cause. Despite considering more than 40 factors, the team had missed the one noise factor that was triggering these failures. I’ll turn to you and ask the same question: How do we get better at predicting future failures and preventing them from occurring? If we had infinite knowledge, we could see these failures before they occurred.
So, how do we get better at predicting these failures within our limited budgets, time, and personnel? In my experience, there are three key factors in predicting future failures:
- The right knowledge – having the right team members at a given brainstorm
- The right tool – having a tool that helps the team visualize and brainstorm a given problem
- The right focus – putting focused effort into finding the root cause and solving the problem
In this article I’ll address the second of these, “the right tool”. In both of my examples above, the team missed the one key “noise” factor that was responsible for the given failure. Usually, there is no shortage of ideas about the causes of failures in a group of engineers, as each engineer enters the room with their own cause theories. What is needed are the details of the failure and the right tool to capture them. For me, the best tool for capturing the cause theories is the Parameter (P) Diagram because it helps the team to both frame the problem and visualize it:
Using an example from my previous article, we find that the failures associated with Module A have a Weibull slope of 0.81, as the following illustrates:
In this example, we’ll say that the R&R engineer assigned to the team identifies that the failures come from a single failure mode. Because the Weibull slope is close to, but below, 1 (i.e., 0.81), the failure rate is roughly constant to slightly decreasing over time, which points away from wear-out. The failure is therefore likely due to an over-stress condition, a manufacturing issue or a combination of the two. With this knowledge, the team can narrow their brainstorming efforts to those three possibilities.
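To see why a slope below 1 points away from wear-out, it can help to look at the Weibull hazard (instantaneous failure) rate, h(t) = (β/η)(t/η)^(β−1). The following is a minimal Python sketch, not part of the original analysis. The slope of 0.81 comes from the Module A example, while the characteristic life `ETA` of 1,000 hours and the function names are assumptions chosen purely for illustration.

```python
BETA = 0.81    # Weibull slope quoted for Module A in the article
ETA = 1000.0   # ASSUMPTION: hypothetical characteristic life, in hours


def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate of a two-parameter Weibull distribution."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def hazard_trend(beta):
    """Direction of the failure rate over time implied by the slope."""
    if beta < 1:
        return "decreasing"
    if beta > 1:
        return "increasing"
    return "constant"


# Compare the failure rate early and late in life for the quoted slope.
for hours in (10, 100, 1000):
    rate = weibull_hazard(hours, BETA, ETA)
    print(f"{hours:>5} h: {rate:.6f} failures per hour")

print("Trend:", hazard_trend(BETA))
```

With a slope under 1, the computed rate falls as operating time increases, which is the pattern associated with early-life failures rather than wear-out.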
In the P-diagram, the team would record the failure mode exhibited in the field in the “Error State”, along with all of the specifics known about the given failure. Next, the team would brainstorm their cause theories for the failure under “Noises” and then use the P-diagram to force rank and test the suspected cause theories.
To close the loop on the question posed at the beginning of the article, “How do we prevent these failures from occurring in the future?”, one way is to capture the P-diagram, together with the experiments and development tests used to replicate and root cause the failure, in a library of module P-diagrams. That way, the next team designing and developing that module isn’t starting from scratch but from the lessons learned on real field issues of the past.
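As a rough illustration of how such a library could be kept in a searchable form, here is a minimal Python sketch. It is not the author's tool. The class names, the `suspicion` score used for force ranking, and the three example cause theories are all hypothetical; only the “Error State” and “Noises” fields and the Module A details come from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CauseTheory:
    """One brainstormed noise factor suspected of triggering the failure."""
    description: str
    suspicion: int            # ASSUMPTION: team score, higher = more suspected
    replicated: bool = False  # set to True once a test reproduces the failure


@dataclass
class PDiagram:
    """Only the two P-diagram fields discussed in the article."""
    module: str
    error_state: str
    noises: List[CauseTheory] = field(default_factory=list)

    def force_rank(self) -> List[CauseTheory]:
        """Return the cause theories ordered from most to least suspected."""
        return sorted(self.noises, key=lambda n: n.suspicion, reverse=True)


# The "library": past P-diagrams grouped by module name.
library: Dict[str, List[PDiagram]] = {}


def file_in_library(diagram: PDiagram) -> None:
    """Store a finished P-diagram so later teams can look it up by module."""
    library.setdefault(diagram.module, []).append(diagram)


# Hypothetical entries for the Module A example.
diagram = PDiagram(
    module="Module A",
    error_state="Single failure mode in the field, Weibull slope 0.81",
    noises=[
        CauseTheory("Hypothetical: supply voltage transient (over-stress)", 3),
        CauseTheory("Hypothetical: assembly process variation (manufacturing)", 5),
        CauseTheory("Hypothetical: over-stress acting on a marginal build", 4),
    ],
)

ranked = diagram.force_rank()
file_in_library(diagram)

for theory in ranked:
    print(theory.suspicion, theory.description)
```

A team starting new work on Module A could then read `library["Module A"]` to see which noises were suspected, and which were replicated, on earlier field issues.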