A reliable product does not fail often. Customers expect a certain level of reliability, and failures that occur too early dash their expectations.
The design and development team works to create a robust product that meets the customer's reliability expectations. The team may use a range of tools to detect reliability problems prior to launch.
Yet customers do uncover problems that surprise us. This may be a problem with how we identify and resolve risks, yet it could also be that the development process didn't look closely enough to find the issues. Or, worse, we saw the issue and ignored it.
I received this question last week asking about ways to improve finding issues (and solving them) before customers find them.
So, here is my question to you, if you wouldn't mind shedding some light on the subject… is it commonplace to not be able to predict all that could happen? Or are there truly methods that can prevent this? My experience in microelectronics is that no one has deep enough pockets or long enough patience to flush out all problems. I'm open to a better way of doing this and a better way of thinking.
Thanks for the note and question.
Keep in mind that FMEA is only one way to uncover what you and the team do not know — the lurking failure mechanisms. There are other approaches I've seen; they are not perfect, yet they can help in most cases, including with brand-new inventions.
First — with the FMEA approach, include those items that you know you don't know. If you are not sure about a particular failure mechanism, or whether and to what extent something will fail, then include it and tag it for exploration and future study. Another outcome of an FMEA is improved monitoring capability, to detect those failures that may occur yet are not currently visible (latent defects, age-related defects, etc.).
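The tagging idea above can be sketched in a few lines. This is a minimal, hypothetical example: the failure modes, rating values, and the RPN action threshold of 120 are all made up for illustration, not from any real worksheet.

```python
# Minimal sketch of tracking "known unknowns" in an FMEA worksheet.
# All failure modes, ratings, and the threshold are hypothetical.

from dataclasses import dataclass

@dataclass
class FmeaRow:
    failure_mode: str
    severity: int      # 1 (none) to 10 (hazardous)
    occurrence: int    # 1 (remote) to 10 (very high)
    detection: int     # 1 (almost certain to detect) to 10 (cannot detect)
    knowledge: str     # "known" or "unknown" mechanism

    @property
    def rpn(self) -> int:
        # Risk Priority Number: severity x occurrence x detection
        return self.severity * self.occurrence * self.detection

rows = [
    FmeaRow("solder joint fatigue", 7, 4, 3, "known"),
    FmeaRow("connector corrosion in humid climates", 6, 5, 8, "unknown"),
]

# Tag unknown mechanisms for further study regardless of RPN,
# plus anything whose RPN exceeds the team's action threshold.
for row in rows:
    flagged = row.knowledge == "unknown" or row.rpn >= 120
    print(f"{row.failure_mode}: RPN={row.rpn}, study further: {flagged}")
```

The point of the `knowledge` column is that an uncertain mechanism gets flagged even when its guessed RPN looks comfortable.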
Second — for truly novel items without any history, extend the bench and environmental testing to find the margins. Cause failures, and check whether they occur as expected. For expensive items, use modeling to estimate the stress levels or accumulated damage required to cause failure, then run experiments to verify. Yes, this can be expensive, yet when the uncertainty or the modeling suggests a 1% or higher failure rate, we should invest to know the full story.
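One common way to sketch the accumulated-damage estimate mentioned above is Miner's rule: sum the fraction of life consumed at each stress level, and predict failure when the sum reaches 1. The cycle counts and cycles-to-failure values below are made up for illustration, not from a real test program.

```python
# Minimal sketch of a cumulative-damage estimate using Miner's rule:
# damage D = sum(n_i / N_i), where n_i is the number of cycles applied
# at stress level i and N_i is the cycles-to-failure at that level.
# All numbers here are hypothetical.

def miners_damage(cycle_history):
    """cycle_history: list of (cycles_applied, cycles_to_failure) pairs."""
    return sum(n / N for n, N in cycle_history)

# e.g. thermal cycling at two severities (made-up values)
history = [
    (5_000, 50_000),   # mild cycles: 10% of life consumed
    (1_000, 4_000),    # severe cycles: 25% of life consumed
]

damage = miners_damage(history)
print(f"Accumulated damage: {damage:.2f}")  # 0.35 of life consumed
# Failure is predicted when the damage sum reaches 1.0.
```

A model like this gives a target for the verification experiments: run the bench test to the predicted damage level and see whether the hardware actually fails where the model says it should.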
Third — supply chain risk analysis and process control. This generally isn't taken seriously enough, and that adds a great amount of risk of your product failing. Again, we do not have the time or resources to cover everything, yet if purchasing is driven by cost rather than quality/reliability, that team needs adjusting. I find that a third or more of field issues come from supply-chain-related root causes. As you know, this kind of root cause can become quite a serious outbreak of failures.
Fourth — record all prototype (or simulation) failures. Plot cumulative failures over time and monitor the slope. If it remains steep, you are not done finding issues; if the plot rolls over to a more horizontal line (meaning it is taking longer and longer to find the next issue), then we've found all that we're capable of finding. Regularly review the testing and evaluation methods to look for ways to expand the coverage and range of stresses, to find more issues and find them faster.
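The "rolling over" check above can be done numerically as well as visually: if the test hours between consecutive new-issue discoveries keep stretching out, the cumulative curve is flattening. A minimal sketch, with hypothetical discovery times and a made-up rule of thumb:

```python
# Minimal sketch of monitoring the issue-discovery curve:
# given the cumulative test hours at which each new issue was found,
# check whether the gaps between discoveries are stretching out
# (the curve "rolling over"). The timestamps are hypothetical.

def interarrival_times(discovery_hours):
    """Hours of testing between consecutive new-issue discoveries."""
    return [b - a for a, b in zip(discovery_hours, discovery_hours[1:])]

# Cumulative test hours at which each new issue surfaced (made up)
discoveries = [10, 25, 45, 90, 180, 400]

gaps = interarrival_times(discoveries)
print("Gaps between discoveries:", gaps)  # [15, 20, 45, 90, 220]

# A made-up rule of thumb: if the latest gap dwarfs the earliest one,
# the curve has flattened and this test setup is running dry.
flattening = gaps[-1] > 3 * gaps[0]
print("Discovery curve flattening:", flattening)
```

When the curve flattens, that is the signal to expand the stress range or add new test coverage rather than to declare victory.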
Fifth — keep the team aware of past failures in a systematic way. A checklist or database alone generally does not work. While at HP I learned about a simple process, and I wrote about it as an example of the role of the quality and reliability manager within an organization.
One of the biggest factors I find is that organizations that celebrate failures as a gift tend to find and resolve more issues before shipment. Those that reward the hero who quickly solves a field issue continue to ship products with field issues. If the culture during development is to blame the engineer or technician for a failure, rather than to understand the root cause and the decision process leading to the failure, the team really misses an opportunity to improve its product.
Thanks again for the question. I'm not sure my answer is of much help. Let me know your thoughts.