Is it possible to foresee all reliability issues before a product launch?
I don’t think so. Can we minimize surprises from field failures?
The number of potential failures is often unknown.
With a little experience and imagination, we can list potential failures, including those:
- We imagine as possible when making design decisions
- We experience with the first prototypes
- We imagine our customer will experience
- We discover with product testing
- We hear about from our customers
Being surprised by what we didn’t know
I remember working with a product development team when, just after the final design review approved the product for shipment, a co-worker said, “I wonder what failure we missed?”
I was a bit surprised, as we had done a lot of work with the design team, supply chain, and manufacturing to uncover and resolve thousands of potential failure modes. It was an exhaustive process, and by the last month of the project, finding failures had become a search for exceedingly rare events.
This was my first product launch.
My friend had worked on dozens of projects and said that despite the effort to find all the possible failure modes, we had likely missed one or more major issues. He told me a story about a polymer part that bloomed plasticizer under normal operating conditions, which rendered the product unusable. High and low use or stress conditions inhibited the failure mechanism from occurring. Running at user conditions for about 2 months, a luxury we did not enjoy, was the best accelerant. No one had imagined the failure mechanism, so we didn’t even know to look for it during our evaluations.
Effect of small sample size
We were keenly aware that a single failure during our testing might signal a larger-than-expected failure rate in actual use.
We often evaluated just a few samples at a specific test condition. Thus, if only one unit out of 10 showed a fault, that could represent a very minor possibility of field failure or a 2 or 3% failure rate issue. The expected first-year failure rate for all causes was only 2%, so a single failure mode this large would likely make the product unprofitable.
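To see how little one failure in 10 samples pins down, here is a minimal sketch of an exact binomial (Clopper–Pearson) confidence interval. It assumes the 10 units are independent trials with the same probability of failing under that test condition; the 95% confidence level and the function names are illustrative choices, not part of the original testing.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Find the root of a monotone function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


def clopper_pearson(failures: int, n: int, confidence: float = 0.95):
    """Exact two-sided interval for the per-unit failure probability."""
    alpha = 1 - confidence
    # Lower bound: p at which P(X >= failures) = alpha / 2.
    lower = 0.0 if failures == 0 else _bisect(
        lambda p: (1 - binom_cdf(failures - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= failures) = alpha / 2.
    upper = 1.0 if failures == n else _bisect(
        lambda p: binom_cdf(failures, n, p) - alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(failures=1, n=10)
print(f"95% interval: {lower:.2%} to {upper:.2%}")
```

For one fault in 10 units the interval runs from roughly 0.25% to about 44.5%, so the test result alone cannot distinguish a negligible issue from a catastrophic one. Note that this bounds the failure probability under the test condition, which is not the same thing as a field failure rate.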
Intermittent issues presented a unique challenge. Part of the failure analysis process was the ability to replicate the issue. If we understood the issue well enough to make it happen, we generally had a way to determine whether a fix actually worked. This was not the best course of action, yet it was common practice.
If an issue only occurred once in 1,000 attempts, how many times would you try before deciding you were not able to replicate it? You most likely would not know the frequency of occurrence, the specifics of the failure mechanism, or what tripped the fault state.
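The arithmetic behind that question can be sketched under a strong, simplifying assumption: each attempt is independent and the fault occurs with the same fixed probability every time. Real intermittent faults often depend on unknown triggers, so treat this as an optimistic baseline. The function names and the 95% target are hypothetical choices for illustration.

```python
import math


def prob_at_least_one(p_occurrence: float, attempts: int) -> float:
    """Chance of seeing the fault at least once in `attempts` independent tries."""
    # P(no occurrence in n tries) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n.
    return 1 - (1 - p_occurrence) ** attempts


def attempts_needed(p_occurrence: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one occurrence in n tries) >= confidence."""
    # Solve (1 - p)^n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_occurrence))


p = 1 / 1000  # assumed: the fault occurs once in 1,000 attempts
print(f"Chance of seeing it in 100 tries: {prob_at_least_one(p, 100):.1%}")
print(f"Tries for a 95% chance of seeing it: {attempts_needed(p)}")
```

With a 1-in-1,000 fault, 100 attempts give only about a 9.5% chance of seeing it even once, and reaching a 95% chance takes roughly 3,000 attempts. Giving up after a few dozen tries says very little about whether the fault is real.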
Every failure was a gift. We treasured each gift and learned as much as we could with each event.
How reliable is reliable enough?
Even with safety-critical systems and no time pressure to start the system, at some point we have to say it’s good enough.
We take a risk with every product launch or asset commissioning.
I learned that reliability is not the only concern or risk; it’s one of many. There is pressure to bring features to market, to take advantage of buying patterns (holiday sales, for example), and to recoup the investment in product development.
During development, we have to balance the risks of failures with the ability to stay in business.
This risk is often difficult to articulate as we do not know the magnitude of every possible failure mechanism.
A basic approach to minimize field failures
Every product development project and market is different. There is no one path to achieving the minimum field failure rate short of not shipping any products.
There are a few basic concepts that do appear to help, though. The organizations that enjoy very low field failure rates tend to be proactive in their approach. Here are a few concepts to consider when bringing a new system online or to market:
- Have a clear reliability goal and an acceptable go-to-market limit
- Create a reliability plan and plan to be surprised by failures
- Treasure every failure for the knowledge it can provide
- Remember that margins erode with familiarity
- Challenge assumptions every day
- Listen to your customers and match their expectations with your product’s performance
- Anticipate failures and retire the ‘fire department’
- Reliability is in every decision and across the entire organization
- Measure reliability early and often using a wide range of techniques
The best organizations enjoyed low field failure rates, yet they still had field failures. The difference between the best and worst organizations was that the best deliberately worked to understand the risks and were rarely surprised by a new failure mode discovered by customers.
The best organizations work to create a product that is good enough.