Perfect Reliability

Last Verified September 27, 2021

Is it possible to foresee all reliability issues before a product launch?

No, I don’t think so.

Can we minimize surprises from field failures?

Yes.

The number of potential failures is often unknown.

With a little experience and imagination, we can list potential failures, including those:

We imagine as possible when making design decisions
We experience with the first prototypes
We imagine our customer will experience
We discover with product testing
We hear about from our customers

Being Surprised by what we didn’t know

I remember working with a product development team when, just after the final design review approved the product for shipment, a co-worker said, “I wonder what failure we missed?”

I was a bit surprised as we did a lot of work with the design team, supply chain, and manufacturing to uncover and resolve thousands of potential failure modes. It was an exhaustive process and finding failures in the last month of the project became the search for exceedingly rare events.

This was my first product launch.

My friend had worked on dozens of projects and said that, despite the effort to find all the possible failure modes, we had still missed one or more major issues. He told me a story about a polymer part that bloomed plasticizer under normal operating conditions, which rendered the product unusable. High and low use cases or stress inhibited the failure mechanism from occurring. Running at user conditions for about 2 months, a luxury we did not enjoy, was the best accelerant. No one had imagined the failure mechanism, so we didn’t even know to look for it during our evaluations.

Effect of small sample size

We were keenly aware that single events during our testing may indicate larger than expected failure rates in actual use.

We often evaluated just a few samples with a specific test condition. Thus, if only one unit out of 10 showed a fault, that could represent a very minor possibility of field failure or a 2 or 3% failure rate issue (the expected first-year failure rate for all causes was only 2%, so a single issue this high would likely make the product unprofitable).
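To put rough numbers on how little one fault in 10 units tells us, here is a minimal sketch of an exact (Clopper-Pearson) binomial confidence interval. It assumes each unit fails independently with the same probability, which a real test may not satisfy. The function names and the 90% confidence level are illustrative choices, not something from the original project.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function f that decreases on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(failures, n, confidence=0.90):
    """Two-sided exact interval for the true failure probability.

    Assumes independent units sharing one failure probability.
    """
    alpha = 1 - confidence
    if failures == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= failures) equals alpha / 2.
        lower = _bisect(
            lambda p: alpha / 2 - (1 - binom_cdf(failures - 1, n, p))
        )
    if failures == n:
        upper = 1.0
    else:
        # Upper bound: the p at which P(X <= failures) equals alpha / 2.
        upper = _bisect(lambda p: binom_cdf(failures, n, p) - alpha / 2)
    return lower, upper


low, high = clopper_pearson(failures=1, n=10, confidence=0.90)
print(f"1 failure in 10 units: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 0.5% to 39%, so a single fault in 10 units is consistent with both a negligible field issue and a failure rate far above the 2% goal.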

Intermittent issues presented a unique challenge. Part of the failure analysis process was the ability to replicate the issue. If we understood the issue well enough to cause it to happen, we generally had a way to determine if a fix actually worked or not. Not the best course of action, yet it was common practice.

If an issue only occurred once in 1,000 attempts, how many times would you try before you determined you were not able to replicate the issue? You most likely didn’t know the frequency of occurrences nor the specifics of the failure mechanisms and what tripped the fault state.
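As a rough guide to that question, the sketch below estimates how many attempts are needed to have a given chance of seeing the issue at least once. It assumes every attempt is independent and has the same known probability of triggering the fault, which is exactly what we usually do not know. The function names and the 95% level are illustrative.

```python
import math


def attempts_to_observe(p, confidence=0.95):
    """Attempts needed so that P(at least one occurrence) >= confidence.

    Assumes independent attempts, each with occurrence probability p.
    """
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1 (exclusive)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve 1 - (1 - p)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


def chance_of_observing(p, attempts):
    """Probability of seeing the issue at least once in the given attempts."""
    return 1 - (1 - p) ** attempts


print(attempts_to_observe(0.001, 0.95))
print(f"{chance_of_observing(0.001, 1000):.1%}")
```

With a true rate of 1 in 1,000, this works out to about 3,000 attempts for a 95% chance of seeing the issue once, and 1,000 attempts still leave roughly a 37% chance of seeing nothing at all.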

Every failure was a gift. We treasured each gift and learned as much as we could with each event.

How reliable is reliable enough?

Even with safety critical systems and no time pressure to start the system, at some point, we have to say it’s good enough.

We take a risk with every product launch or asset commissioning.

I learned that reliability is not the only concern or risk; it’s one of many. There is pressure to bring features to the market, to take advantage of buying patterns (holiday sales, for example), and to recoup the investment in product development.

During development, we have to balance the risks of failures with the ability to stay in business.

This risk is often difficult to articulate as we do not know the magnitude of every possible failure mechanism.

A basic approach to minimize field failures

Every product development project and market is different. There is no one path to achieving the minimum field failure rate short of not shipping any products.

There are a few basic concepts that do appear to help though. The organizations that enjoy very low field failure rates tend to be proactive in their approach. Here are a few concepts to consider when bringing a new system online or to market:

Have a clear reliability goal and acceptable go-to-market limit
Create a reliability plan and plan to be surprised by failures
Treasure every failure for the knowledge it can provide
Remember that margins erode with familiarity
Challenge assumptions every day
Listen to your customers and match their expectations with your product’s performance
Anticipate failures and retire the ‘fire department’
Reliability is in every decision and across the entire organization
Measure reliability early and often using a wide range of techniques

The best organizations enjoyed low field failure rates. They did have field failures. The difference between the best and worst organizations was that the best deliberately understood the risks and were rarely surprised by a new failure mode discovered by customers.

The best organizations work to create a product that is good enough.

Comments

Robert Drummond says
June 27, 2016 at 5:25 AM
Hi Fred,
Another great article. In your opinion what is the best mechanism to have a feedback loop from Field Failures to Product Design Team?
Whilst many big companies have the luxury of monitoring their products in field, how can small companies monitor their products in the field to minimise failure rate/future product development?
What type of field product monitoring mechanisms would you suggest?
Thanks again for the great article.
Rob
- Fred Schenkelberg says
  June 27, 2016 at 6:04 PM
  Hi Rob,
  Thanks and great questions. I’ll most likely have to write about the best way to provide feedback, or to monitor fielded equipment, and how – maybe an entire series of articles as it’s a big topic.
  The answer of course really depends on your product and how it works. Nest wasn’t a big company, yet its home thermostat, by the way it operates, is connected to a company server. Other products go to a customer and are never heard from again, as upon failure it’s not worth the effort to report it (for some products, not all).
  Passenger cars, as well as inkjet printer cartridges, have for years collected information on use rates, temperatures, etc. When a unit is returned, recycled, or serviced, one can collect the data for analysis. The cost of sensors and memory has made this approach viable.
  The best feedback to design teams is actionable feedback. Not a count of monthly failures, rather detailed failure analysis specifically identifying the failure mechanism(s). Symptoms or a list of replaced parts help, yet may only narrow down the true problem.
  The other feedback, if you can get it, is use stresses, frequency, customer expectations, etc. What is the actual set of stresses experienced not only by failed products, but by all products?
  We try to do this prior to production, yet we should plan to collect and gather actionable data once the product is in the hands of customers.
  Yep, this is going to make a nice series of posts – be sure to join Accendo so you don’t miss any of the upcoming articles (if you’re not a member already).
  Cheers,
  Fred
  - Hassaan Nasir says
    April 2, 2020 at 7:20 AM
    Fred, sounds like a good topic to discuss in a webinar
    - Fred Schenkelberg says
      April 2, 2020 at 12:38 PM
      Agreed – thanks for the suggestion. cheers, Fred
