Life Data Analysis with only 2 Failures
Here’s a common problem. You have been tasked to peer into the future to predict when the next failure will occur.
Predictions are tough.
One way to approach this problem is to do a little analysis of the failure history of the component or system. The problem looms larger when you have only two observed failures from the population of systems in question.
While you can fit a straight line to two failures and account for all the systems that operated without failure, it is not very satisfactory. It is at best a crude estimate.
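Here is a rough sketch of that straight-line fit. It assumes a Weibull distribution, Johnson's adjusted ranks to account for the units that have not failed, and Benard's approximation for the median ranks. Those are my choices for illustration, not the only way to do it, and the function names and the numbers in the example are made up.

```python
import math

def adjusted_ranks(items):
    """items: list of (time, failed) tuples, failed=False for a survivor.
    Returns [(time, adjusted_rank)] for the failures only (Johnson's method)."""
    # Sort by time; on a tie, count the failure before the suspension.
    ordered = sorted(items, key=lambda it: (it[0], not it[1]))
    n = len(ordered)
    prev_rank = 0.0
    ranked = []
    for i, (t, failed) in enumerate(ordered):
        if failed:
            remaining = n - i  # units still at risk, including this one
            prev_rank += (n + 1 - prev_rank) / (1 + remaining)
            ranked.append((t, prev_rank))
    return ranked

def two_point_weibull(items):
    """Line through the two failure points on Weibull paper.
    Returns (beta, eta). Needs exactly two failures at different times."""
    n = len(items)
    ranked = adjusted_ranks(items)
    if len(ranked) != 2 or ranked[0][0] == ranked[1][0]:
        raise ValueError("need exactly two failures at distinct times")
    points = []
    for t, rank in ranked:
        f = (rank - 0.3) / (n + 0.4)  # Benard's median rank approximation
        points.append((math.log(t), math.log(-math.log(1.0 - f))))
    (x1, y1), (x2, y2) = points
    beta = (y2 - y1) / (x2 - x1)      # slope of the line
    eta = math.exp(x1 - y1 / beta)    # time where the line crosses F = 63.2%
    return beta, eta

# Hypothetical fleet: 2 failures, 48 units still running at 2000 hours.
fleet = [(1200.0, True), (1900.0, True)] + [(2000.0, False)] * 48
beta, eta = two_point_weibull(fleet)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

With only two points the line fits perfectly no matter what, which is exactly why the result deserves little trust.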
Let’s not consider calculating MTBF. That would not provide useful information, as regular readers already know. So what can you do, given just two failures, to create a meaningful estimate of future failures? Let’s explore a couple of options.
What Information Do We Have Available?
Well, two failures is a start. Of course, there are a number of questions about those two failures that would be helpful to answer.
When did they fail? How long did they operate? This provides just a sketch of time to failure information.
How did they fail? What is the failure mechanism(s)? Maybe there is a time to failure model that describes these failures.
The more we know about the two failures the better we are able to estimate other failures in the population. Speaking of the population, how many elements are in the population? Anything unique about the two failures vs remaining items? How about operating time for all items?
An Analysis Based on Similar Failures
If we know the failure mechanisms and time to failure information we may be able to use existing models or historical knowledge of similar failures to create an estimate of reliability performance. Some may call this a Bayesian approach. Use what you know both statistically and technically to your advantage.
An Analysis Based on a Published Model
Knowing the failure mechanism may permit finding a published model that describes the time to failure pattern. Knowing the time to failure information for the two failures allows using that information to adjust the model to fit the known information.
An Analysis Assuming a Beta Value
If the failure mechanism suggests a particular pattern of failures over time, say a wear out mechanism, we may be able to assume a beta value (for a Weibull distribution). Using the two known failures, construct a rough estimate using a point and slope approach.
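One way to sketch the point and slope idea is the Weibayes calculation: fix beta, then solve for the characteristic life eta using every unit's operating time, failed or not. The beta value, the function names, and the data below are all assumptions for illustration. In practice, the beta should come from what you know about the failure mechanism.

```python
import math

def weibayes_eta(failure_times, suspension_times, beta):
    """Maximum likelihood estimate of Weibull eta when beta is assumed known."""
    r = len(failure_times)
    if r == 0:
        raise ValueError("need at least one failure")
    # Every unit contributes its operating time raised to the beta power.
    total = sum(t ** beta for t in failure_times)
    total += sum(t ** beta for t in suspension_times)
    return (total / r) ** (1.0 / beta)

def weibull_reliability(t, eta, beta):
    """Probability that a unit survives to time t."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical data: 2 failures, 48 survivors at 2000 hours each.
failures = [1200.0, 1900.0]
survivors = [2000.0] * 48
assumed_beta = 2.5  # assumed wear out slope, not estimated from the data

eta = weibayes_eta(failures, survivors, assumed_beta)
r_3000 = weibull_reliability(3000.0, eta, assumed_beta)
print(f"eta = {eta:.0f} hours, R(3000 h) = {r_3000:.3f}")
```

Try a few plausible beta values to see how sensitive the estimate is to the assumption.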
Condition Monitoring or Degradation Based Approach
Another option, again relying on an understanding of the failure mechanism and on access to the units still operating, is to map the progress toward failure for the other items. If we have two failed meters due to excessive brush wear, for example, we could measure brush wear on a sample of the remaining units to create a degradation model and estimate the remaining operating life for the population.
Lots of ifs here, yet it is an option if the situation fits.
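As a minimal sketch of the brush wear example, assume wear grows linearly from zero with operating hours and that a unit fails when wear reaches a known limit. Both are assumptions you would need to check against the two failed units. The function name, the wear limit, and the measurements are hypothetical.

```python
def remaining_life(hours, wear_mm, wear_limit_mm):
    """Estimated hours left before wear reaches the limit,
    assuming a constant wear rate since the unit was new."""
    if hours <= 0 or wear_mm <= 0:
        raise ValueError("need positive operating hours and measured wear")
    rate = wear_mm / hours  # mm of wear per operating hour
    return max(0.0, (wear_limit_mm - wear_mm) / rate)

# Hypothetical sample of units still in service: (operating hours, wear in mm).
sample = [(1500.0, 2.1), (1800.0, 3.0), (2000.0, 2.6), (1700.0, 3.4)]
wear_limit_mm = 5.0  # assumed wear at failure, e.g. measured on the failed units

estimates = [remaining_life(h, w, wear_limit_mm) for h, w in sample]
for (h, w), left in zip(sample, estimates):
    print(f"{h:.0f} h, {w:.1f} mm worn: about {left:.0f} h remaining")
print(f"shortest remaining life in sample: {min(estimates):.0f} h")
```

If wear accelerates or slows over time, a straight line will mislead, so repeated measurements on the same units are worth the effort.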
The Least Useful Option — MTBF
Finally, one could, though I’m not sure why, estimate the total time of operation for the population, including the two items that have failed, and calculate MTBF. You would calculate a number, which may be satisfying, yet, as you know, it is not very useful for any practical purpose.
The more you know about the two failures, the better. Ask the questions before fielding your units, before failures occur. After failures occur, you may not have the same range of options available to estimate system reliability.
What have you learned from a couple of failures? How did you treat the information?
Mark Powell says
Usually, if you only have two failures you are talking about a part designed to have a high reliability. This of course means that you expect to have few failures, ever.
So when you asked the question “What Information Do We Have Available?”, you forgot to mention that you may have a ton of suspension or survivor data. There is a wealth of information that can really help define the failure distribution if you can use the survivor data.
I will refer back to the “What’s the Fuss” article on no-MTBF.com for how to best solve this problem that many face.
Fred Schenkelberg says
You are right, I did not explicitly talk about the suspensions and how to treat them. I also failed to mention comparing what we expected to fail with what did fail, as feedback on our estimates. If the two failures occurred as expected, then our design time estimates are supported (for now). If the failures were surprises (either too few too late, or too many too early, or a different failure mechanism), we learn something and may have to adjust our estimates or our ability to gather information concerning failures.