MTBF is Not a Duration
Despite standing for mean time between failures, MTBF does not represent a duration. Despite having units of hours (or months, cycles, etc.), it is not a duration-related metric.
This little misunderstanding seems to cause major problems.
MTBF Calculation From Data
Say I have 10 pieces of equipment and each has run for a year, 8,760 hours. During that year we enjoyed 5 failures, which were quickly repaired. What is the MTBF of that equipment?
10 units running for 8,760 hours each gives a total operating time of 87,600 hours. The 5 failures are the only other bit of information needed for the calculation. 87,600 divided by 5 is 17,520 hours MTBF.
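The arithmetic above can be written as a short Python sketch. The function name `mtbf_hours` and its parameters are hypothetical, chosen only for illustration; the numbers are the ones from the example.

```python
def mtbf_hours(units: int, hours_per_unit: float, failures: int) -> float:
    """Total operating hours divided by the number of failures."""
    if failures <= 0:
        # With no observed failures this point estimate is undefined.
        raise ValueError("need at least one failure to compute MTBF")
    return units * hours_per_unit / failures


total_hours = 10 * 8760  # 10 units, one year each
mtbf = mtbf_hours(units=10, hours_per_unit=8760, failures=5)
print(total_hours, mtbf)  # 87600 17520.0
```

Note that nothing in the calculation uses when the failures happened or which units failed, only the totals.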
MTBF, Duration, and Confusion
For the ten pieces of equipment that each operated for a year and together experienced 5 failures, how does a mean time between failures of 17,520 hours remain consistent with the (mistaken) idea that we should only have 1 failure every 17,520 hours for each piece of equipment?
Well, it is consistent if you consider we are expecting 1 failure every 17,520 hours, and 17,520 divided by 8,760 hours is 2. Therefore we expect, on average, half a failure per piece of equipment each year (loosely, a 50% chance of failure). 10 times 50% is 5, which is what we experienced.
The confusion occurs when some expect all 10 units to run 2 years and have only one failure, or expect each unit to operate 17,520 hours and then fail. (Treating MTBF as a failure-free period is less common, yet it occurs.)
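To see why neither expectation holds, here is a minimal simulation sketch. It assumes a constant failure rate, so failure times are exponentially distributed; the article's data cannot confirm that assumption. The fleet size `N_UNITS` and the fixed seed are hypothetical choices made only to get a stable, repeatable estimate.

```python
import random

MTBF_HOURS = 17_520  # value from the example above
N_UNITS = 100_000    # hypothetical large fleet, for a stable estimate

random.seed(1)  # fixed seed so the run is repeatable

# Assumption: constant failure rate, so the time to the first failure
# of each unit follows an exponential distribution with mean MTBF_HOURS.
first_failures = [random.expovariate(1 / MTBF_HOURS) for _ in range(N_UNITS)]

mean_time = sum(first_failures) / N_UNITS
failed_before_mtbf = sum(t < MTBF_HOURS for t in first_failures) / N_UNITS
survived_twice_mtbf = sum(t > 2 * MTBF_HOURS for t in first_failures) / N_UNITS

print(f"mean time to first failure: {mean_time:,.0f} hours")
print(f"failed before reaching the MTBF: {failed_before_mtbf:.1%}")
print(f"still running after twice the MTBF: {survived_twice_mtbf:.1%}")
```

Under this assumption the average comes out near 17,520 hours, yet well over half the units fail before that point and a noticeable share run more than twice as long. The mean describes the fleet, not the life of any one unit.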
MTBF is an Inverse Failure Rate
Keep in mind that we can consider MTBF as a probability of failure. Unit-wise, it is the inverse of a failure rate, and the failure rate is the chance of failure per hour.
In the example above we have a 1 in 17,520 chance of failure every hour. That is, of course, ignoring early life and wear out patterns, which one should never do, btw. The more hours the equipment runs, the more times we have a 1 in 17,520 chance of failure. Run for two years and you are more likely than not to have at least one failure (about 63% if the failure rate really is constant).
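The way the hourly chance accumulates can be sketched in Python. This assumes the failure rate stays constant, which is exactly the assumption the paragraph above warns about. The function names are hypothetical. The first uses the exponential formula 1 - exp(-t / MTBF), and the second treats every hour as an independent 1 in 17,520 trial; the two agree closely.

```python
import math

MTBF_HOURS = 17_520
HOURS_PER_YEAR = 8_760


def prob_at_least_one_failure(hours: float, mtbf: float = MTBF_HOURS) -> float:
    """Chance of one or more failures in `hours`, assuming a constant failure rate."""
    return 1 - math.exp(-hours / mtbf)


def prob_by_hourly_trials(hours: int, mtbf: float = MTBF_HOURS) -> float:
    """Same quantity, treating each hour as an independent 1-in-mtbf chance."""
    return 1 - (1 - 1 / mtbf) ** hours


for years in (1, 2, 5):
    hours = years * HOURS_PER_YEAR
    print(
        years,
        round(prob_at_least_one_failure(hours), 3),
        round(prob_by_hourly_trials(hours), 3),
    )
```

For one unit this gives roughly 39% for one year and 63% for two years. The chance of at least one failure climbs toward 100% as hours accumulate but never reaches it, and it does not jump at 17,520 hours.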
MTBF does provide a chance of failure per unit of time (in many cases an hour). That does not mean the failure rate is accurate or fixed over any period of time we want to use.
In the example above we have data for one year of operation for the 10 units. We do not have information over two years (17,520 hours) nor over 10 years. The MTBF value we calculated only represents a failure rate that is valid for that one year. As the equipment breaks in or wears out, the value will most likely become less and less accurate.
MTBF is not all that useful. First, we rarely encounter a constant failure rate pattern with equipment. Second, MTBF is just a fancy way of representing a failure rate. It does provide information on the chance of failure per hour per piece of equipment. It does not suggest the equipment will have a two-year life with no failures, or that the equipment will run for two years with only one failure.
MTBF is not all that helpful for many reasons. One is that we often work with people who do not understand what MTBF is or is not. MTBF is not a duration; it is a probability of failure, that is all.
tim newman says
The CRE BOK has many examples of probability of failure without duration.
This is why it annoys me when I see the definition that reliability is the probability of success over a period of time, which, of course, it is not.
It is the probability of success for a given scenario, which may be time, but might not be.
Marie Ertl says
Much confusion in the calculation… For a constant failure rate, the probability of failure within 1 year is
1 - R(1 year) = 1 - exp(-0.5) = 0.39
so 39% and not 50%. And it decreases the following year…
Remember, the exponential distribution is not a normal distribution. We learned in school that, for a normal distribution, the average or mean sits at the 50th percentile. Other distributions, especially skewed distributions, do not have their mean at the 50th percentile. It will vary.
This is a common confusion with the constant failure rate assumption coupled with our stats knowledge based on the normal distribution.
The math you did is right and if the population has a 50% failure rate over a year, then the probability of failure is as you calculated. Failure rate (or MTBF) is not the same as the probability of failure.
tim newman says
To reiterate: there are a few good examples of non-duration reliability in O'Connor. He uses cable strength expressed as a mean and standard deviation versus known loads. The stress/strength interference gives the probability of success.
Charles Dibsdale says
If we even consider basic statistics, is the mean, or 'average', the most robust measurement of 'expectation' (or the 'middle tendency') of a distribution of continuous data? The mean is very sensitive to outliers and is not the most robust measurement for non-symmetrical or skewed probability distributions. The median is more often appropriate. Notwithstanding that, if we are only provided with an MTBF figure and no other data, we have to assume the underlying distribution is exponential. Then the mean corresponds to a 63.2% probability of failure, not the 50% many people assume. For a symmetrical normal distribution, both the median and the mean sit at the 50% point. Why do we persist in using the mean when the median is a better choice? Even the median by itself is not enough; we need to know how the data is distributed to understand reliability.