A Novel Reason to Use MTBF
Thanks to a reader who noticed my question about why MTBF came into existence, we have a new (new to me, at least) rationale for using MTBF. Basically, MTBF makes the magnitude of a number clear, because a number in scientific notation is potentially confusing.
What is doubly concerning is the use of MTTF values, rather than failure rates, in ISO standards dealing with system safety.
Let’s explore the brief email exchange and my thoughts.
First, let me say that receiving emails, comments, etc. from readers is the highlight of my day. It helps me stay in touch with and learn from a wide range of readers. Plus, it is a great way to add more value to my work along the way.
An email arrived that included a nice compliment and quoted my question:
I am not sure why (tend to think it was a marketing decision) someone decided to invert the negative connotation of ‘failures/hour’ into the positive sounding ‘hours/failure’.
Then went on to provide a potential answer:
In Europe we are working with a standard EN ISO 13849 which simplifies safety functions and architectures. If an engineer is not used to calculate with lambda=1E-6 values, than it takes some time to get used. Mainly the mechanical engineer or machinery developer uses MTTF and the standard sets some targets for 30a , 100a etc. I think it is more about not-engineer developer who doesnt want to use exponential numbers and usually at the first sight you dont have a feeling about how big/small an exponential number is.
Overall, using the length of an integer is a convenient way to estimate its magnitude. This works well for numbers of up to about 6 digits. Discerning the difference in order of magnitude between a 2-digit number and a 4-digit one is quite easy, as one is twice the length of the other.
Large and Small Number Notation
One issue that arises is our inability to quickly distinguish a 6-digit number from a 7-digit one, or, more difficult still, a 9-digit value from a 10-digit one.
Scientific notation was in part invented as a convenient way to write very small or very large numbers efficiently. The exponent provides the number of digits or magnitude directly. 2.3e-6 is a small number whereas the number 3.5e-8 is two orders of magnitude smaller.
Here are the same two values converted to MTTF format: 434782.61 and 28571428.57. Which has the lower failure rate? The difference is between 6 and 8 digits before the decimal point; you may have quickly noticed that the second number is two digits longer, or maybe not.
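The conversion is easy to check. Here is a minimal Python sketch, assuming the two rates are failures per hour and constant over time, so that MTTF is simply the reciprocal of the rate. The function and variable names are illustrative only.

```python
import math

# Failure rates from the text, assumed to be per hour and constant over time.
failure_rates = [2.3e-6, 3.5e-8]

def mttf_from_rate(rate):
    """Return MTTF in hours for a constant failure rate (MTTF = 1 / rate)."""
    return 1.0 / rate

def order_of_magnitude(value):
    """Return the power of ten of a positive value, e.g. -6 for 2.3e-6."""
    return math.floor(math.log10(value))

for rate in failure_rates:
    mttf = mttf_from_rate(rate)
    integer_digits = len(str(int(mttf)))  # digits before the decimal point
    print(f"rate={rate:.1e}/h  exponent={order_of_magnitude(rate)}  "
          f"MTTF={mttf:.2f} h  integer digits={integer_digits}")
```

The exponent of the failure rate states the magnitude outright, while the MTTF form leaves the reader counting digits (6 versus 8 here).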
Engineering and Numbers
From what I recall and continue to suspect is important, engineers work with numbers. We do calculations. We use formulas. We deal with large and small numbers on a regular basis.
A fundamental understanding of math is a common prerequisite for an engineering sequence of courses.
Making numbers simpler to understand quickly is a great thing. That is why we use scientific notation or other straightforward means to write and use large and small numbers. 0.0000023 becomes 2.3e-6 or 2.3×10⁻⁶, which doesn’t manipulate (invert) the number into a large number (6 digits long) instead.
Simplifies Safety Functions and Architectures
This is the most serious concern. Engineers are using an international standard, EN ISO 13849, which ‘simplifies safety functions and architectures’. When our system designs require simplification in order to ascertain or design in safety, it just seems wrong to me.
Simplifying the process by using MTTF, for the sake of engineers unable to quickly understand large or small numbers, does not make it easier to compare options or quantify safety risk.
MTTF is a very poor representation of an item’s time-to-failure pattern. It masks the tendency of the expected or dominant failures to decrease or increase over time or use. It uses just the average, losing the often critical information about how the failure rate changes over time.
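To illustrate how much an average can hide, here is a small Python sketch. It assumes, purely for illustration, that time to failure follows a Weibull distribution, and it compares three hypothetical items that share the same MTTF but have different shape parameters. The 100,000-hour MTTF and 10,000-hour duration are made-up values, not data from any real product.

```python
import math

def weibull_scale_for_mean(mean, shape):
    """Scale (eta) that yields the given mean: mean = eta * Gamma(1 + 1/shape)."""
    return mean / math.gamma(1.0 + 1.0 / shape)

def weibull_reliability(t, shape, scale):
    """Probability of surviving to time t: R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

mttf_hours = 100_000.0    # hypothetical MTTF shared by all three items
mission_hours = 10_000.0  # hypothetical duration of interest

# shape < 1: decreasing failure rate; shape = 1: constant; shape > 1: increasing
for shape in (0.5, 1.0, 3.0):
    scale = weibull_scale_for_mean(mttf_hours, shape)
    r = weibull_reliability(mission_hours, shape, scale)
    print(f"shape={shape}: R({mission_hours:.0f} h) = {r:.4f}")
```

Under these assumptions the three items have identical MTTF values, yet the chance of surviving the first 10,000 hours comes out at roughly 64%, 90%, and 99.9% respectively. The single number gives no hint of that spread.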
MTTF and MTBF are widely misunderstood and misused.
MTTF or MTBF values used in safety calculations may come from faulty and inappropriate prediction methods based on parts counts.
We shouldn’t use (never, never ever) MTTF or MTBF to describe or estimate reliability performance, and especially not for safety design and operation considerations.
Do the math. Understand the failure mechanisms and their changing rates of occurrence over time or use. Let’s not under- or overestimate safety risks based on ‘simplified’ methods to deal with small or large numbers.
Alan Pettitt says
Hi Fred
One of the issues I find with using scientific notation is that people get confused with terms like more than and less than and end up using better than, which doesn’t necessarily improve clarity. Less of an issue is people’s difficulty with the coefficient. Certainly project managers, etc. have less difficulty judging between two MTBFs, much as I would rather present failure rates.
Cheers
Alan
Fred Schenkelberg says
How about presenting cumulative distribution functions instead of just a number that is nearly always wrong for the specific decision under consideration?
Andrew Stearns says
Hi Fred! Aside from the numbering appeal, I’ve always seen MTBF as a sort of a benchmark for how well you’ve done your derating. And if that were the true intent, perhaps better methods could be locked in place to level the playing field. But, alas, in my experience MTBF has become just a mere Marketing tool (i.e., get a better number than the competition) and that has led to some interesting interpretations on how to get your custom, Marketing-approved numbers. One of the biggest cheats is with temperature (since a lot of the models fairly or unfairly (and who really knows?) use an Arrhenius temperature factor). So, as an example, consider air temperature 1/2″ away from the surface of a component. Well, is that “still” or “ambient” air? What if we introduce 5000 LFM air flow with a plume of cooler air now 1/2″ from the surface of a component? How much, if at all really, did we improve the chances of survival while inflating the MTBF? Another cheat is with electrical conditions. Maybe in “typical” conditions the stresses are relatively benign, but what if that is not the case with turn-on or turn-off AND the application is expected to see frequent power cycling? Where do you choose to model there? There are a lot of things wrong with MTBF. Maybe in the future, more accurate modeling will provide better insights for expected MTTF, time to x% failure under very specific operating conditions and very specific quality conditions (yet more assumptions).
Fred says
Hi Andrew,
There are better methods that, first off, do not use tabulated and outdated lists of component failure rates. The issues you raise around using parts count are valid and apply to all methods of prediction or estimation.
How about just clearly stating assumptions and conditions? How about stating the probability (a number between zero and one, so not too small or large for anyone to fuss over) of successful functioning in a stated condition over a stated duration? Let’s use reliability as per our engineering definition.
In general, I find I’m asking for information around what will fail and when is it likely to fail as a way to showcase the inability of MTBF to be useful when discussing reliability.
Cheers,
Fred
Patrick Healy says
The MTTF or MTBF metrics, to me, hide what is really going on. When we do an FMEDA we have it broken down into Dangerous Failures, Safe Failures, etc., but within the Dangerous Failures we focus on the failures that are not detected by online diagnostics: the Dangerous Undetected Failures. These are the ones that allow our customers to focus in on their design to see if anything can be done to make these undetected failures detectable. Turning a system’s overall FIT rate into an MTTF or MTBF hides all this lovely detail that is actionable.