
Despite the ‘time between failures’ in its name, MTBF does not represent a duration. Even though it has units of hours (or months, cycles, etc.), it is not a duration-related metric.
This little misunderstanding seems to cause major problems. [Read more…]

Our customers, suppliers, and peers seem to confuse reliability information with MTBF. Why is that?
Is it a convenient shorthand? Maybe I’m the one who is confused, and those asking for or expecting MTBF really do want the inverse of a failure rate. Maybe they are not interested in reliability.
MTBF is in military standards. It is in textbooks, journals, and component data sheets. MTBF is prevalent.
If one wants to use an inverse simple average to represent the desired information, maybe I have been asking for the wrong information. Given the number of references and formulas that use MTBF, from availability to spares stocking, maybe people ask for MTBF because it is needed for all these other uses. [Read more…]

MTBF is a symptom of a bigger problem. It may be a lack of interest in reliability, though I doubt that is the case. Or it may be a bit of fear of reliability.
Many shy away from the statistics involved. Some simply do not want to learn what is currently unknown. It could be fear of potential bad news: that the design isn’t reliable enough. Some do not care to know about problems that will require solving.
Whatever the source of the uneasiness, you may know one or more coworkers who would rather not deal with reliability in any direct manner.
[Read more…]

To me it means very little, as it rarely occurs. Products fail for a wide range of reasons, and each failure follows its own path to failure.
As you may understand, some failures tend to occur early and some later. Some we call early-life failures, out-of-box failures, etc. Some we deem end-of-life or wear-out failures. A few are truly random in nature, such as an overstress fracture caused by a drop or accident. [Read more…]
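One common way to describe these patterns (an assumption here; the excerpt does not name a model) is the Weibull hazard function h(t) = (β/η)(t/η)^(β−1). A shape parameter β below 1 gives a decreasing, early-life hazard; β = 1 gives a constant, random hazard; β above 1 gives an increasing, wear-out hazard. The sketch below simply evaluates the hazard at a few times for each shape, using hypothetical parameter values.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0.

    beta: shape parameter (values below are hypothetical)
    eta:  scale parameter (characteristic life), same units as t
    """
    return (beta / eta) * (t / eta) ** (beta - 1)


ETA = 1000.0                          # hypothetical characteristic life, hours
TIMES = [100.0, 500.0, 1000.0, 2000.0]

# beta < 1: early-life failures; beta = 1: random; beta > 1: wear-out
for label, beta in [("early life", 0.5), ("random", 1.0), ("wear-out", 3.0)]:
    rates = [weibull_hazard(t, beta, ETA) for t in TIMES]
    print(f"{label:>10}: " + ", ".join(f"{r:.2e}" for r in rates))
```

Only the β = 1 row stays flat, which is the one case where a single constant failure rate (and hence a single MTBF) fully describes the failure pattern.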

“What’s the MTBF of a Human?” That’s a bit of a strange question, isn’t it?
Guest post by Adam Bahret
I ask this question in my Reliability 101 course. Why ask such a weird question? I’ll tell you why: MTBF is the worst, most confusing, crappy metric used in the reliability discipline. OK, maybe that is a smidge harsh; it does have good intentions. But the damage done by the misunderstanding it has caused is horrendous.
MTBF stands for “Mean Time Between Failure.” It is the inverse of failure rate. An MTBF of 100,000 hrs/failure is a failure rate of 1/100,000 fails/hr = .00001 fails/hr. Those are numbers, what does that look like in operation? [Read more…]
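To make the arithmetic concrete, the sketch below converts the 100,000-hour MTBF into a failure rate. Going further requires an assumption the excerpt does not state: a constant failure rate (exponential distribution), under which R(t) = exp(−t/MTBF). The fleet size and operating time are hypothetical.

```python
import math

MTBF_HOURS = 100_000.0           # from the example above
failure_rate = 1.0 / MTBF_HOURS  # failures per hour


def reliability(t_hours: float, mtbf: float) -> float:
    """Probability of surviving to t, ASSUMING a constant failure rate."""
    return math.exp(-t_hours / mtbf)


# Hypothetical fleet: 1,000 units running continuously for one year.
FLEET_SIZE = 1_000
YEAR_HOURS = 8_760.0
expected_failures = FLEET_SIZE * (1.0 - reliability(YEAR_HOURS, MTBF_HOURS))

print(f"failure rate: {failure_rate:.5f} failures/hour")
print(f"expected failures in the fleet after one year: {expected_failures:.1f}")
# Under this assumption only about 37% of units survive to t = MTBF.
print(f"fraction surviving to t = MTBF: {reliability(MTBF_HOURS, MTBF_HOURS):.3f}")
```

The last line is one way to see why MTBF is not a typical life: under a constant failure rate, most units have already failed by the time t equals the MTBF.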

The MTBF calculation is widely used to evaluate the reliability of parts and equipment, and in industry it is usually defined as one of the key performance indicators. This short article demonstrates in practice how we can fool ourselves by evaluating this indicator in isolation. [Read more…]

A Series of Unfortunate MTBF Assumptions
The calculation of MTBF results in a larger number if we make a series of MTBF assumptions. We just need more operating hours in the numerator and fewer failures in the failure count.
While we really want to understand the reliability performance of field units, we often make a series of small assumptions that impact the accuracy of MTBF estimates.
Here are just a few of the MTBF assumptions I’ve seen, in some cases nearly all of them within one team. Reliability data holds useful information if we gather and treat it well. [Read more…]
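The basic calculation is MTBF = total operating hours / number of failures. The sketch below uses entirely hypothetical field numbers to show how two assumptions of this kind each push the estimate upward: counting calendar hours since shipment as operating hours, and dropping "no fault found" (NFF) returns from the failure count. These particular assumptions and figures are illustrative, not taken from the article.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Simple point estimate: total operating hours divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined with zero failures")
    return operating_hours / failures


# Hypothetical field data for one quarter.
UNITS_IN_FIELD = 500
CALENDAR_HOURS = 2_190.0       # about 3 months since shipment
DUTY_CYCLE = 0.40              # units actually run 40% of the time
CONFIRMED_FAILURES = 12
NO_FAULT_FOUND_RETURNS = 8     # returned, but the failure was not reproduced

actual_hours = UNITS_IN_FIELD * CALENDAR_HOURS * DUTY_CYCLE
all_returns = CONFIRMED_FAILURES + NO_FAULT_FOUND_RETURNS

baseline = mtbf(actual_hours, all_returns)
# Assumption 1: treat every calendar hour as an operating hour (24/7 use).
more_hours = mtbf(UNITS_IN_FIELD * CALENDAR_HOURS, all_returns)
# Assumption 2: additionally drop the NFF returns from the failure count.
fewer_fails = mtbf(UNITS_IN_FIELD * CALENDAR_HOURS, CONFIRMED_FAILURES)

print(f"baseline:        {baseline:,.0f} h")
print(f"assume 24/7 use: {more_hours:,.0f} h")
print(f"also drop NFF:   {fewer_fails:,.0f} h")
```

Neither change involves any new information about the product; each simply moves a number in the ratio.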

Math, Statistics, and Engineering
In college, Mechanics was a required class from the civil engineering department. This included differential equations.
Luckily for me, I also enjoyed a required course called analytical mechanics for my physics degree. This included using Lagrangian and Hamiltonian equations to derive a wide range of formulas to solve mechanics problems.
In the civil engineering course, the professor did the derivations as the course lectures, then expected us to use the right formula to solve a problem. He even gave us a ‘cheat sheet’ with an assortment of derived equations. We just had to identify which equation to use for a particular problem and ‘plug-and-chug’, or just work out the math. It was boring. [Read more…]

Are the Measures Failure Rate and Probability of Failure Different?
Failure rate and probability of failure are similar, yet slightly different.
One of the problems with reliability engineering is that so many of its terms and concepts are not commonly understood.
Reliability, for example, is commonly defined as being dependable or trustworthy, as in “you can count on him to bring the bagels.” Reliability engineers, by contrast, define reliability as the probability of successful operation or function within a specific environment over a defined duration.
The same is true for failure rate and probability of failure. We often have specific data-driven or business-related goals behind these terms. Others do not.
If we do not state the time period over which either term applies, it is left to the listener’s imagination, which is rarely good.
There are at least two failure rates we may encounter: the instantaneous failure rate and the average failure rate. The trouble starts when you ask for, or are asked about, an item’s failure rate. Which failure rate are you both talking about?
The instantaneous failure rate is also known as the hazard rate, h(t):
$latex \displaystyle&s=3 h\left( t \right)=\frac{f\left( t \right)}{R\left( t \right)}$
Where f(t) is the probability density function and R(t) is the reliability function, which is one minus the cumulative distribution function. The hazard rate (also called the failure rate or instantaneous failure rate) is the number of failures per unit time when the time interval is very small, at some point in time t. Thus, if a unit has been operating for a year, this calculation provides the chance of failure in the next instant of time, given that the unit has survived the year.
This is not useful for the calculation of the number of failures over that year, only the chance of a failure in the next moment.
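A small numerical check of the definition: the sketch below evaluates h(t) = f(t)/R(t) for a Weibull distribution, then compares it with the conditional probability of failing in a short interval dt after t, given survival to t, divided by dt. The Weibull model and its parameter values are assumptions for illustration only.

```python
import math

BETA, ETA = 2.0, 1_000.0  # hypothetical Weibull shape and scale (hours)


def R(t: float) -> float:
    """Reliability function: probability of surviving to time t."""
    return math.exp(-((t / ETA) ** BETA))


def f(t: float) -> float:
    """Probability density function of the time to failure."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * R(t)


def hazard(t: float) -> float:
    """Instantaneous failure rate h(t) = f(t) / R(t)."""
    return f(t) / R(t)


t, dt = 500.0, 0.01  # hypothetical time of interest and small interval, hours
# Chance of failing in (t, t + dt], given survival to t, per unit of time.
conditional_rate = (R(t) - R(t + dt)) / R(t) / dt

print(f"h({t:.0f}) = {hazard(t):.6e} per hour")
print(f"conditional estimate = {conditional_rate:.6e} per hour")
```

Multiplying h(t) by a tiny dt gives only the chance of failure in that next sliver of time, which is why it does not, by itself, give the number of failures over a year.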
The probability density function describes the fraction failing over an interval of time. For example, a histogram of the count of failures per month would roughly describe the shape of the PDF, f(t). Evaluated at each individual point in time, the curve traces the instantaneous failure rate at that point.
Sometimes we are interested in the average failure rate (AFR). The AFR over a time interval from t1 to t2 is found by integrating the instantaneous failure rate over the interval and dividing by t2 - t1. When we set t1 to 0, we have
$latex \displaystyle&s=3 AFR\left( T \right)=\frac{H\left( T \right)}{T}=\frac{-\ln R\left( T \right)}{T}$
Where H(T) is the integral of the hazard rate h(t) from time zero to time T,
T is the time of interest, which defines the time period from zero to T,
and R(T) is the reliability function, or probability of successful operation from time zero to T.
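The two forms of the AFR equation can be checked against each other numerically. The sketch below, again assuming a Weibull hazard with hypothetical parameters, integrates h(t) from 0 to T with the midpoint rule, divides by T, and compares the result with -ln R(T)/T.

```python
import math

BETA, ETA = 2.0, 1_000.0  # hypothetical Weibull shape and scale (hours)


def hazard(t: float) -> float:
    """Weibull instantaneous failure rate h(t)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)


def reliability(t: float) -> float:
    """Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / ETA) ** BETA))


def afr_by_integration(T: float, steps: int = 10_000) -> float:
    """AFR(T) = H(T) / T, with H(T) approximated by the midpoint rule."""
    dt = T / steps
    H = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return H / T


def afr_closed_form(T: float) -> float:
    """AFR(T) = -ln R(T) / T."""
    return -math.log(reliability(T)) / T


T = 500.0
print(f"integrated:    {afr_by_integration(T):.6e} failures/hour")
print(f"closed form:   {afr_closed_form(T):.6e} failures/hour")
print(f"h(T) at T=500: {hazard(T):.6e} failures/hour")
```

With β = 2 the hazard rises linearly, so the instantaneous rate at T is twice the average rate over 0 to T: a concrete case where "the failure rate" means two different numbers depending on which definition is in play.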
A very common understanding of failure rate is the count of failures over some time period divided by the number of hours of operation. This gives the fraction expected to fail, on average, per hour. I’m not sure which of the definitions above this fits, and yet I find this is how most people think of failure rate.
If we have 1,000 resistors that each operate for 1,000 hours, and one failure occurs, we have 1 / (1,000 × 1,000) = 0.000001 failures per hour.
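The resistor arithmetic, written out. If one additionally assumes a constant failure rate (the text does not say so), this ratio of failures to total unit-hours is the usual point estimate of that rate, and its inverse is what would typically be reported as an MTBF.

```python
UNITS = 1_000          # resistors
HOURS_EACH = 1_000.0   # operating hours per resistor
FAILURES = 1

unit_hours = UNITS * HOURS_EACH        # total hours of operation
failure_rate = FAILURES / unit_hours   # failures per unit-hour

print(f"{failure_rate:.6f} failures per hour")  # 0.000001
print(f"inverse (MTBF-style) value: {1 / failure_rate:,.0f} hours")
```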
Let’s save for another time the discussion of the many ways to report failure rates: AFR (at least two methods), FIT, PPM/K, etc.
I thought the definition of failure rate would be straightforward until I went looking for a definition. It is with trepidation that I start this section on the probability of failure definition.
To my surprise, it is actually rather simple: the definition in common use and the mathematical definition are the same. There are two equivalent ways to phrase it:
We can talk about the probability of failure of an individual item or of the whole population. A 1 in 100 chance of failure over a year means there is about a 1% chance that the unit we’re using will fail before the end of the year. Equivalently, if we place 100 units into operation, we would expect one of them to fail by the end of the year.
The probability of failure over a period of time is given by the cumulative distribution function (CDF), F(t) = 1 - R(t).
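A sketch connecting the two readings of the 1-in-100 example. It backs out the constant failure rate that makes F(1 year) = 0.01 (the constant-rate model is an assumption; the text does not specify a distribution), evaluates the CDF F(t) = 1 - R(t), and converts it into an expected count for 100 units.

```python
import math

YEAR_HOURS = 8_760.0
TARGET_P = 0.01  # 1 in 100 chance of failure over a year

# Constant failure rate chosen so that F(1 year) equals TARGET_P (assumption).
lam = -math.log(1.0 - TARGET_P) / YEAR_HOURS


def cdf(t_hours: float) -> float:
    """Probability of failure by time t: F(t) = 1 - R(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t_hours)


p_year = cdf(YEAR_HOURS)
units = 100
print(f"P(a given unit fails within a year) = {p_year:.4f}")
print(f"expected failures among {units} units = {units * p_year:.2f}")
print(f"P(fails within the first 6 months)  = {cdf(YEAR_HOURS / 2):.4f}")
```

The same number serves both readings: a probability for the single unit in hand, and an expected fraction of a population.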
So which should you use? It depends on the situation. Are you talking about the chance of failure in the next instant, or the chance of failing over a time interval? Use failure rate for the former and probability of failure for the latter.
In either case, be clear with your audience about which definition (and assumptions) you are using. If you know of other failure rate or probability of failure definitions, or of a great way to keep all these definitions clearly sorted, please leave a comment below.

I find the world of maintenance a very odd place to find MTBF. While it is possible that a set of equipment or a machine actually has a constant failure rate, that is the exception rather than the rule. Assuming a constant failure rate doesn’t make it so. [Read more…]
Just a short note today about a great high-level article in Wired magazine. Robert Capps did a nice summary and review of the significance of reliability engineering, product failure, and what we can do about it.
And he doesn’t mention MTBF – which is appropriate.
In light of the International Day of Failure, Oct 13th, let’s consider failure from a reliability engineer’s point of view. We work to understand and avoid product failures. When a product fails to deliver the desired performance attribute, it is tossed away, returned, replaced, repaired, or tolerated. This may occur before or after the product’s value has been achieved. [Read more…]
Recently I received a question related to setting an Acceptable Quality Level (AQL) for a sampling of fielded electricity meters. The question was how to select the right AQL for use with the sampling plan. I was not sure from the question whether the sample would determine if the population would be replaced (expensive), or was simply an experiment to determine how the meters are doing after 15 years of service (information only). [Read more…]