Let’s say we have a product that most often fails because of one major component. Let’s say it’s a fan (it could be anything, and while I don’t have anything against fans, a fan is easy to picture).

Ok, this fan has a data sheet with the classic reliability claim of 50,000 hours MTBF. For those who know about my disdain for MTBF (www.nomtbf.com), rest assured I’m not going to get into it here. The basic approach for estimating the number of failures during any period of time does require a few pieces of information. MTBF is common on data sheets, so, in this case, that’s where we start.

Without any other information about the life distribution and given only MTBF, we will have to use the exponential distribution. The cumulative distribution function is

$$ \large\displaystyle F\left( t \right)=1-e^{-t/\theta}$$

where F(t) is the probability of failure up to time t, and theta, θ, is the MTBF.

The next piece of information we need is the warranty period or the period of time of interest. In this case, let’s say it’s three years. And, since the fan is the primary concern in this simple example, we can consider the duty cycle of the fan within the product. For the sake of ease in this example, let’s say the fan is working full time (maybe in a server product, for example). That means the fan will operate for 365 days x 24 hours x 3 years = 26,280 hours.

Now we’re ready to do the calculation.

t = 26,280 hours

θ = 50,000 hours
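Plugging these two values into the CDF can be sketched in Python as follows. This is only a minimal illustration of the exponential calculation; the function name `exponential_cdf` is made up for this example, and the constant failure rate is the same assumption the article is forced into by having only MTBF.

```python
import math

def exponential_cdf(t_hours, mtbf_hours):
    """Probability of failure by t_hours, assuming a constant failure rate."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

# Three years of continuous operation, as in the article.
hours = 365 * 24 * 3
p_fail = exponential_cdf(hours, 50_000)

print(f"t = {hours} hours, F(t) = {p_fail:.2f}")  # t = 26280 hours, F(t) = 0.41
```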

Using the equation above, we find F(t) = 0.41; that is, we would expect about 41% of the fans to fail by three years. The time refers to the age of each individual unit, not to time since production. In short, a lot would fail. How many?

We need to know how many units are shipped or expected to ship. Let’s say we expect to produce 10,250 of these products. How many will come back under warranty due to fan failure?

10,250 x 0.41 = 4,202.5, or about 4,200 fan failures.

Multiply the number of warranty failures by the cost of a warranty return to find the amount of warranty reserve to set aside.
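The whole chain, from MTBF to a reserve amount, might look like the sketch below. The cost per return is a made-up placeholder (the article gives no cost figure), and the variable names are illustrative only.

```python
import math

units_shipped = 10_250
mtbf_hours = 50_000
warranty_hours = 365 * 24 * 3       # three years of continuous operation
cost_per_return = 25.00             # HYPOTHETICAL cost, not from the article

# Fraction of units expected to fail within warranty (exponential CDF).
p_fail = 1.0 - math.exp(-warranty_hours / mtbf_hours)

expected_failures = units_shipped * p_fail
warranty_reserve = expected_failures * cost_per_return

print(f"Expected failures: {expected_failures:,.0f}")
print(f"Warranty reserve:  {warranty_reserve:,.2f}")
```

Using the unrounded F(t) gives roughly 4,190 failures rather than the 4,202.5 obtained from the rounded 0.41; the difference is small next to the uncertainty in the MTBF value itself.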

If you have any questions or would like to see other examples, please leave a comment.

*Also published on Medium.*

Hilaire Perera says

MTBF/MTTF as single point estimates are “risky”. Better to use Lower Confidence Limit of these numbers when calculating Reliability, Allocating Spares

Michael Li says

Hi,

By following formula, exp(-t/MTBF)=0.59, then 1 minus 0.59 equals 0.41. 0.41 would be the probability of failure. Is that true?

Regards,

Michael

Fred Schenkelberg says

Yes, Michael, that is true, the reliability function is as you describe it, and 1 – R(t) is the CDF which provides the probability of failure over the duration t. I forgot to subtract the reliability function (probability of success) from one. It’s updated now.

Michael Li says

This is a good article for helping me understand the relationship between MTBF and warranty.

KESAVA says

How would I calculate warranty cost for repairable products, if I have MTBF , missing time

Thanks,

Kesava.

Fred Schenkelberg says

Hi Kesava,

Pretty much the same way as in the article. If you have one piece of equipment, then skip the last part about how many units are running.

The hard part is that with only MTBF you can only estimate the expected number of failures or the probability of failure over some duration. You need to be sure the MTBF value is valid over the time period of interest. If the value is based on the first year of operation, it may not be accurate for the second year, and very inaccurate for the 10th year.

Another way to think of the problem is that MTBF is just the inverse of the failure rate. Given the failure rate per hour and how many hours you expect to run, calculate the number of expected failures.

You also need the cost of repair – or replacement.

If you really want to estimate the warranty cost of a repairable system, you really should understand the failure distributions for the repairable items and the overall system (a reliability block diagram comes to mind here) and then estimate the costs based on which element of the system fails. A bit more complicated, yet a whole lot more accurate.

Cheers,

Fred

Asfour says

Hi Fred,

First, thanks for the effort and follow-up in answering others’ queries. My issue is related to device warranty calculations. The devices vary from DDC controllers to various types of sensors, active and passive. One of the painful arguments is how much spares cost should be considered during the warranty phase, which varies from 1 to 3 years.

MTBF for the devices is known, but when I try to use the available formulas (and I have tried a lot), the result is not logical, since it is not what actually happens. By failure I mean a device that needs to be changed/replaced, not maintained. Can you help here?

Fred Schenkelberg says

Hi Asfour,

With actual field data, shipments and returns, better if you know the date of shipment or installation, and date of the return for specific serial numbers, you can sort out the time to failure distribution. I often start with Weibull and see how well that works. With that data, you have a representation of the actual rate of field failures and can estimate future failures as well.

Using MTBF or MTTF of components, or any parts-count type estimate of reliability, is rarely, and only by luck, going to represent the actual field reliability performance. Using field data and calculating MTTF or MTBF likewise will provide a crude estimate that does not include the changing nature of the failure rate as the item ages.

So, do not use MTBF. Use the field data you have.

Cheers,

Fred
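To illustrate why a single MTBF value says little about field behavior, the sketch below compares three Weibull distributions that all share the same 50,000-hour mean life but have different shape parameters. The shape values (0.5, 1.0, 2.0) are arbitrary choices for illustration, not fitted to any data; in practice the shape and scale would come from fitting the actual field returns.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull probability of failure by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def scale_for_mean(mean, shape):
    """Scale parameter that gives the requested mean life.

    Uses the Weibull mean: mean = scale * Gamma(1 + 1/shape).
    """
    return mean / math.gamma(1.0 + 1.0 / shape)

mean_life = 50_000   # same "MTBF" for every case
t = 26_280           # three years of continuous operation

results = {}
for shape in (0.5, 1.0, 2.0):   # ILLUSTRATIVE shapes, not from field data
    scale = scale_for_mean(mean_life, shape)
    results[shape] = weibull_cdf(t, shape, scale)
    print(f"shape = {shape}: F(t) = {results[shape]:.3f}")
```

With a shape of 1.0 the Weibull reduces to the exponential case and reproduces the 0.41 from the article. The decreasing-failure-rate case (shape 0.5) comes out well above that and the wear-out case (shape 2.0) well below, even though all three have the same mean.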

Tom Nolan says

I am looking for the best way to calculate parts replaced and returned from the field. I am currently using Predicted Annual Failure Rate (PAFR). Is there any other method to do the calculation? I have also had a request to do calculations on return rate. Do you know if that is possible?

Failure Rate (PAFR) = the expected qty of returned parts to the OEM that are actually defective. This excludes NTFs. Again, expressed as a percentage of the component IB, annualised

Return Rate = the expected qty of parts returned from the field from Veritas’ service partner to our OEM, expressed as a percentage of the component IB, annualised

Fred Schenkelberg says

Hi Tom,

First off keep in mind that the annualized failure rate is an average and thus not informative on any changes to the rate of returns.

Second, always count NTFs – a very easy way to appear to improve the return rate is to classify more returns as NTF. Besides, if you have NTFs, there is still something to solve; otherwise customers would not be returning them to you.

Third, better is to use the field return data directly to fit Weibull or appropriate distribution to the data – then use that information to predict returns each month going forward. Weibull++ has a handy tool to analyze and predict.

Fourth, before shipping, you can use the development reliability block diagram and current reliability estimates to estimate warranty returns. You’ll need an estimate of weekly or monthly shipments as well.

Cheers,

Fred

Vijay says

Hi Fred,

Thanks for the example.

You arrived at 4202.5 failures based on CDF*number of fans.

What if we approach this from an expected number of failures view?

For a component having a constant failure rate, the number of failures follows a Poisson process with a mean of n*λ*t.

Therefore, the expected number of failures over time (26,280 hrs) = 10,250*1/50,000*26,280 = 5387.4, which is vastly different from 4202.5.

Which one is the correct methodology?

Thanks

Fred Schenkelberg says

Hi Vijay,

I do not think either is appropriate or very accurate, as very few items, if any, follow a constant failure rate. Better to understand the driving failure mechanism and model the time-to-failure behavior.

Cheers,

Fred
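For readers comparing the two figures in Vijay's question, the sketch below reproduces both. One way to read the gap, still under the constant failure rate assumption that both calculations share: n*F(t) counts units that fail at least once, while n*λ*t counts every failure, including repeat failures of replacement fans that keep running in the same product. The variable names are illustrative only.

```python
import math

n = 10_250        # units shipped
mtbf = 50_000     # hours
t = 26_280        # hours in the three-year window

# Units expected to fail at least once (exponential CDF).
units_failing = n * (1.0 - math.exp(-t / mtbf))

# Expected total failure events if each failed fan is replaced
# and the replacement continues to accumulate hours (Poisson mean).
total_failures = n * t / mtbf

print(f"Units failing at least once: {units_failing:,.1f}")
print(f"Total failure events:        {total_failures:,.1f}")
```

Which count matters depends on the warranty terms, for example whether a replaced fan is itself covered for the remainder of the period.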

William Thorlay says

Hi Fred,

Considering a duty cycle of 12 h/day, should I use only these 12 h and calculate F(t) over 3 years? If I am a maintenance engineer, should I take the downtime hours into account to calculate F(t), or assume that the downtime is not representative and just use the period of time for which I want to know this particular F(t)?

Fred Schenkelberg says

Hi William, both good questions. Yes, adjust the time element to reflect the duty cycle and be clear about what 3 years represents – i.e. not 24/7 operation. For the maintenance example, downtime is fine, yet you most likely will want to know more than just an average. As with any set of data, adjust the analysis to help you learn or understand what is happening – the analysis should lead to better questions as you explore ways to make improvements or changes. cheers, Fred
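The duty-cycle adjustment can be sketched as below, using William's 12 h/day figure and the article's 50,000-hour MTBF. The helper name `operating_hours` is made up for this example, and the sketch assumes the fan accumulates no damage while it is switched off, which may not hold for every failure mechanism.

```python
import math

def operating_hours(years, hours_per_day):
    """Hours the item actually runs over the period of interest."""
    return 365 * hours_per_day * years

mtbf = 50_000  # hours, from the article's data sheet example

for hours_per_day in (24, 12):
    t = operating_hours(3, hours_per_day)
    p_fail = 1.0 - math.exp(-t / mtbf)   # ASSUMES constant failure rate
    print(f"{hours_per_day} h/day: t = {t} hours, F(t) = {p_fail:.2f}")
```

At 12 h/day the operating time halves to 13,140 hours and the estimated fraction failing within three calendar years drops from about 0.41 to about 0.23.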

Mark fiedeldey says

Fred,

I bet this was difficult for you to force yourself to write. MTBF is such a substandard metric. But thanks for the example.

Happy Easter,

Mark

Fred Schenkelberg says

Hi Mark, thanks for the note – many of my short tutorials are for those preparing for the ASQ CRE – yet, you know how I feel about using MTBF in any situation. cheers, Fred