A colleague and friend, Bill Barto, worked up an example concerning two pieces of equipment. By simply not assuming a constant failure rate the …. well, let Bill tell the story….
Here’s the story in Bill’s own words via YouTube.
[Just in case you are not able to view the video – here’s the text.]
Imagine you show up to a client site where there are two similar pieces of equipment that they are having trouble with. They want your help in determining where to start since they only have the resources to address one of the machines. They indicate that they have been tracking MTBF for the last two months on the equipment. They give you the following three pieces of information:
- Both pieces of equipment have been run for the exact same amount of time over the last two months and produce the same product at the same rate.
- Machine #1 has an MTBF of 25 hours for the first month and 46 hours for the second month.
- Machine #2 has an MTBF of 30 hours for the first month and 50 hours for the second month.
Which machine do you choose to work on?
From this information, it seems like Machine #1 is running worse than Machine #2 since Machine #1 has a lower MTBF each month. With this data, you may choose to focus on Machine #1 first since you would assume there have been more failures there given that they were run for the same amount of time over the last two months.
Here’s the kicker. When you ask them for more detail on where they got these numbers you are given the actual run times and number of failures.
| Machine | Month 1 | MTBF (hours) | Month 2 | MTBF (hours) | Both Months Combined | MTBF (hours) |
|---|---|---|---|---|---|---|
| #1 | Run time = 150 hours, # of failures = 6 | 25 | Run time = 690 hours, # of failures = 15 | 46 | Run time = 840 hours, # of failures = 21 | 40 |
| #2 | Run time = 540 hours, # of failures = 18 | 30 | Run time = 300 hours, # of failures = 6 | 50 | Run time = 840 hours, # of failures = 24 | 35 |
Now when you compare the MTBF and number of failures for the combined time in the last set of columns, Machine #1 looks to be running better than Machine #2.
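The arithmetic behind the reversal is easy to check. Here is a minimal Python sketch using only the run times and failure counts from the table; the names `data`, `mtbf`, and `results` are made up for illustration and are not part of Bill's example.

```python
# Run time (hours) and number of failures, taken from the table above.
data = {
    "Machine #1": {"Month 1": (150, 6), "Month 2": (690, 15)},
    "Machine #2": {"Month 1": (540, 18), "Month 2": (300, 6)},
}


def mtbf(run_time, failures):
    """Simple MTBF: total run time divided by number of failures."""
    return run_time / failures


results = {}
for machine, months in data.items():
    # MTBF for each month on its own.
    row = {month: mtbf(rt, f) for month, (rt, f) in months.items()}
    # Combined MTBF pools the run time and failures before dividing.
    total_run_time = sum(rt for rt, _ in months.values())
    total_failures = sum(f for _, f in months.values())
    row["Combined"] = mtbf(total_run_time, total_failures)
    results[machine] = row

for machine, row in results.items():
    print(machine, row)
```

The combined figure is weighted by run time, not a plain average of the two monthly values. Machine #1 logged most of its hours in its better month and Machine #2 logged most of its hours in its worse month, which is why the monthly ranking and the combined ranking disagree.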
The method of calculating MTBF used here (run time / # of failures) assumes that the data come from an asset experiencing random failures, that is, failures occurring at a constant rate. Recent studies have shown that failure rates that were assumed to be random (e.g., lightbulb failures) are not random at all. Because the calculation is so easy, this technique is often misused. A better method might be to look more closely at the data (individual failure times) and determine the actual failure distribution (Weibull, Normal, Lognormal, etc.) through other methods. A variety of software tools can determine the best-fit distribution so that metrics such as MTBF and others can be determined more accurately.
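To give a feel for what "looking at the individual failure times" can involve, here is a rough sketch that fits a Weibull distribution by maximum likelihood using only the Python standard library. The failure times are hypothetical (the example above only gives totals), the function names are invented for this illustration, and a real analysis would normally use dedicated software and also handle censored data, which this sketch does not.

```python
import math

# Hypothetical times between failures in hours (illustrative only,
# not taken from the example above).
times = [12.0, 31.0, 18.0, 40.0, 22.0, 27.0, 35.0, 15.0]


def shape_equation(k, t):
    """Maximum likelihood condition for the Weibull shape k (zero at the estimate)."""
    weighted = sum(x ** k * math.log(x) for x in t)
    total = sum(x ** k for x in t)
    mean_log = sum(math.log(x) for x in t) / len(t)
    return weighted / total - 1.0 / k - mean_log


def fit_weibull(t, lo=0.05, hi=50.0, tol=1e-10):
    """Estimate Weibull shape and scale for complete (uncensored) data by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # The shape equation increases with k, so keep the half that brackets zero.
        if shape_equation(mid, t) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    shape = (lo + hi) / 2.0
    scale = (sum(x ** shape for x in t) / len(t)) ** (1.0 / shape)
    return shape, scale


shape, scale = fit_weibull(times)
weibull_mean = scale * math.gamma(1.0 + 1.0 / shape)
simple_mtbf = sum(times) / len(times)

print(f"shape = {shape:.2f}, scale = {scale:.1f} hours")
print(f"Weibull mean = {weibull_mean:.1f} hours, simple MTBF = {simple_mtbf:.1f} hours")
```

The number to watch is the shape parameter. A shape close to 1 is consistent with a constant failure rate, the assumption behind the simple calculation. A shape well above 1 points to a failure rate that rises with age, and a shape below 1 points to one that falls. Two data sets with the same simple MTBF can have very different shapes, which the single MTBF number hides.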
Bill Barto, CMRP, ASQ CRE
Life Cycle Engineering, Inc.
Thanks Bill for the story and case. It’s clear in nearly any situation where the MTBF values change from month to month that further analysis is well worth the effort. And, as I would say, it’s not much more effort and well worth letting your data speak clearly.
Barry Snider says
The entire example is bogus. One must first accept the premise that MTBF is a valid indicator for predicting failure and thus scheduling preventive measures. It is not for the vast majority of failures. First off, nowhere are the failure modes of the two pieces of equipment mentioned. Comparing failures without noting the failure modes makes the entire exercise irrelevant. For example, what if the equipment items are centrifugal pumps? Failure no. 1 is a seal leak, failure no. 2 is a bearing failure, failure no. 3 is another seal failure, failure no. 4 is a loss of discharge pressure, failure no. 5 is a coupling failure, and so forth. Who cares how many failures occur if they come from a variety of failure modes, meaning there are a variety of unrelated prevention measures? If you use MTBF, you lose.
Bill Barto says
Barry – Thanks for taking the time to watch and post your comments. After carefully reading the points you make in your post, I believe that we are in agreement. If you are only provided MTBF metrics with no background information on failure modes (as you point out) or underlying failure distributions, many incorrect conclusions can be drawn. I agree with you and Fred that focusing on the failure mechanisms is the best method. My example was intended to be poorly managed (“bogus”?) to show the pitfalls of blindly using MTBF. I hope I interpreted your comment correctly.
Fred Schenkelberg says
I agree that not using MTBF altogether is best and focusing on the failure mechanisms is even better. This is just one step in that direction. It is one way to move forward and help someone ask better questions.
Mark Powell says
Bill seems to be focusing on “the method to calculate MTBF.” The problem is that even if he figures out the best way in the world to calculate MTBF, as Barry says above, MTBF is the wrong thing to use to make any decisions.
While a fun little exposé, this one kind of misses the real salient point.
Bill Barto says
Thanks Mark, I see your point. Honestly, when I created this I really felt that I had done a better job of dismissing the use of MTBF entirely. On review, I recognize that is not entirely the case. I’m really a fan of (and believer in) the noMTBF movement and hope to be able to do more to contribute in the future.