5 Ways Your Reliability Metrics are Fooling You

Last Verified June 9, 2025

We measure results. We measure profit, shipments, and reliability.

These measures, or metrics, help us determine whether we’re meeting our goals, whether something bad or good is happening, and whether we need to alter our course.

We rely on metrics to guide our business decisions.

Sometimes, our metrics obscure, confuse or distort the very signals we’re trying to comprehend.

Here are five metric-based mistakes I’ve seen in various organizations. Being aware of the limitations or faults in these examples may help you improve the metrics you use on a day-to-day basis. I don’t always have a better option for your particular situation, yet using a metric that leads you to make poor decisions generally isn’t acceptable.

If you know of a better way to employ similar measures, please add your thoughts to the comments section below.

Pareto Charts

A wonderful arrangement of counts of failures (or whatever set of categories you’re tracking). Using the concept that 80% of the problems come from 20% of the causes, we use this arrangement of information to prioritize our focus.

Solve the big hitters, those that occur frequently.

Is this always the right approach?

I say no.

Sure, the concept is sound, yet the practice often only counts and displays the reported symptoms, not the root causes. The Pareto principle works on causes, not on symptoms. Keep in mind that a failure to power on may have dozens of possible underlying causes. Which are you going to solve? Strive to track causes, not symptoms, which is good advice in general. Applying it to Pareto charts takes more work, yet provides clearer direction for your improvement efforts.

Another issue with a Pareto chart is that it does not include the severity of the failure. A few cell phone battery fires may get your product pulled from stores and airplanes. Yet 200,000 scratched cases on otherwise functioning phones may divert resources to solve a problem that most likely doesn’t need immediate attention. Adding weights to indicate severity or cost of failure may help the chart convey what is most urgent and important, not just what is most common.
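
To make the weighting idea concrete, here is a minimal sketch in Python. The categories, counts, and per-event costs are invented for illustration, and using cost per event as the weight is just one assumption; a severity score would work the same way. The same tallies are ranked twice, once by count and once by count times cost.

```python
# Hypothetical tallies: (category, count, assumed cost per event in dollars).
# Every number here is invented for illustration only.
failures = [
    ("scratched case", 200_000, 2),
    ("fails to power on", 1_500, 150),
    ("battery fire", 5, 2_000_000),
]

def pareto(rows, key):
    """Rank rows from largest to smallest by `key`, with cumulative share."""
    ranked = sorted(rows, key=key, reverse=True)
    total = sum(key(row) for row in ranked)
    table, running = [], 0
    for row in ranked:
        running += key(row)
        table.append((row[0], key(row), running / total))
    return table

by_count = pareto(failures, key=lambda row: row[1])            # classic Pareto
by_cost = pareto(failures, key=lambda row: row[1] * row[2])    # weighted Pareto

for title, table in (("By count", by_count), ("By count x cost", by_cost)):
    print(title)
    for name, value, cumulative in table:
        print(f"  {name:<18} {value:>12,} {cumulative:6.1%}")
```

With these made-up numbers the count-based ranking puts scratched cases first, while the weighted ranking puts battery fires first. The weights are a judgment call, so it is worth showing both views side by side.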

12 Month Rolling Average Warranty Returns

Whether count of returns or cost of returns or both, the rolling average tends to smooth out the trends. A doubling of returns in the most recent month may go unnoticed when averaged with the previous 11 months of data.

If you are only interested in trends over the long term, a rolling average smooths the curve out, yet it often obscures the cyclic and noisy bits you may need to know concerning warranty.

The other issue I have with this approach is that the denominator is changing. As your product ramps in production, the average failure rate may appear to decrease when in fact it is increasing. Let’s say that after three months some proportion of products fail, following very few failures in the first two months of use. Using a rolling average that includes each month’s additional units exposed, we may swamp the signal from the oldest units that they are getting older and starting to fail more often.

As the initially produced units start to fail at an increasing rate, the latest month’s production, often larger than the initial months’, may result in the false signal that everything is getting better.
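
The ramp effect is easier to see with numbers. The sketch below is a toy model, not field data: the shipment ramp and the `hazard` function (the chance a working unit fails in a given month of use) are assumptions chosen to mimic the scenario above, with few failures in months one and two and a climbing rate after that.

```python
# Hypothetical production ramp: units shipped in months 1..6.
shipments = [1_000, 1_000, 1_000, 5_000, 20_000, 60_000]

def hazard(age_months):
    """Assumed chance that a working unit fails during its age-th month of use."""
    if age_months <= 2:
        return 0.002                          # very few failures in months 1 and 2
    return 0.05 + 0.03 * (age_months - 3)     # failures climb from month 3 onward

def monthly_report(shipments):
    """Return (month, population-wide failure rate, oldest cohort's failure rate)."""
    survivors = []   # survivors[i]: units from the cohort shipped in month i+1 still working
    rows = []
    for month, shipped in enumerate(shipments, start=1):
        survivors.append(float(shipped))
        at_risk = sum(survivors)
        failures = 0.0
        for i, alive in enumerate(survivors):
            age = month - i                   # that cohort's current month of use
            failed = alive * hazard(age)
            survivors[i] = alive - failed
            failures += failed
        rows.append((month, failures / at_risk, hazard(month)))
    return rows

rows = monthly_report(shipments)
for month, overall, oldest in rows:
    print(f"month {month}: overall {overall:.3%}   oldest cohort {oldest:.1%}")
```

From month three onward the population-wide rate falls every month in this toy model, even though the oldest cohort is failing faster each month. The new shipments dilute the signal.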

Another issue is that it may take a year or more for a step change in failure rates to fully alter the trend. Let’s say in June the actual failure rate for that month doubled. Since we have 11 months that experienced a lower rate, the average will increase only slightly. If the new higher failure rate remains in play, then each month the average rate will continue to increase, and it will not reflect the magnitude of the actual failure rate for a year.
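
Here is the arithmetic as a short Python sketch. The 1% base rate is an assumption for illustration; the point is the lag, not the value. The monthly rate doubles in month 13 and stays doubled.

```python
def rolling_mean(values, window=12):
    """Trailing mean over the last `window` values; None until enough history exists."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed

base = 0.010                                  # assumed 1% monthly failure rate
monthly = [base] * 12 + [2 * base] * 12       # step change: rate doubles in month 13
smoothed = rolling_mean(monthly)

for month in (12, 13, 18, 24):
    actual = monthly[month - 1]
    average = smoothed[month - 1]
    print(f"month {month}: actual {actual:.2%}   12-month average {average:.2%}")
```

In the first month after the step, the actual rate has risen by 100% while the rolling average has risen by one twelfth, about 8%. The average only matches the new rate once all twelve months in the window come from after the change.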

When possible, and it’s always possible, do not use rolling averages.

MTBF by Month

I was going to list this first or last in this list, given my dislike for MTBF in general. Yet, I suspected placing it third in this list might surprise you a bit. I guess this list is not in any particular order.

Regular readers of this blog understand the many issues with MTBF. Yet, we all have seen and continue to see MTBF tracked in a variety of ways. My favorite example of a very poor measure is to track MTBF values as a month-to-month trend.

Of course, using only MTBF basically erases any information about the rate of change of the failure rate. When tracking field failures, we have the information to determine whether older units are failing at a higher rate or whether we have a significant early life failure issue. The averaging done to create MTBF values wipes all that critical information away.

It is possible to implement a change to reduce early life failures and then experience a reduction in MTBF. The slope on a Weibull plot shifts from below one to above one, pivoting the fitted line so that the resulting MTBF appears lower. The product is actually better reliability-wise, yet our tracking metric suggests our improvements degraded reliability performance.
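
A small numerical sketch shows how this can happen. For a two-parameter Weibull distribution the mean is eta times Gamma(1 + 1/beta). The "before" and "after" parameters below are assumptions picked to illustrate the pivot, not values from any real product, and the 500-hour mark stands in for whatever early period matters to you.

```python
import math

def weibull_mtbf(beta, eta):
    """Mean of a two-parameter Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1 + 1 / beta)

def weibull_reliability(t, beta, eta):
    """Probability a unit survives to time t."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical parameters: shape (beta) and characteristic life (eta, in hours).
before = {"beta": 0.5, "eta": 1000.0}   # slope below one: early life failures dominate
after = {"beta": 2.0, "eta": 1500.0}    # slope above one: early failures largely removed

for label, params in (("before", before), ("after", after)):
    mtbf = weibull_mtbf(**params)
    r500 = weibull_reliability(500, **params)
    print(f"{label}: MTBF {mtbf:,.0f} h   reliability at 500 h {r500:.1%}")
```

In this example the MTBF drops after the change while the fraction of units surviving the first 500 hours rises sharply. Note that the two curves cross later in life, so "better" here means better over the early period; which period matters depends on your product and warranty.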

SPC C-chart of Returns by Day

I’ve only seen this once. At first, I thought this would be a clever use of control charts to monitor field returns. A closer look suggests it is obscuring some essential information.

The returns received are from a range of different aged products. The count of failures includes early life and wear-out failures. Increases and decreases in the count, even on a day-to-day basis, may reflect prior shipment counts more than anything else. The data is also muddled by the vagaries of customers collecting failed units in order to ship them together.

Compare your call center data on the time between issuance of a return material authorization (RMA) and when the unit is actually received. The one time I did that, the long-tailed, skewed delay simply complicated our estimates of how long the product was actually used.

Instead, use the call center data if you need counts. The issue of changes in daily or monthly shipments will still need attention to make this a useful tracking method.

Returns this Month over Shipments this Month

Unfortunately, I’ve seen this approach used a few times. This is simply counting how many units got returned in a month over how many were shipped that same month.

It might be possible for a unit shipped early in the month to fail and be returned that same month, yet that is unlikely in most circumstances. The ratio of these two unrelated counts does nothing to inform your team about patterns or changes in field failure rates.

An increase or decrease in a monthly shipment tally may have a larger impact on the month’s results than any change in returns. A spike in returns may fall during a particularly high shipment month, which blunts the message of the spike.

This ratio does not provide even a poor estimate of field failure rates. What we want is the ratio of returns over those units at risk to fail. While a bit more difficult to keep track of than monthly returns and shipment counts, it does reflect field failures a tad better.
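
A minimal sketch of the difference, using invented monthly counts. The at-risk figure here is a simplification I am assuming for illustration: every unit shipped so far and not yet returned counts as at risk, including the current month’s shipments. Your own definition may need to account for warranty expiry or retired units.

```python
# Hypothetical monthly counts; month 4 has both a shipment surge and a return spike.
shipments = [10_000, 10_000, 10_000, 40_000]
returns = [20, 40, 60, 200]

def naive_ratio(shipments, returns):
    """Returns this month over shipments this month."""
    return [r / s for r, s in zip(returns, shipments)]

def at_risk_rate(shipments, returns):
    """Returns this month over units in the field (shipped so far, not yet returned)."""
    rates = []
    shipped_so_far = 0
    returned_so_far = 0
    for shipped, returned in zip(shipments, returns):
        shipped_so_far += shipped
        at_risk = shipped_so_far - returned_so_far
        rates.append(returned / at_risk)
        returned_so_far += returned
    return rates

naive = naive_ratio(shipments, returns)
risk_based = at_risk_rate(shipments, returns)
for month, (n, r) in enumerate(zip(naive, risk_based), start=1):
    print(f"month {month}: returns/shipments {n:.2%}   returns/at-risk {r:.3%}")
```

With these numbers the returns-over-shipments ratio falls in month four because of the shipment surge, while the at-risk rate rises. The spike is only visible in the second measure.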

Of course, tracking failure rates properly still runs into the averaging issues mentioned above. Instead, let’s track failure rates by age of unit. How many failures are occurring in the first month after shipment or installation? How many failures occur over the warranty period?
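
One way to tabulate failures by age is sketched below. It assumes you can pair each unit’s ship month with its return month (or note that it has not been returned); the record layout and the sample records are hypothetical. For each month of use, it divides the units that failed in that month of use by the units that were observed through it.

```python
from collections import Counter

# Hypothetical unit records: (ship month, return month or None if still in the field).
records = [
    (1, None), (1, 4), (1, 5), (1, None),
    (2, None), (2, 5), (2, None),
    (3, 3), (3, None), (3, None),
    (4, None), (4, None),
]
now = 6   # current month, the last month with data

def age_specific_rates(records, now):
    """Failure rate for each month of use: failures at that age / units exposed at that age."""
    exposed = Counter()
    failed = Counter()
    for shipped, returned in records:
        last_month = returned if returned is not None else now
        last_age = last_month - shipped + 1   # ship month counts as month 1 of use
        for age in range(1, last_age + 1):
            exposed[age] += 1
        if returned is not None:
            failed[last_age] += 1
    return {age: failed[age] / exposed[age] for age in sorted(exposed)}

rates = age_specific_rates(records, now)
for age, rate in rates.items():
    print(f"month {age} of use: {rate:.1%}")
```

Because each rate uses only the units that actually reached that age, a large new shipment changes the first-month figure without diluting what the older units are telling you.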

Better would be to track changes to the life distribution. I like starting with our projected Weibull distribution (or the appropriate life distribution for your product) on a cumulative distribution function (CDF) plot. Then, as returns occur, use the age of the units when they fail, along with censored data, to build the field failure Weibull curve for comparison with the expected curve.
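
As a rough sketch of that comparison, the code below fits a two-parameter Weibull to failure ages plus right-censored ages (units still working) by maximum likelihood, then prints the fitted CDF next to a projected one. Maximum likelihood is one common choice for censored data, not the only one; the ages, the projected parameters, and the search bracket for the shape parameter are all assumptions for illustration. A dedicated reliability package would also give you confidence bounds, which this sketch does not.

```python
import math

def fit_weibull(failures, suspensions, lo=0.05, hi=20.0):
    """Maximum likelihood fit with right-censored data. Returns (beta, eta)."""
    ages = list(failures) + list(suspensions)
    r = len(failures)
    mean_log_fail = sum(math.log(t) for t in failures) / r

    def score(beta):
        # Profile likelihood equation for the shape; increasing in beta, zero at the MLE.
        s0 = sum(t ** beta for t in ages)
        s1 = sum(t ** beta * math.log(t) for t in ages)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    if score(lo) > 0 or score(hi) < 0:
        raise ValueError("shape parameter lies outside the search bracket")
    for _ in range(100):                      # bisection on the shape parameter
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = (lo + hi) / 2
    eta = (sum(t ** beta for t in ages) / r) ** (1 / beta)
    return beta, eta

def log_likelihood(beta, eta, failures, suspensions):
    """Weibull log-likelihood: density terms for failures, survival terms for suspensions."""
    ll = sum(math.log(beta / eta) + (beta - 1) * math.log(t / eta) - (t / eta) ** beta
             for t in failures)
    ll -= sum((t / eta) ** beta for t in suspensions)
    return ll

def weibull_cdf(t, beta, eta):
    """Expected fraction failed by age t."""
    return 1 - math.exp(-((t / eta) ** beta))

# Hypothetical field data, ages in months.
failure_ages = [2, 3, 5, 7, 8, 11, 12]
suspension_ages = [3] * 40 + [6] * 30 + [9] * 20 + [12] * 10
projected = {"beta": 1.3, "eta": 60.0}        # assumed design-stage projection

beta_hat, eta_hat = fit_weibull(failure_ages, suspension_ages)
print(f"field fit: beta {beta_hat:.2f}, eta {eta_hat:.1f} months")
for age in (3, 6, 12):
    expected = weibull_cdf(age, **projected)
    observed = weibull_cdf(age, beta_hat, eta_hat)
    print(f"age {age:>2}: projected {expected:.2%} failed   field fit {observed:.2%} failed")
```

Refitting as each month of returns arrives shows whether the field curve is drifting away from the projection, and whether the slope (beta) is changing, which an average would hide.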

These are just a few of the problems with metrics I’ve seen, plus a few suggestions to do better. What’s your take on this? What kind of poor metrics have you seen? What is your go-to way to track field failures? Add your comments below.

Comments

Rakesh jha says
February 4, 2017 at 7:56 PM
Very good article and this reinforces that there is no single kpi to cover all. There is always need to understand what are our reliability targets to achieve and then set up kpi. Further along with tracking no, little bit analysis is required to understand the insight of story and this also depends on reliability journey and maturity level of an organisation . Great work and thanks for sharing …!!
Bert Schaefer says
July 14, 2025 at 5:28 AM
To track changes to the life distribution, should you use all the return data together, or do you have to group it by each failure mechanism? We do not look at every return, so we do not have the data on why it failed (or if it did) for all of the returns.
- Fred Schenkelberg says
  July 14, 2025 at 6:38 AM
  Hi Bert,
  The approach for the analysis depends on what you are trying to do with the data you have. Analyzing each return as a failure, with the units remaining in the field as suspensions, can provide some information on how the item is doing overall. It may reveal anomalies or changes that, with further failure analysis, lead to specific mechanisms to address, yet it is not as powerful as knowing the failure mechanism for every return.
  Also, keep in mind that if a customer takes the time to return a product, it is a failure. Maybe not of the hardware or software, yet something didn’t meet their expectation, hence it was worth the effort to return it. No-trouble-found returns still cost you money, so treat them as failures, too.
  cheers,
  Fred