# Gathering Field Failure Data

A common but poor technique for gathering field data is to count the number of returns by week or month. This can provide a graph showing the number of returns over time.

That count alone hides information you need to understand your field failures.

Let’s take a look at a way to gather the same field failure data and retain the critical information necessary for time to failure analysis.

## The Necessary and Available Data

Most organizations keep track of shipments, possibly by counting the number of products shipped to customers each month. Let’s say we manufacture bicycles and ship roughly 5,000 bikes per month. We need this information to compare against the number of returns in order to estimate failure rates and, more importantly, to know the number of bikes that have not failed.

After a short conversation with the person tracking shipments, we have the following shipments for the past 6 months.

| Month | Shipments |
| --- | --- |
| Jan | 3,519 |
| Feb | 6,292 |
| Mar | 7,132 |
| Apr | 5,633 |
| May | 4,222 |
| Jun | 4,476 |

Each bicycle has a serial number that includes the month of production, thus we know which month the unit shipped. This is the next piece of data we need concerning a returned unit: the month in which it was manufactured.

When a unit is returned we count it as a return for the specific month of production. This allows us to know how many bikes from that month of production have not failed.

We then also know how long this particular returned unit was with a customer. Of course, we’re making some assumptions about transportation, time on store shelves, and other variables, yet often we only really know the shipment month and the return month, and thus (roughly) the time with a customer.

## Organizing the Returns Data

One way to organize the data as it becomes available is in a Nevada chart.

The name comes from the resulting table’s triangular shape, which is reminiscent of the lower part of Nevada if tipped on its side. Doesn’t work for me either, yet that is the chart’s name.

Following the example started above, let’s count the number of failures per month and log the count by month of shipment.

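| Ship month | Shipped | Jan | Feb | Mar | Apr | May | Jun | Jul |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jan | 3,519 | 3 | 6 | 3 | 7 | 10 | 3 | |
| Feb | 6,292 | | 4 | 8 | 20 | 35 | 24 | |
| Mar | 7,132 | | | 8 | 14 | 25 | 31 | |
| Apr | 5,633 | | | | 4 | 13 | 6 | |
| May | 4,222 | | | | | 6 | 8 | |
| Jun | 4,476 | | | | | | 6 | |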

Thus in January we shipped 3,519 units, and 3 from that group were returned in January and another 6 in February. We also received 4 returns in February from the batch of February shipments.

Notice the shape of the table of return counts. Looks like Nevada, right?
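As a rough sketch of how the chart above might be held in code, each shipment month can map to a list of return counts that starts at the shipment month. The names `MONTHS`, `SHIPPED`, `RETURNS`, and `returns_in` are illustrative assumptions, not part of any particular tool.

```python
# Months covered by the example data.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Units shipped per month, from the shipment table.
SHIPPED = {"Jan": 3519, "Feb": 6292, "Mar": 7132,
           "Apr": 5633, "May": 4222, "Jun": 4476}

# Return counts per shipment month. Each list starts at the shipment
# month itself, which is what gives the chart its triangular shape.
RETURNS = {
    "Jan": [3, 6, 3, 7, 10, 3],
    "Feb": [4, 8, 20, 35, 24],
    "Mar": [8, 14, 25, 31],
    "Apr": [4, 13, 6],
    "May": [6, 8],
    "Jun": [6],
}


def returns_in(ship_month, return_month):
    """Return the count in one cell of the chart (0 before shipment)."""
    offset = MONTHS.index(return_month) - MONTHS.index(ship_month)
    if offset < 0:
        return 0  # nothing can come back before it ships
    return RETURNS[ship_month][offset]


print(returns_in("Jan", "Feb"))  # 6
print(returns_in("Feb", "Feb"))  # 4
```

Storing each row from its own shipment month onward avoids carrying the empty cells in the lower left of the chart.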

## Preparation for Analysis

This chart is only for gathering the data. It is difficult to draw any conclusion from this table alone. What we do need to know is how long the returned units were in the field and how many units remain right censored (haven’t failed yet).

The time to failure is the difference between the return month and the month of shipment. Thus, for the returns in February, we can say the units shipped in January were in the field for 2 months. Those shipped in February were in the field for 1 month.

I’m assuming all shipments are at the start of the month and all returns are at the end of the month. This helps to avoid having a duration in the field of zero which can cause trouble with some analysis later.
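A minimal sketch of that convention follows. Because shipments are treated as start of month and returns as end of month, one month is added to the difference in month positions. The function name `months_in_field` is an illustrative assumption.

```python
# Months covered by the example data.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]


def months_in_field(ship_month, return_month):
    """Time to failure in months.

    Assumes shipment at the start of the month and return at the end
    of the month, so a same-month return counts as 1 month, not 0.
    """
    return MONTHS.index(return_month) - MONTHS.index(ship_month) + 1


# January row of the Nevada chart: returns in Jan, Feb, ..., Jun.
jan_returns = [3, 6, 3, 7, 10, 3]

# Pair each return count with its time in the field.
jan_failures = [(months_in_field("Jan", month), count)
                for month, count in zip(MONTHS, jan_returns)]

print(jan_failures)
# [(1, 3), (2, 6), (3, 3), (4, 7), (5, 10), (6, 3)]
```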

Having the day or week of shipment and the corresponding day or week of return would be an improvement, yet monthly shipment and return data seem to be fairly common. Monthly data is still useful.

The other element we need is how many units remain and are thus right censored. We can calculate this for each row by subtracting the total number of returns from the number of units shipped. The time of censoring is the difference between the current month and the month of shipment. Repeat this calculation for each shipment row where a return occurred.

Thus for the January shipment row, we have 32 returns and thus 3,487 units that have not failed. Repeat these calculations for each row.
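The same subtraction for every row might be sketched as below. Two assumptions are mine: the data was pulled at the end of June (`LAST_MONTH`), and the censoring time uses the same start-of-month, end-of-month convention as the failure times. The names are illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
LAST_MONTH = "Jun"  # assumption: data observed through the end of June

SHIPPED = {"Jan": 3519, "Feb": 6292, "Mar": 7132,
           "Apr": 5633, "May": 4222, "Jun": 4476}

RETURNS = {
    "Jan": [3, 6, 3, 7, 10, 3],
    "Feb": [4, 8, 20, 35, 24],
    "Mar": [8, 14, 25, 31],
    "Apr": [4, 13, 6],
    "May": [6, 8],
    "Jun": [6],
}


def censored(ship_month):
    """Return (units not yet failed, months in the field so far)."""
    units = SHIPPED[ship_month] - sum(RETURNS[ship_month])
    months = MONTHS.index(LAST_MONTH) - MONTHS.index(ship_month) + 1
    return units, months


for month in MONTHS:
    units, months = censored(month)
    print(f"{month}: {units} units censored at {months} months")
# The first line printed is: Jan: 3487 units censored at 6 months
```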

Now you have the number of returns and the time to failure for those returns, plus the number and duration of units censored. That is the necessary information for a time to failure analysis.
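One possible way to lay out that information is a list of (months, count, status) records, with "F" for failures and "S" for suspensions (right-censored units). This is only a sketch: `build_records` and the record layout are assumptions for illustration, and the format your analysis software expects may differ.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

SHIPPED = {"Jan": 3519, "Feb": 6292, "Mar": 7132,
           "Apr": 5633, "May": 4222, "Jun": 4476}

RETURNS = {
    "Jan": [3, 6, 3, 7, 10, 3],
    "Feb": [4, 8, 20, 35, 24],
    "Mar": [8, 14, 25, 31],
    "Apr": [4, 13, 6],
    "May": [6, 8],
    "Jun": [6],
}


def build_records(months, shipped, returns):
    """Flatten the chart into (months in field, count, status) records.

    Assumes shipments at the start of a month, returns at the end of a
    month, and observation through the end of the last listed month.
    """
    records = []
    last = len(months) - 1
    for i, month in enumerate(months):
        row = returns[month]
        # One failure record per cell in this shipment row.
        for offset, count in enumerate(row):
            records.append((offset + 1, count, "F"))
        # One suspension record for the units that have not failed.
        records.append((last - i + 1, shipped[month] - sum(row), "S"))
    return records


records = build_records(MONTHS, SHIPPED, RETURNS)

# Sanity check: every shipped unit is either a failure or a suspension.
assert sum(count for _, count, _ in records) == sum(SHIPPED.values())
```

The closing check is a cheap guard against transcription errors when the chart is updated by hand each month.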

Make sense?

SPYROS VRACHNAS says

Hello Fred,

I enjoy reading your articles regularly.

Can you take this analysis one step further? What is the “field” MTTF for this data?

Fred Schenkelberg says

Hi Spyros,

I’ll do that in a future article, yet it’s left as an exercise at the moment. And why would you want to know the MTTF? It’s a rather useless value, imho. Instead, how about learning the probability of surviving a specific duration? Isn’t that directly useful…

Cheers,

Fred

Harry White says

The usefulness of Nevada charts (also called Triangle charts) is seen when calculations are added and trend charts are made of metrics such as First Year Failure (FYF), Long Term Failure (LTF; >1 yr since shipment), Average Failure Rate, Cumulative Percent Failed, etc. When you monitor these and track them for different vintages (build year, Rev level, change in CM, etc.), Nevada charts are a useful basis for a field reliability program.

Fred Schenkelberg says

Thanks Harry, the additional information is much appreciated. Cheers, Fred

Benjamin says

Hi Fred,

It seems to me this chart works mainly for non-repairable products, and therefore you need to calculate the time to failure together with other data. In the case of repairable products, it is more straightforward. When we replace a failed item, assuming a random constant failure rate and a “short” MTTR (e.g. applicable to electronics equipment), we can focus only on the accumulated operation time (or unit-hours) and the total number of failures, regardless of when the failures occurred. In many cases we do not have vital information about the failed item, for example its serial number or whether it was Dead on Arrival, and therefore cannot measure its time to failure even if we want to.

In addition, if shipments go out in a couple of batches, we can assume linear shipment of items during a time interval (e.g. a month in your example) to achieve somewhat more accuracy in our analysis. Of course, the area under this linear shipment plot is equal to the accumulated operation time in that time interval.

Thx,

Fred Schenkelberg says

Hi Benjamin,

Please do not assume constant failure rate, it rarely is true. By making that assumption you strip the need to gather the data necessary to check that assumption.

For repairable products you may be interested in the ongoing impact of maintenance activity: is it bringing the system to as good as new, or as bad as old? Is the system availability getting better or worse over time? Check out the mean cumulative function to visualize and model this data.

http://nomtbf.com/2012/02/graphical-analysis-of-repair-data/

Cheers,

Fred

Benjamin says

Hi Fred,

I agree this assumption is not always correct, but my comments were related only to electronics items (which I deal with), for which in most cases it’s accurate to assume a constant failure rate during the useful life.

Thx

Fred Schenkelberg says

Here I disagree. Upon analyzing the data, I have seen only very rare situations where the assumption is true. Check the assumptions.

Cheers,

Fred