Complications When Tracking Field Failures
Fielded products fail every day. Customers generally report these failures because they want a remedy. Gathering the reported failures, returned products, or confirmed failures is common practice.
Depending on the product, a simple replacement or exchange may suffice. For other products, repair or a refund may be appropriate.
In general, though not always, when a product fails in the hands of a customer, the organization designing, manufacturing, and distributing the product learns of the failure.
A common practice is to count the number of returns per week or month, tallying items as they arrive. This monthly tally is then easy to plot as a simple bar chart showing the count of returns per month over time.
The issue is that the number of units shipped changes month to month, so the number of items that could possibly fail changes as well. When we ship twice as many units, the number of field failures could double even though the actual failure rate for the product has not changed.
A Very Simple Example
Let’s look at a very simple example.
Suppose a new product has a 10% failure rate in the first month and no failures after the first month, and we ship 100 units. In the first month, we would receive 10 failed units back. If we ship 100 units per month for the first three months of the year, we would receive 10 units back each month.
Now let’s say in April another customer orders an additional 100 units, thus we ship 200 items. Given the same failure rate, we would receive 20 units back. That doubles the number of returns month over month: a 100% increase in field returns per month.
In this very simple example, it is obvious that the number of units shipped doubled. Tracking the failure rate would be a more appropriate measure, as we are interested in noticing a change in failure rate. Being able to identify such a change permits identification and resolution of the factors causing an increase in failure rate, or the continuation of whatever is causing a lower failure rate.
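The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the helper name failure_rate is hypothetical, and the sketch assumes every failure is returned in the same month the unit ships.

```python
# Hypothetical helper: failure rate as returns divided by units shipped.
def failure_rate(returned, shipped_units):
    return returned / shipped_units

# Units shipped each month, as in the example (April doubles to 200).
shipped = {"Jan": 100, "Feb": 100, "Mar": 100, "Apr": 200}
FIRST_MONTH_FAILURE_RATE = 0.10  # assumed constant, as in the example

# Returns received each month: 10% of that month's shipments.
returns = {m: round(n * FIRST_MONTH_FAILURE_RATE) for m, n in shipped.items()}
rates = {m: failure_rate(returns[m], shipped[m]) for m in shipped}

for month in shipped:
    print(f"{month}: shipped={shipped[month]} returns={returns[month]} rate={rates[month]:.0%}")
```

The count of returns doubles in April while the rate stays flat, which is why the rate is the more useful number to watch.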
Two things complicate this approach: the number of units produced and shipped varies, and the chance of a specific unit failing changes over time.
First, we often change the actual number of shipments per unit time.
While the forecast for shipments or sales may include nice round numbers per month, in reality, it is often quite variable. If the plan is to ship an average of 5,000 units per month, the long-term average may well work out to 5,000 units per month, yet the actual monthly shipments may vary.
The first month may see only 100 units, as production started just days before the end of the month. The next month, as the production line ramped up, the team could produce and thus ship only 2,523 units. The third month, in order to meet early demand, the team works overtime and creates 6,467 units. And so on.
Variation in production capability, availability of necessary components and materials, holidays (production shutdowns), changes in customer demand, and many other elements change how many units are actually produced and shipped per month.
Failure Rates Vary
Even simple products have dozens, if not hundreds or thousands, of ways they can fail.
Each failure mechanism has a finite probability of occurring on any specific day. It’s a race to see which failure mechanism first succeeds in causing a failure.
Consider a specific product that experienced an error during assembly, say a missing component for a specific function, and somehow shipped to a customer. It may fail immediately on first use, or it may lie dormant for months before that specific function is called into action and then exhibits the failure. Or the missing part could lead to slow degradation of a function, resulting in a reported failure only many years after first use.
The same basic variability applies to each specific failure mechanism. A wear-out mechanism may occur early with aggressive overuse, or only after an exceptionally long period of light, infrequent use. Corrosion-related failure mechanisms may occur quickly or not at all, depending on the local humidity conditions.
In general, there is some pattern to specific failure mechanisms, yet they do exhibit variability in when failures occur.
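The race between failure mechanisms can be illustrated with a small simulation. This is a sketch under loud assumptions: the three mechanisms, their mean lives, and the constant chance of failure per day (an exponential time to failure) are all invented for illustration and do not describe any real product.

```python
import random
from collections import Counter

# ASSUMED mean days to failure for three hypothetical mechanisms.
MEAN_DAYS_TO_FAILURE = {
    "assembly error": 500.0,
    "wear out": 2000.0,
    "corrosion": 5000.0,
}

def first_failure(rng):
    """Draw one failure time per mechanism; the earliest draw wins the race."""
    draws = {
        name: rng.expovariate(1.0 / mean)  # constant-hazard assumption
        for name, mean in MEAN_DAYS_TO_FAILURE.items()
    }
    winner = min(draws, key=draws.get)
    return winner, draws[winner]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
results = [first_failure(rng) for _ in range(1000)]
wins = Counter(name for name, _ in results)
print(wins.most_common())
```

Each simulated unit fails from whichever mechanism draws the shortest time, so the unit's time to failure is the minimum across all mechanisms. Real mechanisms rarely have a constant hazard; a wear-out mechanism, for instance, would call for a different distribution.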
Still a Simple Example
Let’s complicate the simple example described above. Instead of a fixed first-month failure rate of 10%, let’s say the returns from 100 units initially shipped are spread across the first three months.
At the end of three months, the total failure rate is 10%, yet it was only 1% in the first month and then jumped to 6% by the end of the second month.
Now let’s imagine this organization also ships 100 units in February and again in March, and each month’s production follows the same failure pattern. What would that look like over the first three months of production when tracking cumulative shipments and returns per month?
Month   Cumulative returns   Cumulative shipments
Jan     1                    100
Feb     7                    200
Mar     17                   300
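The cumulative figures in the table can be reproduced with a short sketch. One assumption is stated loudly: the split of returns by month in service (1, then 5, then 4 per 100 units) is inferred from the cumulative totals above and is not quoted from the article.

```python
# Returns per 100 units by month in service. ASSUMPTION: the 1, 5, 4 split is
# inferred from the cumulative totals in the table, not taken from the article.
RETURNS_BY_AGE = [1, 5, 4]
UNITS_PER_MONTH = 100
MONTHS = ["Jan", "Feb", "Mar"]

cumulative_returns = []
cumulative_shipments = []
total_returns = 0
for i, month in enumerate(MONTHS):
    # A batch shipped in month j is in its (i - j + 1)-th month of service in month i.
    returns_this_month = sum(RETURNS_BY_AGE[i - j] for j in range(i + 1))
    total_returns += returns_this_month
    cumulative_returns.append(total_returns)
    cumulative_shipments.append(UNITS_PER_MONTH * (i + 1))

cumulative_rates = [r / s for r, s in zip(cumulative_returns, cumulative_shipments)]
for month, r, s, rate in zip(MONTHS, cumulative_returns, cumulative_shipments, cumulative_rates):
    print(f"{month}: returns={r} shipments={s} cumulative rate={rate:.1%}")
```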
Plotting the number of failures per month alone is not informative. Plotting the failure rate per month accounts for the number of units shipped, yet it too is not very informative. The cumulative failure rates for the three months are 1%, 3.5%, and 5.7%.
The problem is that customers holding three-month-old units face a 10% chance of product failure, not 5.7%. Tracking cumulative failure rates using the cumulative number of returns and shipments underreports the failure rate in this case for customers that have the initial month’s units, as those units are now three months old. It may take many more months to recognize the underlying pattern of failures based on the age of the individual units.
Tracking and reporting based on the age of the unit is a better approach. Time-to-failure analysis of the data allows us to consider the probability of failure over time, just as the customer experiences the product.
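As a rough sketch of age-based tracking, the returns can be grouped by how long each unit had been in service rather than by calendar month. The data layout and names below are hypothetical, and the 1, 5, 4 returns pattern is an assumption inferred from the example's cumulative totals.

```python
# Each cohort: units shipped, and returns observed so far by month in service.
# ASSUMPTION: the 1, 5, 4 pattern is inferred from the example's cumulative totals.
cohorts = {
    "Jan": {"shipped": 100, "returns_by_age": [1, 5, 4]},  # three months in service
    "Feb": {"shipped": 100, "returns_by_age": [1, 5]},     # two months in service
    "Mar": {"shipped": 100, "returns_by_age": [1]},        # one month in service
}

max_age = max(len(c["returns_by_age"]) for c in cohorts.values())
cumulative_fraction_failed = []
running = 0.0
for age in range(max_age):
    # Only cohorts that have been in service at least this long contribute.
    observed = [c for c in cohorts.values() if len(c["returns_by_age"]) > age]
    failed = sum(c["returns_by_age"][age] for c in observed)
    shipped = sum(c["shipped"] for c in observed)
    running += failed / shipped  # fraction of shipped units failing at this age
    cumulative_fraction_failed.append(running)

for age, fraction in enumerate(cumulative_fraction_failed, start=1):
    print(f"month {age} in service: {fraction:.0%} failed")
```

Grouping by age recovers the 1%, 6%, and 10% pattern that the calendar-based cumulative rate hides. This simple fraction of shipped units is only a starting point; a fuller time-to-failure analysis would also account for units still operating and fit a life distribution.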
The next article will describe a convenient way to track shipments and returns that preserves the time-to-failure information. How do you gather and report your field data?