Reliability testing and data collection is a messy business.
We rarely receive perfect data in which every unit has a precise time-to-failure record. Sometimes we do not have the precise time to failure, or some of the units are still operating.
In those cases, the data is censored.
There are a few common types of censoring, each of which has statistical techniques to appropriately account for the unknown elements.
Right censoring is the most common type of censoring.
Right censoring occurs when we have some number of units operating in the field or under test and, at some point, we gather the times to failure for the failed units and the times of operation for those that haven’t failed yet. The time of operation is the censoring time, as the unit is expected to fail at some later point.
Let’s say in January we ship 25 washing machines. One unit fails 3 months later and another 5 months later, providing two time-to-failure data points. In June, 25 – 2 = 23 machines are still operating. These 23 units are right censored, meaning they are still operating after 6 months (the censoring time). If in February we also shipped 25 machines and none have failed by June, we have an additional 25 units right censored at 5 months.
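One way to record the washing machine example is as a list of (time, failed) pairs, one per unit. The sketch below assumes that layout; the names `Observation` and `build_shipment` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    months: float  # time to failure, or time in operation if still running
    failed: bool   # True = failure observed, False = right censored


def build_shipment(n_units: int, failure_months: List[float],
                   censor_month: float) -> List[Observation]:
    """Record the failures, then mark every remaining unit as right censored."""
    obs = [Observation(t, True) for t in failure_months]
    survivors = n_units - len(failure_months)
    obs += [Observation(censor_month, False)] * survivors
    return obs


# January shipment: failures at 3 and 5 months, the rest censored at 6 months.
january = build_shipment(25, [3, 5], censor_month=6)
# February shipment: no failures, all 25 censored at 5 months.
february = build_shipment(25, [], censor_month=5)

data = january + february
n_failed = sum(o.failed for o in data)
n_censored = len(data) - n_failed
print(n_failed, n_censored)  # 2 48
```

Keeping the censored units in the data set, rather than dropping them, is what lets later analysis use the fact that 48 machines survived at least 5 or 6 months.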
Left censoring is the least common type of censoring, in my experience.
Left censoring occurs when we do not know when the units started operation. We know when each unit failed but not when it was placed into operation or test.
Let’s say we arrive at a factory and they say all 10 xyz pumps have failed over the past two months. In order to plot the data or estimate the failure distribution (or most anything) we need to know how long each pump operated, or in this case when each pump was placed in service.
When we don’t know the start time or duration, that is left censoring.
Interval censoring happens when you check for failures on a schedule, say every week or month. If you find a failure, it may have occurred just after the last inspection or just before this inspection, and we do not know when the actual failure occurred. We only know that the failure occurred inside some interval.
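With scheduled inspections, each failure can be recorded as a pair of bounds rather than a single time. The following is a minimal sketch, assuming inspections happen at known times and that a unit that failed before the first inspection was working at time 0; the function name `failure_interval` is hypothetical.

```python
from typing import Sequence, Tuple


def failure_interval(inspection_times: Sequence[float],
                     found_failed_at: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the unknown failure time.

    The unit was last seen working at the previous inspection (or at
    time 0 if there was none) and was found failed at `found_failed_at`.
    """
    times = sorted(inspection_times)
    if found_failed_at not in times:
        raise ValueError("found_failed_at must be one of the inspection times")
    idx = times.index(found_failed_at)
    lower = times[idx - 1] if idx > 0 else 0.0
    return (lower, found_failed_at)


weekly = [7, 14, 21, 28]  # inspections every 7 days
print(failure_interval(weekly, 21))  # (14, 21)
```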
Single censoring was my first experience with reliability testing. Single censoring is, as the name suggests, a single censoring point for all units. It is similar to right censoring, yet it really only applies to specific situations.
For example, suppose we place 100 units in a test, monitor for failures, and after 1,000 hours remove all units that haven’t failed. Then 1,000 hours is the single censoring time.
When there is more than one point at which censoring occurs, that is multiple censoring.
For example, if we place 100 units in a test and after 100 hours remove 10 units, then at 200 hours remove 10 units, and so on, each removal has a different censoring time.
Time Censored (Type I)
Here we end the test at a predetermined amount of time. The time of the last failure is generally not equal to the censoring time.
Failure Censored (Type II)
Here we end the test at a predetermined number of failures. The time of the last failure is equal to the censoring time.
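The difference between the two stopping rules can be sketched by applying each to the same list of failure times. The times below are hypothetical, and in a real test the failure times of the surviving units would of course be unknown; the function names are also illustrative.

```python
from typing import List, Tuple

Result = Tuple[List[float], float, int]  # (observed failures, censoring time, units censored)


def time_censor(failure_times: List[float], end_time: float) -> Result:
    """Type I: stop the test at a fixed time."""
    observed = sorted(t for t in failure_times if t <= end_time)
    return observed, end_time, len(failure_times) - len(observed)


def failure_censor(failure_times: List[float], n_failures: int) -> Result:
    """Type II: stop the test at the n_failures-th failure."""
    if not 1 <= n_failures <= len(failure_times):
        raise ValueError("n_failures must be between 1 and the number of units")
    ordered = sorted(failure_times)
    observed = ordered[:n_failures]
    # The censoring time is the time of the last observed failure.
    return observed, observed[-1], len(ordered) - n_failures


hours = [120.0, 340.0, 560.0, 800.0, 1500.0, 2200.0]  # hypothetical data

print(time_censor(hours, 1000.0))  # ([120.0, 340.0, 560.0, 800.0], 1000.0, 2)
print(failure_censor(hours, 3))    # ([120.0, 340.0, 560.0], 560.0, 3)
```

In the time censored case the last failure (800 hours) falls before the 1,000 hour censoring time, while in the failure censored case the two coincide at 560 hours.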
Cumulative Distribution Function (CDF) plotting
A common task is to visualize the time-to-failure data even with censoring. If we plot only the times to failure, we overestimate the failure rate at any point in time. We should account for the information in the censored items, yet they have not failed and so do not have a time-to-failure point to plot.
Recall that a probability plot has time (cycles, etc.) on the x-axis and the cumulative percent or percentile of failures on the y-axis. Depending on the distribution of interest, the scales are adjusted so that data from that distribution falls on a straight line. If the plotted data is reasonably straight, the distribution is said to fit the data, and the distribution parameters may be determined from the line. The Weibull plot uses a log scale for time and a log-log scale for the cumulative percent failed.
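For the Weibull case, the scale adjustment amounts to plotting ln(-ln(1 - F)) against ln(t), where F is the CDF as a fraction. The sketch below assumes the two-parameter Weibull form F(t) = 1 - exp(-(t/eta)^beta); the function name and the sample values are hypothetical.

```python
import math
from typing import List, Tuple


def weibull_plot_coords(times: List[float],
                        cdf: List[float]) -> List[Tuple[float, float]]:
    """Map (t, F) pairs to Weibull plotting coordinates (ln t, ln(-ln(1 - F)))."""
    return [(math.log(t), math.log(-math.log(1.0 - f)))
            for t, f in zip(times, cdf)]


# Hypothetical check: points generated from a Weibull CDF with shape
# beta = 2 and scale eta = 100 should fall on a straight line of slope beta.
beta, eta = 2.0, 100.0
times = [20.0, 50.0, 100.0, 200.0]
cdf = [1.0 - math.exp(-((t / eta) ** beta)) for t in times]
points = weibull_plot_coords(times, cdf)

(x0, y0), (x1, y1) = points[0], points[-1]
slope = (y1 - y0) / (x1 - x0)
print(round(slope, 6))  # approximately 2.0, the shape parameter
```

On these axes the slope of the line estimates the shape parameter, and the line crosses y = 0 at t = eta.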
One way to do this is to estimate the CDF (the cumulative percent of the population that has failed). Intuitively, we could use 100 ( i / n ) for the i-th failure out of n units under test. Thus, if we have 10 units in the test, the first failure (i = 1) would be plotted at the 10% point on the vertical axis. This method is biased and generally overestimates the CDF.
The approximate median rank estimate is generally accepted as addressing the bias adequately and is relatively simple to use. For each time ti of the i-th failure, calculate the CDF or percentile using 100 ( i - 0.3 ) / ( n + 0.4 ).
With complete data, 10 failures out of 10 units, the first point would be plotted at 6.73% and the time of the first failure, and the 10th point at 93.3% and the time of the last failure.
If the 10 failures were from a group of 100 units, 90 of which are right censored, the first point would be at 0.697% and the time of the first failure, and the 10th point at 9.66% and the time of the last failure.
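The plotting positions quoted above can be reproduced with a few lines of code. This is a minimal sketch of the approximate median rank formula next to the naive i / n estimate; the function names are illustrative. It assumes, as the 100 unit example implies, that all censored units are still running after the last observed failure. If some units were censored earlier, the rank i would need adjusting, which this sketch does not do.

```python
def naive_rank(i: int, n: int) -> float:
    """Naive cumulative percent failed for the i-th failure of n units."""
    return 100.0 * i / n


def median_rank(i: int, n: int) -> float:
    """Approximate median rank, in percent, for the i-th failure of n units."""
    return 100.0 * (i - 0.3) / (n + 0.4)


for n in (10, 100):
    first = median_rank(1, n)
    tenth = median_rank(10, n)
    print(f"n={n}: first={first:.3g}%, tenth={tenth:.3g}%")
# n=10: first=6.73%, tenth=93.3%
# n=100: first=0.697%, tenth=9.66%
```

Note that with 10 of 10 units failed the naive estimate puts the last point at 100%, which cannot be placed on a Weibull probability scale, whereas the median rank estimate places it at 93.3%.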