What can we do without reliability function estimates? FMEA? FTA? RCA? RCM? Argue about MTBFs and availability? Weibull? Keep a low profile? Run Admirals’ tests? Look for a new, well-funded project far from the deliverable stage?
Ask for field data; there should be enough to estimate reliability and make reliability-based decisions, even if some data are missing. Field data might even be population data!
Data Saga
I wanted to estimate the reliability and failure rate functions, for reasonable ages t, for all the hematology business unit’s products and parts. Textbooks say, estimate reliability functions from random samples of ages-at-failures {T(1), T(2),…,T(r)} and survivors’ lives. As usual, I didn’t have ages-at-failures.
I could estimate the failure rate function from ships and parts’ failure counts required by GAAP, using the methods in “How Can You Estimate Reliability Functions Without Life Data?”, https://accendoreliability.com/?s=tribus/, and https://sites.google.com/site/fieldreliability/.
In February I submitted a request to the MIS department for the hematology business unit’s products’ and parts’ installed base and failure counts. The MIS department “prioritized” my request.
In July Eric from MIS said he’d start working on my request. He asked if it would be OK to give numbers failed at each age, in months 1-24. That’s grouped age-at-failure data. I thought, “Why work so hard? I could use the Kaplan-Meier nonparametric reliability estimate on ages-at-failures, at least up to age 24 months.” I thanked Eric, grateful for anything.
In October, Frank from MIS offered data by “PRODCODE”, “SER#”, “PN”, “DESC”, “TRANSDATE”, “FLAGS”. In addition to failure counts at ages 1-24, Frank offered total failures at all ages greater than 24 months, grouped into the 25th month. “PRODCODE” and “TRANSDATE” indicated that many products had been in service longer than 24 months, with some parts’ failures, usually for the first time. (Automotive aftermarket stores save parts’ sales data for two years, without parts’ ages-at-failures. They’re renewals or replacement parts https://accendoreliability.com/renewal-process-estimation-without-life-data/.)
Reliability Estimation from Grouped Life Data is Easy
The installed base and failure data for months 1-24 go into a “Nevada” table for grouped failure data, https://accendoreliability.com/nevada-charts-gather-data/. I used the Kaplan-Meier nonparametric reliability estimator for ages 1-24, and Greenwood’s formula for variances (covariances are approximately zero!). I could forecast replacement requirements, recommend parts’ stock levels, do diagnostics, and make credible reliability predictions for new products from similar, old parts’ reliability estimates, for all 2537 of the hematology business unit’s parts.
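For readers who prefer code to spreadsheets, here is a minimal Python sketch of that calculation. It assumes a Nevada table with one row per monthly shipment cohort and one column per calendar month. The function name nevada_to_km and the three-month example counts are made up for illustration; they are not the hematology data.

```python
def nevada_to_km(ships, fails):
    """Kaplan-Meier reliability and Greenwood variance from a Nevada table.

    ships[i]    : units shipped in calendar month i (hypothetical layout)
    fails[i][j] : failures in calendar month j from cohort i (j >= i),
                  so the age at failure is j - i + 1 months.
    Assumes at least one unit is at risk at every age.
    """
    months = len(ships)
    at_risk = [0.0] * months   # n(t): units that reached age t
    failed = [0.0] * months    # r(t): failures at age t
    for i in range(months):
        alive = ships[i]
        for j in range(i, months):
            age = j - i        # 0-based index for age j - i + 1
            at_risk[age] += alive
            failed[age] += fails[i][j]
            alive -= fails[i][j]

    reliability, variance = [], []
    surv, greenwood_sum = 1.0, 0.0
    for n, r in zip(at_risk, failed):
        surv *= 1.0 - r / n                      # Kaplan-Meier product
        if n > r:
            greenwood_sum += r / (n * (n - r))   # Greenwood's formula
        reliability.append(surv)
        variance.append(surv ** 2 * greenwood_sum)
    return at_risk, failed, reliability, variance


# Made-up example: three monthly cohorts observed through month 3.
ships = [100, 120, 90]
fails = [[2, 3, 1],
         [0, 1, 4],
         [0, 0, 2]]
at_risk, failed, reliability, variance = nevada_to_km(ships, fails)
for t, (rel, var) in enumerate(zip(reliability, variance), start=1):
    print(f"age {t}: R = {rel:.4f}, std. error = {var ** 0.5:.4f}")
```

The monthly failure rate estimate at each age is simply failed[t] / at_risk[t], which is what feeds a replacement forecast.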
What should I do with the failures grouped into month 25, from products or parts older than 24 months? Who cares? Me! Why? That’s additional information! I wanted to detect premature wearout, which indicates a possible design defect. (Failure rate function increases.) I also wanted to detect retirement so I don’t get stuck with obsolescent spares. (Failure rate function decreases.)
To forecast replacement requirements, I needed to estimate or extrapolate the failure rate function for ages greater than 24 months, because some products and their parts have ages greater than 24 months.
Failure Rate Function Extrapolations
When I have had no information about older failures, I have extrapolated failure rate function estimates by regression. But Frank told me how many failures occurred after age 24 months, just not when. Why not extrapolate by maximizing likelihood,
PRODUCT[(1–R(t))^r(t) * R(t)^(n(t)–r(t)); t=1,2,…,oldest],
where R(t) is the reliability function, r(t) is the number of failures of age t, and n(t) is the installed base of age t including ages beyond 24 months? That’s what the Kaplan-Meier estimator does, except that all I know is n(t), t=1,2,…,oldest and r(?) the sum of all failures at ages greater than 24 months.
How to model failures older than 24 months? Constant failure rate? Linear? Other? The choice should depend on how the failure rate function looks before age 24 months, the number of failures older than 24 months, and your experience. Wait, you say! Couldn’t the failure counts older than 24 months change the earlier reliability estimates, ages 1-24? Nope, maximizing log likelihood maximizes a sum by maximizing each summand. I checked reliability estimates; no difference. That’s enough proofiness for me.
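The “each summand” argument can be checked numerically. The sketch below assumes the log likelihood is written in terms of the monthly conditional failure probabilities a(t), so that each age contributes its own binomial term. That parameterization and the counts are assumptions for illustration, not the spreadsheet used in the article.

```python
import math


def log_lik(a, at_risk, failed):
    """Sum of binomial log-likelihood terms, one per age t."""
    return sum(r * math.log(p) + (n - r) * math.log(1.0 - p)
               for p, n, r in zip(a, at_risk, failed))


# Hypothetical counts for three early ages.
at_risk = [310, 217, 95]
failed = [5, 7, 1]

# Each term is maximized separately by a(t) = r(t) / n(t).
a_hat = [r / n for r, n in zip(failed, at_risk)]
best = log_lik(a_hat, at_risk, failed)

# Nudge one a(t) at a time away from r(t)/n(t) and record the loss in lnL.
drops = []
for t in range(len(a_hat)):
    for factor in (0.9, 1.1):
        trial = list(a_hat)
        trial[t] *= factor
        drops.append(best - log_lik(trial, at_risk, failed))

print("smallest loss from nudging any a(t):", min(drops))
```

Because the terms for ages 1-24 do not involve the unknown split of the later failures, adding terms for ages beyond 24 leaves the maximizers for the early ages where they were.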
Constant Failure Rate? For older parts, make expected deaths older than 24 months equal the observed and reported sum r(?) of failures at ages greater than 24 months, by choice of a constant (actuarial) failure rate “a(25)” estimate = failures per month/number exposed; i.e.,
a(25) = r(?)/SUM[(t–24)*(N(t)–a(25)*E[N(t)]); t=25, 26,…,oldest],
where N(t) is the ships in month t=25,26,…,oldest, and E[N(t)] is the average ships per month. Expected failures are
SUM[N(s)*p(s)]*PRODUCT[(1–SUM[p(t)])/R(24)],
where the sum and the product run from s and t = 25 to the age of the oldest product, N(s) is the number shipped s months ago, p(s) is the probability age at failure is s months, and R(24) = P[life > 24] = 1 – SUM[p(t); t = 0, 1,…,24]. Set Expected failures equal to observed with a(t) = a(25) for all ages t > 24, where a(t) = p(t)/R(t), the conditional probability of failure in the next month given survival to age t.
Table 1 Example: Constant failure rate for parts in a product 32 months old: The E[deaths] column is the actuarial failure rate a(25) times the numbers of survivors, and the survivors column is Ships N(t) minus E[failures] r(t). The last column a(25) is r(?) divided by the sum of the t*sum[N(t)] column.
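One way to read the constant failure rate idea in code: pick the monthly rate a so that expected failures beyond age 24 match the reported total r(?). This is a simplified sketch, not a reproduction of Table 1. It assumes each cohort of N(s) units shipped s months ago has N(s)*R(24) expected survivors at age 24, each failing with constant monthly probability a thereafter. The function names, ship counts, and R(24) value are hypothetical.

```python
def expected_late_failures(a, ships_by_age, r24):
    """Expected failures at ages > 24 under a constant monthly rate a.

    ships_by_age maps current age s (months) to N(s), units shipped s months ago.
    r24 is the estimated reliability at 24 months, R(24).
    """
    return sum(n * r24 * (1.0 - (1.0 - a) ** (age - 24))
               for age, n in ships_by_age.items() if age > 24)


def solve_constant_rate(observed, ships_by_age, r24, tol=1e-12):
    """Bisection for the rate a that makes expected failures equal observed."""
    if observed > expected_late_failures(1.0, ships_by_age, r24):
        raise ValueError("observed failures exceed expected survivors to age 24")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Expected failures increase with a, so move toward the observed count.
        if expected_late_failures(mid, ships_by_age, r24) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical installed base older than 24 months, with one late failure.
ships_by_age = {25: 40, 26: 35, 27: 50, 28: 30, 29: 45, 30: 20}
r24 = 0.9          # assumed Kaplan-Meier estimate of R(24)
observed = 1.0     # r(?): total reported failures at ages > 24

a25 = solve_constant_rate(observed, ships_by_age, r24)
print(f"constant monthly failure rate beyond 24 months: {a25:.5f}")
```

Bisection is used only because expected failures rise monotonically with a; Excel's Goal Seek or Solver would do the same job in a spreadsheet.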
Other Failure Rate Models: Maximum likelihood chooses fractional ships after 24 months of age, constrained to equal the reported failure count after 24 months r(?), to make nonparametric estimates of the reliability and failure rate functions for ages up to the oldest unit in the installed base.
I used Excel Solver to maximize likelihood; Excel blew up for the “Unconstrained” alternative, so I manually entered 1 failure in month 30 or “Limited” the failure rate to prevent a #NUM! error. The maximum likelihood (lnL in Table 2) was achieved by the “Unconstrained” alternative with one failure in month 30. The failure rates indicate there was wearout, because the “Limited” and “Linear” alternatives also showed increasing failure rates.
Table 2 Example. Data are from some US postal service machines. There was 1 failure in months 25-30. Alternative failure rate models are: unconstrained, constant, limited, and linear. The alternatives postulate fractional failures at ages 25-30, and Solver maximizes log-likelihood (lnL) for reliability and failure rate function estimates. The constrained maximum likelihood failure rate estimates are in the last four columns.
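A small Python sketch of how such alternatives can be compared. It assumes a binomial log likelihood in each month, with the failure rate in that month set to the postulated fractional failures divided by the units at risk. The at-risk counts are made up, not the postal service data, and only two of the alternatives are shown: spreading the one failure at a constant rate, and putting it all in a single month.

```python
import math


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0 (avoids a math error)."""
    return 0.0 if x == 0 else x * math.log(y)


def ln_lik(fails, at_risk):
    """Log likelihood when each month's failure rate is fails / at_risk."""
    total = 0.0
    for f, n in zip(fails, at_risk):
        a = f / n
        total += xlogy(f, a) + xlogy(n - f, 1.0 - a)
    return total


ages = [25, 26, 27, 28, 29, 30]
at_risk = [200, 170, 130, 90, 60, 25]   # hypothetical units at risk by age
late_failures = 1.0                     # r(?): reported total beyond 24 months

# Constant failure rate: fractional failures proportional to units at risk.
constant = [late_failures * n / sum(at_risk) for n in at_risk]
lnl_constant = ln_lik(constant, at_risk)

# Single-month alternatives: the whole failure assigned to one age.
lnl_single = {}
for i, age in enumerate(ages):
    fails = [late_failures if k == i else 0.0 for k in range(len(ages))]
    lnl_single[age] = ln_lik(fails, at_risk)

best_age = max(lnl_single, key=lnl_single.get)
print(f"lnL constant rate: {lnl_constant:.3f}")
print(f"best single month: {best_age}, lnL = {lnl_single[best_age]:.3f}")
```

With these made-up counts, the oldest month, which has the fewest units at risk, scores highest, the same pattern as the “Unconstrained” alternative described above. The zero-failure months are where a spreadsheet hits log(0), which is why the sketch treats 0*log(0) as 0.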
Free offer
These examples are not the only problem I’ve seen with grouped data. A sterile glove company’s [Terumo] customers batch failures and send them back whenever they feel like it. Imagine grouped failure counts with reporting delays so that the most recent counts are obviously under-reported [ReliaSoft]. Imagine sell-through time, the time from reported sale until first use [hematology business unit].
If you have a problem with grouped failure counts, send pstlarry@yahoo.com your installed base by age and grouped ages at replacements, and I’ll send back the Kaplan-Meier estimate of reliability function, Greenwood’s estimator of its variance, estimate of the failure rate function, and alternative maximum likelihood estimators for the older, grouped data.