AFRs are periodic ratios of failure counts divided by installed base. Have you seen meeting rooms wallpapered with AFR (Annualized Failure Rate) charts? Have you sat through debates about the wiggles in AFR charts? Fred Schenkelberg wondered whether reliability could be estimated from AFRs and their input data. How about age-specific reliability and actuarial failure rate functions? Actuarial forecasts? MTBFs? Wonder no more!
AFR [Annualized failure rate – Wikipedia] is a ratio of failures divided by operating time or installed base, computed periodically: AFR = (failures/operating time)*(annualization factor). Julio Calderon found that HDD AFRs and vendors’ 8766/MTBF values didn’t agree! (AFR=8766/MTBF with MTBF in hours; there are 8766 hours in an average year.)
AFRs and MTBFs are not reliability! Reliability, R(t), is P[Life > t] for t ≥ 0, and a(t) is the actuarial failure rate, [R(t-1)-R(t)]/R(t-1). Don’t extrapolate AFRs to make failure forecasts! That’s like driving while looking backwards. Why not use AFR or FRACAS [MIL-STD-2155] input data to estimate age-specific reliability? Why not make actuarial failure forecasts? An actuarial forecast is ∑a(s)n(t-s), s=0,1,2,…,t, where n(t-s) is the surviving installed base shipped in period t-s, which is of age s in period t. Actuarial forecasts account for the forces of mortality that cause failures. Look ahead!
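The forecast sum ∑a(s)n(t-s) can be sketched in a few lines of Python. This is a minimal illustration, not the author's spreadsheet: the function name and the input numbers are hypothetical, and it assumes n(j) already holds the surviving units from the cohort shipped in period j.

```python
def actuarial_forecast(a, survivors_by_cohort, t):
    """Expected failures in period t: sum of a(s) * n(t - s) for s = 0..t.

    a[s] is the actuarial failure rate at age s; survivors_by_cohort[j] is the
    surviving installed base from the cohort shipped in period j, so that
    cohort is of age s = t - j in period t.
    """
    return sum(a[s] * survivors_by_cohort[t - s] for s in range(t + 1))


# Hypothetical inputs for illustration only (not from the article's tables).
rates = [0.0, 0.01, 0.02]        # a(0), a(1), a(2)
installed = [1000, 1000, 1000]   # n(0), n(1), n(2)
forecast = actuarial_forecast(rates, installed, t=2)
print(f"Expected failures in period 2: {forecast:.1f}")
```

Each cohort is weighted by the failure rate at its own age, which is what distinguishes this forecast from extrapolating a single AFR series.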
How to Extract Installed Base Given AFRs and Failure Counts?
Ships (cohort sizes or installed base by age t), n(t), and failure or return counts, r(t), are statistically sufficient to make nonparametric estimates of reliability and actuarial failure rate functions [George 1993]. What if you have periodic AFRs and failure counts r(t) but not installed base, n(t)? (Failure or return counts r(t) in period t could come from units shipped in any previous or current period.) Given failure or return counts r(t) and successive AFRs, find ships or installed base cohort sizes n(t), t=1,2,…. When AFR(t) is r(t) divided by cumulative ships, cumulative ships are r(t)/AFR(t), and each cohort is the difference between successive cumulative ships. The two-period solution is
n(1)=r(1)/AFR(1) and n(2)=(AFR(1)*r(2)-AFR(2)*r(1))/(AFR(1)*AFR(2)),
and the general solution is
n(t)=(AFR(t-1)*r(t)-AFR(t)*r(t-1))/(AFR(t-1)*AFR(t)).
Table 1 is an example. AFR in column 5 is the ratio of Fails/Cum Ships. Cohort sizes n(t) in column 6 are computed from the general formula and match the simulated ships in column 2. That solution fails when an AFR in the denominator is zero or when the numerator is negative.
Table 1. Ships are simulated Poisson(1000) monthly; “Fails” are fake. AFR = Fails/Cum Ships. Reliability is the maximum likelihood estimator.
Months | Ships | Cum Ships | Fails | AFR | n(t) | Reliability |
1 | 1016 | 1016 | 1 | 0.000984 | 1016 | 0.9990 |
2 | 1007 | 2023 | 3 | 0.001483 | 1007 | 0.9970 |
3 | 1012 | 3035 | 5 | 0.001647 | 1012 | 0.9951 |
4 | 968 | 4003 | 7 | 0.001749 | 968 | 0.9928 |
5 | 1029 | 5032 | 11 | 0.002186 | 1029 | 0.9893 |
6 | 1004 | 6036 | 14 | 0.002319 | 1004 | 0.9861 |
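The recovery of cohort sizes in Table 1 can be sketched as follows. This is a minimal sketch that assumes AFR(t) = Fails/Cum Ships, as in Table 1; the function name is hypothetical. It uses unrounded AFRs computed from the table's own ships and fails.

```python
from itertools import accumulate


def cohorts_from_afr(fails, afr):
    """Recover cohort sizes n(t) from failure counts r(t) and AFR(t).

    Assumes AFR(t) = r(t) / cumulative ships, so cumulative ships are
    r(t) / AFR(t) and n(t) is the difference of successive cumulative ships.
    """
    cohorts = []
    previous_cum = 0.0
    for r, rate in zip(fails, afr):
        if rate <= 0:
            raise ValueError("AFR must be positive to recover installed base")
        cum = r / rate                 # implied cumulative ships
        n = cum - previous_cum         # equals the general formula for n(t)
        if n < 0:
            raise ValueError("negative cohort size: inputs are inconsistent")
        cohorts.append(n)
        previous_cum = cum
    return cohorts


# Table 1 inputs.
ships = [1016, 1007, 1012, 968, 1029, 1004]
fails = [1, 3, 5, 7, 11, 14]
exact_afr = [r / c for r, c in zip(fails, accumulate(ships))]

recovered = cohorts_from_afr(fails, exact_afr)
print([round(n) for n in recovered])
```

With unrounded AFRs the recovered cohorts equal the simulated ships. Feeding in the six-decimal AFRs as printed in Table 1 instead can shift some cohorts by roughly a unit or two, which is the round-off sensitivity to keep in mind with published AFRs.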
Backblaze Data?
Backblaze publishes HDD (Hard-Disk Drive) and flash drive AFRs and quarterly input data: {Mfg, Model, Size, drive count, Days, Failures, AFR}. Backblaze went public in 2021 (BLZE); their www.backblaze.com site sells cloud storage. Andy Klein’s quarterly reports are under their “About” menu. He reports on millions of HDDs and summarizes, in AFRs, information about individual HDDs. IDEMA published standards for tracking individual HDD lifetimes by “vintage,” including use factors that may facilitate root cause analysis [Elerath].
AFRs are not as informative as age-specific field reliability and failure rate function estimates, especially as estimates evolve during product life cycles. Backblaze quarterly report data is sufficient to estimate age-specific field reliability. I picked the Western Digital HDDs because their data started from first shipments. Their failure counts do not include failures from previous, unknown ships cohorts. [It is possible to account for failures from previous cohorts; e.g. COVID-19 mutations, Field Reliability – Corona virus survival analysis (google.com).]
Table 2. WDC [Western Digital – Wikipedia] HDD AFRs from Backblaze reports. The Q4 AFRs come from annual reports.
Model | 2020Q4 | 2021Q1 | Q2 | Q3 | Q4 | 2022Q1 | Q2 | Q3 | Q4 |
WDC | 0.16% | 0.57% | 0.46% | 0.39% | 0.32% | 0.00% | 0.16% | 0.30% | 0.40% |
WUH741414ALE6L4 | 0.43% | 0.12% | |||||||
Installed | 6002 | 8408 | 8410 | ||||||
WUH741816ALE6L0 | 0.14% | 0.12% | |||||||
Installed | 1767 | 2701 | |||||||
WUH741816ALE6L4 | 0.36% | ||||||||
Installed | 10801 |
Table 3. WDC 2022 quarterly reports contain cumulative installations and failure counts too.
Q1 | Count | Failures | AFR |
WUH741414ALE6L4 | 8408 | – | 0 |
WUH741816ALE6L0 | 2599 | – | 0 |
WUH741816ALE6L4 | 1200 | – | 0 |
Q2 | |||
WUH741414ALE6L4 | 8408 | 2 | 0.10% |
WUH741816ALE6L0 | 2702 | 2 | 0.30% |
WUH741816ALE6L4 | 1199 | 1 | 0.34% |
Q3 | |||
WUH741414ALE6L4 | 8409 | 5 | 0.24% |
WUH741816ALE6L0 | 2702 | – | 0.00% |
WUH741816ALE6L4 | 7138 | 6 | 0.71% |
Q4 | |||
WUH741414ALE6L4 | 8410 | 10 | 0.12% |
WUH741816ALE6L0 | 2701 | 3 | 0.12% |
WUH741816ALE6L4 | 10801 | 13 | 0.36% |
Table 4. Quarterly ships and failure counts for input to age-specific reliability estimation.
Period | WUH741414ALE6L4 Ships | WUH741414ALE6L4 Failures | WUH741816ALE6L0 Ships | WUH741816ALE6L0 Failures | WUH741816ALE6L4 Ships | WUH741816ALE6L4 Failures |
2020 Q4 | 6002 | | | | | |
2021 Q1 | 602 | | | | | |
2021 Q2 | 602 | | | | | |
2021 Q3 | 601 | | | | | |
2021 Q4 | 601 | 35 | 1767 | 1 | | |
2022 Q1 | | 0 | 832 | 0 | 1200 | 0 |
2022 Q2 | | 2 | 103 | 2 | 0 | 1 |
2022 Q3 | | 5 | 0 | 0 | 5139 | 6 |
2022 Q4 | | 10 | 0 | 3 | 3633 | 13 |
How to Estimate Reliability from AFRs and Failure Counts Without Life Data?
I found cumulative installed base in the quarterly reports and computed the quarterly ships n(t) as differences, so I didn’t have to infer installed base from AFRs as in table 1. Inferring it would have introduced errors from the AFRs’ round-off and zeros.
Reliability R(t) estimates (table 5) depend on whether failure counts are dead forever or renewals. Backblaze data is recorded by HDD serial number so failure counts are probably dead forever. A Google spreadsheet for dead-forever reliability estimation is available [George 2023]. My brother and I did reliability estimation for Western Digital in 1994 (column 5). Figure 1 shows that WD HDD reliability hasn’t changed much since 1994! Table 6 and figures 2-4 compare AFRs and monthly actuarial rates.
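For readers who want to experiment with ships and failure counts, here is a minimal moment-matching sketch for the dead-forever case. It is not the maximum likelihood estimator used in this article or in the spreadsheet [George 2023], so its results need not match the tables. It assumes expected failures in period t are ∑n(t-s+1)f(s), s=1,…,t, where f(s) is the probability of failing at age s, and solves for f(s) one period at a time. The function name is hypothetical.

```python
def reliability_by_moments(ships, fails):
    """Moment-matching sketch of dead-forever reliability, R(1), R(2), ...

    Solves fails[t] = sum over s of ships[t - s] * f[s] recursively, where
    f[s] is the probability of failing at age s + 1 periods (0-based index).
    Negative solutions are clipped to zero; this is an ad hoc choice.
    """
    if ships[0] <= 0:
        raise ValueError("first cohort must be positive")
    f = []
    for t in range(len(fails)):
        # Failures already explained by younger ages in later cohorts.
        explained = sum(ships[t - s] * f[s] for s in range(t))
        f.append(max((fails[t] - explained) / ships[0], 0.0))
    reliability = []
    survival = 1.0
    for p in f:
        survival -= p
        reliability.append(survival)
    return reliability


# Table 1 inputs: monthly ships and failure counts.
ships = [1016, 1007, 1012, 968, 1029, 1004]
fails = [1, 3, 5, 7, 11, 14]
estimates = reliability_by_moments(ships, fails)
print([round(r, 4) for r in estimates])
```

The recursion leans entirely on the first cohort, so noisy counts can produce negative f(s); clipping at zero hides that noise rather than modeling it, which is one reason to prefer a likelihood-based estimator for real data.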
Table 5. WD HDD reliability estimates.
Age, Months | WUH741414ALE6L4 | WUH741816ALE6L0 | WUH741816ALE6L4 | WD 1994 |
0 | 1 | 1 | 1 | 1 |
1 | 0.99983 | 1 | 1 | 0.99893 |
2 | 0.99583 | 1 | 0.99917 | 0.99485 |
3 | 0.99167 | 0.99899 | 0.99916 | 0.99485 |
4 | 0.98833 | 0.99899 | 0.99727 | 0.99485 |
5 | 0.97382 | 0.99899 | 0.99727 | 0.96769 |
6 | 0.97382 | 0.97087 | 0.99312 | 0.96769 |
7 | 0.95802 | 0.97087 | 0.99312 | 0.96769 |
8 | 0.95596 | 0.94175 | ? | 0.96769 |
9 | 0.95183 | 0.92233 | ? | 0.96769 |
10 | 0.22500 | 0.92233 | ? | 0.96769 |
11 | ? | 0.91262 | ? | 0.96769 |
Table 6. Compare monthly actuarial failure rates (columns 2, 4, 6, 8) and AFRs (columns 3, 5, 7).
Age, Months | WUH741414ALE6L4 rate | WUH741414ALE6L4 AFR | WUH741816ALE6L0 rate | WUH741816ALE6L0 AFR | WUH741816ALE6L4 rate | WUH741816ALE6L4 AFR | WD 1994 rate |
1 | 0.017% | 0.16% | 0.0% | 0.0% | 0.0% | 0.0% | 0.107% |
2 | 0.400% | 0.57% | 0.0% | 0.0% | 0.083% | 0.23% | 0.409% |
3 | 0.418% | 0.49% | 0.101% | 0.44% | 0.001% | 0.71% | 0.0% |
4 | 0.336% | 0.38% | 0.0% | 0.14% | 0.189% | 0.36% | 0.0% |
5 | 1.469% | 0.43% | 0.0% | 0.0% | 0.0% | 0.19% | 2.730% |
6 | 0.0% | 0.0% | 2.814% | 0.15% | 0.416% | 0.38% | 0.0% |
7 | 1.622% | 0.29% | 0.0% | 0.0% | 0.0% | 0.35% | 0.0% |
8 | 0.215% | 0.24% | 3.000% | 0.12% | | | 0.0% |
9 | 0.433% | 0.12% | 2.062% | 0.30% | | | 0.0% |
10 | | 0.48% | 0.0% | 0.0% | | | 0.0% |
11 | | 0.77% | 1.053% | 0.15% | | | 0.0% |
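The actuarial rates in Table 6 follow from the reliability estimates in Table 5 through a(t) = [R(t-1)-R(t)]/R(t-1). A short sketch, with a hypothetical function name, applies that definition to the WUH741414ALE6L4 column for ages 0 through 9 months. Because Table 5 is rounded to five decimals, the results agree with Table 6 only to within about 0.001 percentage points.

```python
def actuarial_rates(reliability):
    """Return a(t) = [R(t-1) - R(t)] / R(t-1) for t = 1, 2, ..."""
    return [
        (previous - current) / previous
        for previous, current in zip(reliability, reliability[1:])
    ]


# Table 5, WUH741414ALE6L4, ages 0..9 months.
r_414 = [1, 0.99983, 0.99583, 0.99167, 0.98833,
         0.97382, 0.97382, 0.95802, 0.95596, 0.95183]

rates_414 = actuarial_rates(r_414)
for age, rate in enumerate(rates_414, start=1):
    print(f"age {age}: {rate:.3%}")
```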
Estimate MTBF from AFRs?
MTBF=8766/AFR, where MTBF is in hours, AFR is failures per unit per year, and 8766 is the number of hours in an average year (this rearranges AFR=8766/MTBF). That MTBF is potentially biased and variable. MTBFs are predictions or estimates of the mean lives of products or parts: MTBF=∫R(t)dt, where R(t) is the reliability function and the integral is from 0 to infinity. AFR-based MTBF estimates may be biased, especially in early life, before many units have failed. Computing MTBF = ∫R(t)dt ≅ ∑R(t) requires extrapolation beyond available data: linear, exponential, seasonal, or curve fitting to popular reliability functions. I extrapolated failure rate functions, a(t), and computed R(t) = exp[-∑a(s)], s=1,2,…,t, for t up to 500. Table 7 shows results of alternative extrapolation methods for a Western Digital HDD.
Table 7. Compare 12-month average of AFR-based MTBF vs. MTBF = ∫R(t)dt extrapolated beyond the 9th month of actuarial failure rate estimates.
Method | Avg AFR-based MTBF | Linear | Constant | Growth | Trend | ETS |
MTBF, months | 12.88 | 48.61 | 167 | 215 | 12.88 | 49.5 |
Linear increase in failure rate is a reasonable extrapolation of wearout. It yields a more reasonable HDD MTBF of ~48 months than the average of the first 12 AFR-based MTBF values. The exponentially smoothed time series forecast (ETS) of 49.5 months agrees. Constant and exponential growth overestimate MTBF. Trend finds the linear extrapolation that fits, by least squares, the first 12 AFR-based MTBF values. The ~48- to 49-month MTBF seems more likely than the results of the other methods.
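The idea of MTBF ≅ ∑R(t) with extrapolated failure rates can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Table 7. It uses the product form R(t) = ∏[1-a(s)], which follows exactly from the definition of a(t) and which exp[-∑a(s)] approximates for small rates. Beyond the observed ages it extrapolates from the average observed rate, and the slope value is a hypothetical choice. The function name is also hypothetical.

```python
def mtbf_months(observed_rates, horizon=500, slope=0.0):
    """Approximate MTBF in months as the sum of R(t), t = 0..horizon.

    observed_rates[t - 1] is a(t) for the ages with data. Beyond the data,
    a(t) = average observed rate + slope * (months past the data), capped at 1.
    slope = 0 gives a constant extrapolation; slope > 0 mimics wearout.
    """
    base = sum(observed_rates) / len(observed_rates)
    survival = 1.0            # R(0) = 1
    total = 1.0
    for t in range(1, horizon + 1):
        if t <= len(observed_rates):
            a = observed_rates[t - 1]
        else:
            a = min(1.0, base + slope * (t - len(observed_rates)))
        survival *= 1.0 - a   # R(t) = R(t-1) * [1 - a(t)]
        total += survival
    return total


# Monthly actuarial rates for WUH741414ALE6L4, ages 1..9 (Table 6, as fractions).
a_414 = [0.00017, 0.00400, 0.00418, 0.00336, 0.01469,
         0.0, 0.01622, 0.00215, 0.00433]

constant = mtbf_months(a_414)                 # constant extrapolation
wearout = mtbf_months(a_414, slope=0.0005)    # hypothetical linear increase
print(f"constant: {constant:.1f} months, linear: {wearout:.1f} months")
```

The gap between the two results shows how strongly an MTBF depends on the extrapolation assumption when only nine months of rates are available.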
Recommendations?
Compute AFRs if you want, but estimate age-specific reliability and failure rate functions too; you have the data. Failure rate functions account for the forces of mortality that cause failures. Make actuarial forecasts, because they’re more accurate and precise than time series extrapolations. Actuarial forecasts and their distributions help plan service and inventory better than AFR time-series extrapolations. If you must supply MTBF predictions, base them on extrapolations of age-specific reliability and actuarial failure rate function estimates.
References
Julio Calderon, “Effortless! How to Apply AFRs, MTBFs to Your Data Management Practice,” LinkedIn
J. G. Elerath, “AFR: problems of definition, calculation and measurement in a commercial environment,” Annual Reliability and Maintainability Symposium, 2000 Proceedings, International Symposium on Product Quality and Integrity (Cat. No.00CH37055), Los Angeles, CA, USA, pp. 71-76, doi: 10.1109/RAMS.2000.816286, 2000
L. L. George, “Estimate Reliability Functions Without Life Data”, ASQ Reliability Review, Vol. 13, No. 1, March 1993
L. L. George, Credible Reliability Prediction, 2nd Edition, Field Reliability (google.com), CREDRP2020.PDF, June 2020
L. L. George, User Manual for Credible Reliability Prediction, Field Reliability (google.com), CRPUSM1.PDF, June 2020
L. L. George, “Estimate Field Reliability Without Life Data,” Weekly Update, https://accendoreliability.com/estimate-field-reliability-without-life-data/#more-527694, Sept. 2023
Andy Klein, “Backblaze Drive Stats for 2022,” January 31, 2023
DoD Handbook, Failure Reporting, Analysis and Corrective Action Taken, Mil-Std-2155(AS), Dec. 1995