Convert AFRs to Field Reliability?

AFRs are periodic ratios of failure counts divided by installed base. Have you seen meeting rooms wallpapered with AFR (Annualized Failure Rate) charts? Have you sat through debates about the wiggles in AFR charts? Fred Schenkelberg wondered whether reliability could be estimated from AFRs and their input data. How about age-specific reliability and actuarial failure rate functions? Actuarial forecasts? MTBFs? Wonder no more!

AFR (annualized failure rate; see Wikipedia) is a ratio of failures divided by operating time or installed base, computed periodically: AFR = (failures/operating time)*(annualization factor). Julio Calderon found that HDD AFRs and vendor 8766/MTBF values didn’t agree! (AFR = 8766/MTBF; there are 8766 hours in an average year.)
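The AFR definition and its relation to MTBF can be sketched in a few lines of Python. This is a minimal illustration; the function names and the 1000-drive fleet are hypothetical, and the inversion to MTBF is only meaningful if the failure rate is assumed constant.

```python
HOURS_PER_YEAR = 8766  # hours in an average year, as in the text

def afr(failures, operating_hours):
    """Annualized failure rate: failures per unit-year of operation."""
    return failures / operating_hours * HOURS_PER_YEAR

def mtbf_hours_from_afr(afr_value):
    """Invert AFR = 8766/MTBF (assumes a constant failure rate)."""
    return HOURS_PER_YEAR / afr_value

# Hypothetical fleet: 1000 drives running one average year, 12 failures.
fleet_afr = afr(failures=12, operating_hours=1000 * HOURS_PER_YEAR)
fleet_mtbf = mtbf_hours_from_afr(fleet_afr)
print(f"AFR = {fleet_afr:.2%}, implied MTBF = {fleet_mtbf:,.0f} hours")
```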

AFRs and MTBFs are not reliability! Reliability, R(t), is P[Life > t] for t ≥0, and a(t) is the actuarial failure rate, [R(t‑1)‑R(t)]/R(t-1). Don’t extrapolate AFRs to make failure forecasts! That’s like driving while looking backwards. Why not use AFR or FRACAS [MIL-STD-2155] input data to estimate age-specific reliability? Why not make actuarial failure forecasts? An actuarial forecast is ∑a(s)n(t-s), s=0,1,2,…,t, where n(t‑s) is the installed base of age t-s. Actuarial forecasts account for the forces of mortality that cause failures. Look ahead!
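As a rough sketch of those two formulas, the Python below computes a(t) from a reliability function and then the actuarial forecast ∑a(s)n(t-s). The reliability values are the ones listed in Table 1 with R(0) = 1 added; the cohort sizes of 1000 per period, the function names, and the convention a(0) = 0 are assumptions made only for illustration.

```python
def actuarial_rates(reliability):
    """a(t) = [R(t-1) - R(t)] / R(t-1) for t >= 1; a(0) is set to 0 by assumption."""
    rates = [0.0]
    for t in range(1, len(reliability)):
        rates.append((reliability[t - 1] - reliability[t]) / reliability[t - 1])
    return rates

def actuarial_forecast(rates, installed, t):
    """Expected failures in period t: sum over s = 0..t of a(s) * n(t - s)."""
    return sum(rates[s] * installed[t - s]
               for s in range(t + 1)
               if s < len(rates) and 0 <= t - s < len(installed))

# R(0..6): R(0) = 1 followed by the Reliability column of Table 1.
R = [1.0, 0.9990, 0.9970, 0.9951, 0.9928, 0.9893, 0.9861]
# Hypothetical cohorts n(0..6): 1000 units in each period.
n = [1000] * 7

a = actuarial_rates(R)
forecast = actuarial_forecast(a, n, t=6)
print(f"Forecast failures in period 6: {forecast:.1f}")
```

Because the forecast weights each cohort by the failure rate at its own age, it responds to changes in the age mix of the installed base, which an extrapolated AFR cannot do.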

How to Extract Installed Base Given AFRs and Failure Counts?

Ships (cohort sizes or installed base by age t), n(t), and failures or returns counts, r(t), are statistically sufficient to make nonparametric estimates of reliability and actuarial failure rate functions [George 1993]. What if you have periodic AFRs and failure counts r(t) but not installed base, n(t)? (Failures or returns counts r(t) in period t could be returns of units shipped in any previous or current period.) Given failures or returns counts r(t) and successive AFRs, find ships or installed base cohort sizes n(t), t=0,1,2,…. The two-period solution is

n(1)=r(1)/AFR(1) and n(2)=(AFR(1)*r(2)-AFR(2)*r(1))/(AFR(1)*AFR(2)),

and the general solution is

n(t)=(AFR(t‑1)*r(t)-AFR(t)*r(t-1))/(AFR(t-1)*AFR(t)).

Table 1 is an example. AFR in column 5 is the ratio Fails/Cum Ships. Cohort “n(t)” values in column 6 are computed from the last formula and match the simulated ships in column 2. The solution doesn’t work when an AFR in the denominator is zero or when the numerator is negative.

Table 1. Ships are simulated Poisson(1000) monthly; “Fails” are fake. AFR = Fails/Cum Ships. Reliability is the maximum likelihood estimator.

Months	Ships	Cum Ships	Fails	AFR	n(t)	Reliability
1	1016	1016	1	0.000984	1016	0.9990
2	1007	2023	3	0.001483	1007	0.9970
3	1012	3035	5	0.001647	1012	0.9951
4	968	4003	7	0.001749	968	0.9928
5	1029	5032	11	0.002186	1029	0.9893
6	1004	6036	14	0.002319	1004	0.9861
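The cohort recovery can be checked against Table 1 with a short Python sketch. The general formula is algebraically the same as n(t) = r(t)/AFR(t) - r(t-1)/AFR(t-1), which is what the hypothetical function below computes. AFR is recomputed at full precision from Fails/Cum Ships, because the rounded AFRs printed in the table shift the recovered cohorts by a unit or two.

```python
def cohorts_from_afr(fails, afrs):
    """n(t) = r(t)/AFR(t) - r(t-1)/AFR(t-1), with n(1) = r(1)/AFR(1)."""
    cohorts = []
    previous_cum = 0.0
    for r, afr in zip(fails, afrs):
        if afr <= 0:
            raise ValueError("AFR must be positive to recover installed base")
        cum = r / afr  # cumulative installed base implied by this period
        cohorts.append(cum - previous_cum)
        previous_cum = cum
    return cohorts

# Table 1 inputs.
fails = [1, 3, 5, 7, 11, 14]
cum_ships = [1016, 2023, 3035, 4003, 5032, 6036]
afrs = [r / c for r, c in zip(fails, cum_ships)]  # unrounded AFRs

recovered = [round(n) for n in cohorts_from_afr(fails, afrs)]
print(recovered)  # [1016, 1007, 1012, 968, 1029, 1004]
```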

Backblaze Data?

Backblaze publishes HDD (hard-disk drive) and flash drive AFRs and quarterly input data: {Mfg, Model, Size, Drive count, Drive days, Failures, AFR}. Backblaze went public in 2021 (BLZE); their www.backblaze.com site sells cloud storage. Andy Klein’s quarterly reports are under their “About” menu. He reports on millions of HDDs and summarizes, in AFRs, information about individual HDDs. IDEMA published standards for tracking individual HDD lifetimes by “vintage” including use factors that may facilitate root cause analysis [Elerath].

AFRs are not as informative as age-specific field reliability and failure rate function estimates, especially as estimates evolve during product life cycles. Backblaze quarterly report data is sufficient to estimate age-specific field reliability. I picked the Western Digital HDDs because their data started from first shipments. Their failure counts do not include failures from previous, unknown ships cohorts. [It is possible to account for failures from previous cohorts; e.g., COVID-19 mutations: see “Corona virus survival analysis” on the Field Reliability site (google.com).]

Table 2. WDC (Western Digital) HDD AFRs from Backblaze reports. The Q4 AFRs come from annual reports.

	2020Q4	2021Q1	Q2	Q3	Q4	2022Q1	Q2	Q3	Q4
WDC	0.16%	0.57%	0.46%	0.39%	0.32%	0.00%	0.16%	0.30%	0.40%
WUH741414ALE6L4					0.43%				0.12%
Installed	6002				8408				8410
WUH741816ALE6L0					0.14%				0.12%
Installed					1767				2701
WUH741816ALE6L4									0.36%
Installed									10801

Table 3. WDC 2022 quarterly reports contain cumulative installations and failure counts too.

Q1	Count	Failures	AFR
WUH741414ALE6L4	8408	–	0
WUH741816ALE6L0	2599	–	0
WUH741816ALE6L4	1200	–	0
Q2
WUH741414ALE6L4	8408	2	0.10%
WUH741816ALE6L0	2702	2	0.30%
WUH741816ALE6L4	1199	1	0.34%
Q3
WUH741414ALE6L4	8409	5	0.24%
WUH741816ALE6L0	2702	–	0.00%
WUH741816ALE6L4	7138	6	0.71%
Q4			AFR
WUH741414ALE6L4	8410	10	0.12%
WUH741816ALE6L0	2701	3	0.12%
WUH741816ALE6L4	10801	13	0.36%

Table 4. Quarterly ships and failure counts for input to age-specific reliability estimation.

	WUH741414ALE6L4		WUH741816ALE6L0		WUH741816ALE6L4
Period	Ships	Failures	Ships	Failures	Ships	Failures
2020 Q4	6002
2021 Q1	602
2021 Q2	602
2021 Q3	601
2021 Q4	601	35	1767	1
2022 Q1		0	832	0	1200	0
Q2		2	103	2	0	1
Q3		5	0	0	5139	6
Q4		10	0	3	3633	13

How to Estimate Reliability from AFRs and Failure Counts Without Life Data?

I found cumulative installed base in the quarterly reports and computed the quarterly ships n(t) from it, so I didn’t have to infer installed base as in table 1; that inference would have introduced errors from the AFRs’ round-off and zeros.
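Getting quarterly ships from cumulative installed base is a first difference. The sketch below uses the WUH741816ALE6L0 counts from Tables 2 and 3; the function name is hypothetical, and clipping a net decrease to zero is an assumption, chosen because Table 4 shows 0 ships in quarters where the count did not grow.

```python
def ships_from_cumulative(counts):
    """Period ships as first differences of cumulative installed counts."""
    ships = []
    previous = 0
    for count in counts:
        ships.append(max(count - previous, 0))  # assumption: clip net removals to zero
        previous = count
    return ships

# WUH741816ALE6L0 installed counts: 2021 Q4 (Table 2), then 2022 Q1-Q4 (Table 3).
counts = [1767, 2599, 2702, 2702, 2701]
ships = ships_from_cumulative(counts)
print(ships)  # [1767, 832, 103, 0, 0]
```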

Reliability R(t) estimates (table 5) depend on whether failure counts are dead forever or renewals. Backblaze data is recorded by HDD serial number so failure counts are probably dead forever. A Google spreadsheet for dead-forever reliability estimation is available [George 2023]. My brother and I did reliability estimation for Western Digital in 1994 (column 5). Figure 1 shows that WD HDD reliability hasn’t changed much since 1994! Table 6 and figures 2-4 compare AFRs and monthly actuarial rates.
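To show the idea of estimating reliability from ships and failure counts alone, here is a minimal Python sketch. It is not the maximum likelihood estimator behind the tables or the [George 2023] spreadsheet. It is a simpler moment-matching (deconvolution) approach that assumes dead-forever failures and sets each period's failures equal to the sum, over cohorts, of cohort size times the probability of failing at that cohort's age. The function name is hypothetical, and the input is the simulated data of Table 1.

```python
def reliability_from_ships_and_returns(ships, returns):
    """Solve r[t] = sum_{s=0..t} n[s] * f[t - s] for f, then R = 1 - cumulative f."""
    f = []  # f[k]: estimated probability of failing in the (k+1)-th period of life
    for t in range(len(returns)):
        # Failures in period t already explained by cohorts shipped after the first.
        explained = sum(ships[s] * f[t - s] for s in range(1, t + 1))
        f.append((returns[t] - explained) / ships[0])
    reliability = [1.0]
    for p in f:
        reliability.append(reliability[-1] - p)
    return reliability

# Table 1: monthly ships and failure counts.
ships = [1016, 1007, 1012, 968, 1029, 1004]
fails = [1, 3, 5, 7, 11, 14]

estimate = reliability_from_ships_and_returns(ships, fails)
for age, r in enumerate(estimate):
    print(age, round(r, 4))
```

On this data the sketch lands within about 0.0005 of the Reliability column in Table 1. With noisier counts the solved probabilities can go negative, which is one reason to prefer a constrained estimator such as maximum likelihood.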

Table 5. WD HDD reliability estimates.

Age, Months	WUH741414ALE6L4	WUH741816ALE6L0	WUH741816ALE6L4	WD 1994
0	1	1	1	1
1	0.99983	1	1	0.99893
2	0.99583	1	0.99917	0.99485
3	0.99167	0.99899	0.99916	0.99485
4	0.98833	0.99899	0.99727	0.99485
5	0.97382	0.99899	0.99727	0.96769
6	0.97382	0.97087	0.99312	0.96769
7	0.95802	0.97087	0.99312	0.96769
8	0.95596	0.94175	?	0.96769
9	0.95183	0.92233	?	0.96769
10	0.22500	0.92233	?	0.96769
11	?	0.91262	?	0.96769

Table 6. Compare monthly actuarial failure rates (columns 2, 4, 6, 8) and AFRs (columns 3, 5, 7).

Age	WUH	AFR	WUH	AFR	WUH	AFR	WD
Months	741414ALE6L4	741414ALE6L4	741816ALE6L0	741816ALE6L0	741816ALE6L4	741816ALE6L4	1994
1	0.017%	0.16%	0.0%	0.0%	0.0%	0.0%	0.107%
2	0.400%	0.57%	0.0%	0.0%	0.083%	0.23%	0.409%
3	0.418%	0.49%	0.101%	0.44%	0.001%	0.71%	0.0%
4	0.336%	0.38%	0.0%	0.14%	0.189%	0.36%	0.0%
5	1.469%	0.43%	0.0%	0.0%	0.0%	0.19%	2.730%
6	0.0%	0.0%	2.814%	0.15%	0.416%	0.38%	0.0%
7	1.622%	0.29%	0.0%	0.0%	0.0%	0.35%	0.0%
8	0.215%	0.24%	3.000%	0.12%			0.0%
9	0.433%	0.12%	2.062%	0.30%			0.0%
10		0.48%	0.0%	0.0%			0.0%
11		0.77%	1.053%	0.15%			0.0%

Figure 2. Age-specific monthly failure rates (blue) resemble AFRs (orange)?

Figure 3. Age-specific monthly failure rates (blue) don’t resemble AFRs (orange)?

Figure 4. Age-specific monthly failure rates (blue) resemble AFRs (orange)?

Estimate MTBF from AFRs?

MTBF = 8766/AFR (inverting AFR = 8766/MTBF), where MTBF is in hours and 8766 is the number of hours in an average year. That MTBF is potentially biased and variable, especially in early life, before many units have failed. MTBFs are predictions or estimates of the mean lives of products or parts: MTBF = ∫R(t)dt, where R(t) is the reliability function and the integral is from 0 to infinity. Computing MTBF = ∫R(t)dt ≅ ∑R(t) requires extrapolation beyond available data: linear, exponential, seasonal, or curve fitting to popular reliability functions. I extrapolated failure rate functions, a(t), and computed R(t) = exp[-∑a(s)], s=1,2,…,t, for t up to 500. Table 7 shows results of alternative extrapolation methods for a Western Digital HDD.

Table 7. Compare 12-month average of MTBF = 8766/AFR(t) vs. MTBF = ∫R(t)dt extrapolated beyond the 9th month of actuarial failure rate estimates.

Method	Avg(8766/AFR)	Linear	Constant	Growth	Trend	ETS
MTBF, months	12.88	48.61	167	215	12.88	49.5

Linear increase in failure rate is a reasonable extrapolation of wearout. It yields a more reasonable HDD MTBF, ~48 months, than the average of the first 12 values of 8766/AFR. The exponentially smoothed time series forecast (ETS) of 49.5 months agrees. Constant and exponential growth extrapolations overestimate MTBF. Trend is the least-squares linear extrapolation of the first 12 values of 8766/AFR. The ~48- to 49-month MTBF seems more plausible than the results of the other methods.
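The extrapolate-then-sum calculation can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Table 7: the function names are hypothetical, "constant" here means holding the mean observed rate, "linear" is a least-squares line clipped at zero, and the observed rates are the first nine monthly actuarial rates for WUH741414ALE6L4 from Table 6. The printed values are not expected to match Table 7, because the exact extrapolation choices behind that table are not given.

```python
import math

def extrapolate_rates(observed, horizon, method="constant"):
    """Extend observed per-period failure rates a(1..k) out to `horizon` periods."""
    k = len(observed)
    if method == "constant":
        level = sum(observed) / k
        tail = [level] * (horizon - k)
    elif method == "linear":
        # Least-squares line through (age, rate), clipped at zero.
        ages = range(1, k + 1)
        mean_age = sum(ages) / k
        mean_rate = sum(observed) / k
        slope = (sum((x - mean_age) * (y - mean_rate) for x, y in zip(ages, observed))
                 / sum((x - mean_age) ** 2 for x in ages))
        intercept = mean_rate - slope * mean_age
        tail = [max(intercept + slope * x, 0.0) for x in range(k + 1, horizon + 1)]
    else:
        raise ValueError(f"unknown method: {method}")
    return list(observed) + tail

def mtbf_from_rates(rates):
    """MTBF ~ sum of R(t) for t = 0, 1, ..., with R(t) = exp(-sum of a(s), s <= t)."""
    total, cumulative = 1.0, 0.0  # start with R(0) = 1
    for a in rates:
        cumulative += a
        total += math.exp(-cumulative)
    return total

# Table 6, column 2, ages 1-9, converted from percent to fractions.
observed = [0.00017, 0.00400, 0.00418, 0.00336, 0.01469, 0.0, 0.01622, 0.00215, 0.00433]
for method in ("constant", "linear"):
    rates = extrapolate_rates(observed, horizon=500, method=method)
    print(method, round(mtbf_from_rates(rates), 1), "months")
```

An increasing linear extrapolation accumulates hazard faster than a constant one, so it gives the shorter MTBF of the two, in line with the ordering of the Linear and Constant columns in Table 7.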

Recommendations?

Compute AFRs if you want, but estimate age-specific reliability and failure rate functions too; you have the data. Failure rate functions account for the forces of mortality that cause failures. Make actuarial forecasts, because they’re more accurate and precise than time series extrapolations. Actuarial forecasts and their distributions help plan service and inventory better than AFR time-series extrapolations. If you must supply MTBF predictions, base them on extrapolations of age-specific reliability and actuarial failure rate function estimates.

References

Julio Calderon, “Effortless! How to Apply AFRs, MTBFs to Your Data Management Practice,” LinkedIn

J. G. Elerath, “AFR: problems of definition, calculation and measurement in a commercial environment,” Annual Reliability and Maintainability Symposium, 2000 Proceedings, International Symposium on Product Quality and Integrity (Cat. No.00CH37055), Los Angeles, CA, USA, pp. 71-76, doi: 10.1109/RAMS.2000.816286, 2000

L. L. George, “Estimate Reliability Functions Without Life Data”, ASQ Reliability Review, Vol. 13, No. 1, March 1993

L. L. George, Credible Reliability Prediction, 2nd Edition, Field Reliability (google.com), CREDRP2020.PDF, June 2020

L. L. George, User Manual for Credible Reliability Prediction, Field Reliability (google.com), CRPUSM1.PDF, June 2020

L. L. George, “Estimate Field Reliability Without Life Data,” Weekly Update, https://accendoreliability.com/estimate-field-reliability-without-life-data/#more-527694, Sept. 2023

Andy Klein, “Backblaze Drive Stats for 2022,” January 31, 2023

DoD Handbook, Failure Reporting, Analysis and Corrective Action Taken, Mil-Std-2155(AS), Dec. 1995