The Kaplan-Meier reliability estimator errs on Fred’s bicycle ships and failure data! The Kaplan-Meier estimate was computed from Fred’s bicycles’ grouped failure data in the body of a “Nevada” table. It disagrees with the reliability estimate from ships cohorts and monthly failures (without knowing which cohort the failures came from). It disagrees with least squares nonparametric reliability estimates. All but the Kaplan-Meier estimate agree! Which would you prefer?
The Kaplan-Meier estimator is a nonparametric maximum likelihood reliability estimator (npmle). So is the npmle from ships cohorts and monthly failure sums (the bottom row of table 2). There is more information in failure counts grouped by cohort in the body of a Nevada table than in sums of monthly failures without regard to which cohort they came from. So why does the Kaplan-Meier reliability estimator from the grouped failures in the body of the Nevada table disagree with the reliability estimates from the sums of monthly failures and with the least-squares reliability estimates?
Table 1. Typical Kaplan-Meier Input. Time is time-to-failure or censoring time if no failure.
| Unit | Censored? (0 = censored, 1 = failed) | Time | Group |
|------|--------------------------------------|------|-------|
| 1    | 1                                    | 39   | A     |
| 2    | 1                                    | 44   | B     |
| 3    | 0                                    | 61   | A     |
| 4    | 1                                    | 78   | A     |
| 5    | 0                                    | 89   | B     |
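If you want to check the product-limit arithmetic from unit-level data like table 1, here is a minimal sketch in Python (my illustration; the article's calculations were done in Excel). It reads the "Censored?" column as 1 = failed and 0 = censored, and it ignores the Group column.

```python
def kaplan_meier(times, events):
    """Product-limit R(t) = prod(1 - d(s)/n(s)) over observed failure times."""
    data = sorted(zip(times, events))     # (time, 1 = failed / 0 = censored)
    r = 1.0                               # running reliability estimate
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e == 1)   # failures at time t
        at_risk = sum(1 for tt, _ in data if tt >= t)        # still on test at t
        if d > 0:
            r *= 1.0 - d / at_risk
            curve.append((t, r))
        i += sum(1 for tt, _ in data if tt == t)             # skip ties at time t
    return curve

# Table 1: units 3 and 5 are censored
print(kaplan_meier([39, 44, 61, 78, 89], [1, 1, 0, 1, 0]))
# approximately [(39, 0.8), (44, 0.6), (78, 0.3)]
```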
The body of Nevada table 2 contains failure counts grouped by cohort in each row. The bottom row contains monthly sums of failures, without regard to which cohort they came from. Ships cohorts and monthly failure sums are statistically sufficient to make nonparametric reliability estimates. Generally Accepted Accounting Principles require data containing ships and failure counts, and they are population data, not sample data.
Table 2. Fred’s Bicycles: the body of the Nevada table contains grouped failure counts.
| Month | Ships | Jan | Feb | March | April | May | June |
|-------|-------|-----|-----|-------|-------|-----|------|
| Jan   | 3519  | 3   | 6   | 3     | 7     | 10  | 3    |
| Feb   | 6292  |     | 4   | 8     | 20    | 35  | 24   |
| Mar   | 7132  |     |     | 8     | 14    | 25  | 31   |
| Apr   | 5633  |     |     |       | 4     | 13  | 6    |
| May   | 4222  |     |     |       |       | 6   | 8    |
| Jun   | 4476  |     |     |       |       |     | 6    |
| Sums  | 31274 | 3   | 10  | 19    | 45    | 89  | 78   |
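Here is a minimal sketch, in Python, of table 2 as a data structure, just to show that the bottom-row "returns" are the calendar-month column sums of the Nevada triangle. The layout and counts come from table 2; the code itself is my illustration, not Fred's or the article's workbook.

```python
ships = {"Jan": 3519, "Feb": 6292, "Mar": 7132,
         "Apr": 5633, "May": 4222, "Jun": 4476}

# failures[cohort] = counts by calendar month, starting at the cohort's ship month
failures = {"Jan": [3, 6, 3, 7, 10, 3],
            "Feb": [4, 8, 20, 35, 24],
            "Mar": [8, 14, 25, 31],
            "Apr": [4, 13, 6],
            "May": [6, 8],
            "Jun": [6]}

months = list(ships)                     # calendar order Jan..Jun
returns = [0] * len(months)
for i, cohort in enumerate(months):
    for j, count in enumerate(failures[cohort]):
        returns[i + j] += count          # this failure falls in calendar month i + j

print(returns)                # [3, 10, 19, 45, 89, 78] -- the "Sums" row
print(sum(ships.values()))    # 31274
```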
Compare the Kaplan-Meier reliability estimate (K-M R(t)) from Fred’s data in the body of the Nevada table with the nonparametric maximum likelihood estimate (S&R R(t)) from ships and the bottom-row monthly failure sums (I call them “returns”).
Is this an Error by George?
Why the disagreement between maximum likelihood estimates? You may think I screwed up the Excel spreadsheet Kaplan-Meier estimator, R(t) = ∏(1−d(s)/n(s)), s = 1,2,…,t, where d(s) are deaths in age interval s and n(s) are survivors to age s [George, ASQ Reliability Review, 2005]. So I also computed the Kaplan-Meier reliability estimator by maximizing its binomial likelihood function,
L = ∏ Binomial(d(s), n(s), a(s)) = ∏ a(s)^d(s) * (1−a(s))^(n(s)−d(s)) * COMBIN(n(s), d(s)), s = 1,2,…,6
using Excel’s Solver and combinatorial function. The a(s) values are actuarial failure rates in age interval s, conditional on survival to age s. The nonparametric maximum likelihood reliability estimate is R(t) = Exp[−∑a(s)], s = 1,2,…,t, where the actuarial failure rates a(s) maximize the likelihood function. The COMBIN(n(s),d(s)) factor is irrelevant for the maximization if you ignore the randomness of the ships cohorts. The Kaplan-Meier estimator solves dlogL/da(t) = 0; it doesn’t depend on the variability of n(t) in the Excel function COMBIN(n(t),d(t)). Perhaps it should?
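If you'd rather not run Solver, the binomial log-likelihood d(s)*log a(s) + (n(s)−d(s))*log(1−a(s)) is maximized at a(s) = d(s)/n(s), so the table 3 columns can be checked with a few lines of Python (my sketch, not the article's spreadsheet): compute d(s) and n(s) from the table 2 triangle, then the Kaplan-Meier product limit and R(t) = Exp[−∑a(s)].

```python
import math

ships = [3519, 6292, 7132, 5633, 4222, 4476]              # Jan..Jun cohort sizes
fails = [[3, 6, 3, 7, 10, 3], [4, 8, 20, 35, 24],
         [8, 14, 25, 31], [4, 13, 6], [6, 8], [6]]        # counts by age in months

r_km, cum_a = 1.0, 0.0
for s in range(6):                                        # age interval s+1
    d = sum(f[s] for f in fails if len(f) > s)            # failures at age s+1
    n = sum(ships[c] - sum(fails[c][:s])                  # survivors entering age s+1
            for c in range(6) if len(fails[c]) > s)
    a = d / n                                             # MLE of the actuarial rate
    r_km *= 1.0 - a                                       # Kaplan-Meier product limit
    cum_a += a
    print(s + 1, round(r_km, 6), round(math.exp(-cum_a), 6))
```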
Table 3. Kaplan-Meier and maximum likelihood reliability estimators from table 2 data. They agree about as well as Excel and Solver can manage.
| Age, months | Kaplan-Meier | Max. Likelihood |
|-------------|--------------|-----------------|
| 1           | 0.999009     | 0.99901         |
| 2           | 0.99718      | 0.997187        |
| 3           | 0.994789     | 0.994804        |
| 4           | 0.99048      | 0.990522        |
| 5           | 0.987017     | 0.987077        |
| 6           | 0.986168     | 0.98623         |
How bad is the disagreement? The Kullback-Leibler divergence quantifies the difference between two functions in a single number (bits if you use logarithms to the base 2). Please see Wikipedia for entropy and Kullback-Leibler divergence = ∑p1(t)*Log2[p1(t)/p2(t)] bits, where p1(t) and p2(t) are two probability distribution functions such as R(t)−R(t+1). It is a maximum entropy (information) numeric measure of the functions’ difference! Its value for the table 3 reliability functions is −0.00906 bits, which doesn’t tell me much. Why the difference? Sequences of returns? Do cohorts differ? Does reliability change?
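Here is a minimal sketch of the Kullback-Leibler computation on the table 3 reliability functions, using p(t) = R(t)−R(t+1) with R(0) = 1. How the survivor mass beyond age 6 is handled changes the numeric value, so treat this as an illustration of the formula rather than a reproduction of the −0.00906 bits quoted above.

```python
import math

# R(0) = 1 followed by the table 3 values for ages 1..6
R_km = [1.0, 0.999009, 0.997180, 0.994789, 0.990480, 0.987017, 0.986168]
R_ml = [1.0, 0.999010, 0.997187, 0.994804, 0.990522, 0.987077, 0.986230]

p1 = [R_km[t] - R_km[t + 1] for t in range(6)]   # monthly failure probabilities
p2 = [R_ml[t] - R_ml[t + 1] for t in range(6)]

kl_bits = sum(a * math.log2(a / b) for a, b in zip(p1, p2) if a > 0 and b > 0)
print(kl_bits)
```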
Source of Kaplan-Meier Error?
Compute reliability estimates from alternative subsets of data: Jan, Feb,…,June cohorts vs. subsets of ships and monthly failure sums. Display the estimates in figures 2 and 4, called “Broom Charts” by Jerry Ackaret. I apologize for showing smooth curves: technically, nonparametric estimators are step functions (figure 1).
Longer lines are for earlier cohorts. The fat line in figure 2 is the Kaplan-Meier estimate from all data. Cohort 1 from January (blue) has the highest reliability. Cohort 2, the largest cohort, matches the Kaplan-Meier estimate. Differences among the cohort reliability estimates increase the variance of the Kaplan-Meier estimate, but they should not invalidate the Kaplan-Meier estimate, should they?
Figure 3 shows Kullback-Leibler divergences between the reliability function estimates of successive pairs of cohorts. It’s pretty obvious that January and February differ from each other and February from March. The reference by Belov and Armstrong explains the distribution of the Kullback-Leibler divergence for use in statistical reliability control (Statistical Process Control on reliability functions). Kullback-Leibler divergences do not explain the figure 1 difference between the Kaplan-Meier estimate and the estimate from cohort sizes and monthly failure counts.
The reliability estimator from ships and returns (monthly failure sums) maximizes the likelihood ∏P[Ships(t)]*P[Returns(t)], where G(t) = 1−R(t). This assumes an M/G/infinity self-service model of ships and returns, with ships as a stationary Poisson process and service time distributed G(t). The likelihood is ∏POISSON.DIST(Ships(t),λ,FALSE)*POISSON.DIST(Returns(t),λ*G(t),FALSE). The likelihood for nonstationary Poisson ships is similar [Eick, Massey and Whitt 1993; George 2024].
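A sketch of that likelihood in Python, for readers without the Excel workbook: ships in month t are Poisson(λ) and returns in month t are Poisson(λ*G(t)), with G(t) = 1−exp[−∑a(s)] so that G is nondecreasing. The scipy maximization below is my own illustration of the stationary model, not the article's Solver setup.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

ships   = np.array([3519, 6292, 7132, 5633, 4222, 4476])
returns = np.array([3, 10, 19, 45, 89, 78])        # bottom row of table 2

def neg_log_likelihood(params):
    lam, a = params[0], params[1:]                 # Poisson ship rate, actuarial rates
    G = 1.0 - np.exp(-np.cumsum(a))                # G(t), nondecreasing by construction
    ll = poisson.logpmf(ships, lam).sum()          # monthly ships ~ Poisson(lam)
    ll += poisson.logpmf(returns, lam * G).sum()   # monthly returns ~ Poisson(lam*G(t))
    return -ll

x0 = np.r_[ships.mean(), np.full(6, 1e-3)]
res = minimize(neg_log_likelihood, x0, method="L-BFGS-B",
               bounds=[(1e-9, None)] * 7)
a_hat = res.x[1:]
print("R(t) =", np.exp(-np.cumsum(a_hat)))         # S&R reliability estimate
```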
The figure 4 broom chart is constructed by deleting monthly failure sums from the present back toward January: delete June, then delete May and June, etc., and then compute the nonparametric maximum likelihood reliability R(t) = 1−G(t). The reliability estimate from all returns is highest (blue). Note that the vertical scale of figure 4 runs from 1.0 down to 0.949, while the figure 2 (K-M) vertical axis runs from 1.0 down to 0.985. The reliability estimates by cohort (figure 2) resemble those in figure 4 except in magnitude: the oldest, largest cohort has the highest reliability, and so forth. Could reliability have been getting worse? Perhaps the Kaplan-Meier reliability estimator fails when the cohorts vary?
Least Squares Verifies the Estimate from Ships and Returns!
For Nevada table data, least squares estimates minimize ∑∑[Returns(i;j)−Hindcast(i;j)]² (SSE) over cohorts from ship month i and ages-at-failure j. Hindcasts are forecasts of past, already observed, returns or failures. Hindcasts are actuarial forecasts: ∑ a(s;i)*n(t−s;i) for the Jan-June (i) cohorts’ returns and ages, with reliability R(t) = 1−G(t) = exp[−∑a(s)], s = 1,2,…,t.
For ships and monthly returns, least squares estimates minimize the sum of squared differences (errors, “SSE”) between observed returns and hindcasts, Jan.-June: ∑(Returns(i)−Hindcast(i))².
Kaplan-Meier SSE = 335. Ships and returns SSE = 207. Of course the sums are not directly comparable: the S&R SSE is a sum over 6 monthly return sums, while the K-M SSE is a sum over the 21 cohort-by-month return counts in the body of the Nevada table.
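Here is a sketch of the ships-and-returns least-squares fit: hindcast month-t returns as ∑ ships(j)*a(t−j+1) over cohorts j ≤ t and minimize the SSE over the actuarial rates a(s). It ignores survivor depletion (the failure fractions are tiny) and uses scipy rather than Solver, so its SSE will not necessarily match the 207 above; it is an illustration of the method, not a reproduction of the workbook.

```python
import numpy as np
from scipy.optimize import minimize

ships   = np.array([3519, 6292, 7132, 5633, 4222, 4476], float)
returns = np.array([3, 10, 19, 45, 89, 78], float)
m = len(ships)

def hindcast(a):
    # expected returns in calendar month t: cohorts j <= t contribute ships[j]*a[t-j]
    return np.array([sum(ships[j] * a[t - j] for j in range(t + 1))
                     for t in range(m)])

def sse(a):
    return float(np.sum((returns - hindcast(a)) ** 2))

res = minimize(sse, np.full(m, 1e-3), method="L-BFGS-B",
               bounds=[(0.0, None)] * m)
a_hat = res.x
print("SSE  =", round(sse(a_hat), 1))
print("R(t) =", np.exp(-np.cumsum(a_hat)))         # least-squares reliability estimate
```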
The least squares and the nonparametric maximum likelihood estimates from ships and returns in figure 5 agree with each other but disagree with the Kaplan-Meier reliability estimate (upper curve, orange).
AIC = Akaike Information Criterion [Wikipedia]
The AIC incorporates the number of estimated parameters as well as the likelihood: AIC = 2k−2*Log(Likelihood,2) bits, where k is the number of parameters estimated; e.g., k = 6 for ages t = 1,2,…,6. Smaller AIC is better. The Nevada table Kaplan-Meier estimator AIC = 59.17 (6 parameters).
The ships and returns AIC = 58.49 for the stationary M/G/infinity model of monthly ships and returns (6 parameters), or 159.32 for the nonstationary M(t)/G/infinity model (12 parameters).
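For reference, the AIC as defined above (base-2 logarithm, so bits) is easy to compute from a maximized log-likelihood; the log-likelihood values below are hypothetical placeholders for illustration, not the article's 59.17 and 58.49.

```python
import math

def aic_bits(log_likelihood_nats, k):
    """AIC = 2k - 2*log2(L), computed from a natural-log likelihood."""
    return 2 * k - 2 * (log_likelihood_nats / math.log(2))

# Hypothetical maximized log-likelihoods, just to show the comparison:
print(aic_bits(-16.0, k=6))    # 6-parameter model
print(aic_bits(-15.0, k=12))   # 12-parameter model; smaller AIC is preferred
```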
Conclusions? Better Watch Out!
This is not the first article about shortcomings of the Kaplan-Meier estimator and its variance [Kiessling et al. (AI), Rupert Miller]. Artificial Intelligence is coming. Miller complained about using nonparametric reliability statistics when a parametric reliability function might fit better.
Don’t trust Kaplan-Meier estimates from Nevada table data with random cohorts [https://accendoreliability.com/kaplan-meier-ignores-cohort-variability/#more-567772/]. Compare the Kaplan-Meier estimates with the nonparametric maximum likelihood estimate from periodic ships and returns or with least-squares reliability estimates.
I couldn’t spot why the Kaplan-Meier estimate errs for Fred’s data, but I suspect the cause is cohort differences and reliability decreasing relative to the January cohort’s reliability. Could there be some renewals in the “failure” counts in the Nevada table?
References
Dmitry I. Belov and Ronald D. Armstrong, “Distributions of the Kullback–Leibler divergence with applications,” British Journal of Mathematical and Statistical Psychology 64, 291–309, 2011
S. G. Eick, W. A. Massey and W. Whitt, “The Physics of the M(t)/G/Infinity Queue,” Operations Research, vol. 41, pp. 731-742, 1993
E. L. Kaplan and P. Meier, “Nonparametric Estimation from Incomplete Observations,” J. Amer. Statist. Assn., vol. 53, pp. 457-481, 1958
Jonas Kiessling, Aston Brunnberg, Gustaf Holte, Nikolaj Eldrup, and Karl Sörelius, “Artificial Intelligence Outperforms Kaplan-Meier Analyses Estimating Survival after Elective Treatment of Abdominal Aortic Aneurysms,” Eur J Vasc Endovasc Surg., vol. 65, pp. 600-607, 2023
Rupert G. Miller, “What Price Kaplan-Meier?” Biometrics vol. 39, no. 4, pp. 1077–81, https://doi.org/10.2307/2531341, 1983
Fred Schenkelberg, “Nevada Charts to Gather Data,” https://accendoreliability.com/nevada-charts-gather-data/, March 2016
References by George
L. L. George and A. Agrawal, “Estimation of a Hidden Service Distribution of an M/G/Infinity Service System,” Naval Research Logistics Quarterly, vol. 20, pp. 549-555, https://doi.org/10.1002%2Fnav.3800200314, 1973
L. L. George, “GAAP field data,” https://accendoreliability.com/do-the-best-you-can-with-available-data/#more-541467
L. L. George, “Statistical Reliability Control,” https://accendoreliability.com/statistical-reliability-control/#more-522710, 2024
L. L. George, “Kaplan-Meier Reliability Estimation Spreadsheet,” ASQ Reliability Review, Vol. 25, No. 2, pp. 6-12, June 2005, (2017 revision available from pstlarry@yahoo.com)