The Kaplan-Meier reliability estimator is for dead-forever products or parts (those that fail once and are not repaired), given individual lifetime data or a “Nevada” table of periodic ships cohorts and their grouped failure counts. This estimator presumes that ships cohorts are NOT random. But production, sales, installed base, and cohort case counts are random! What does that do to Kaplan-Meier reliability estimates? What is the nonparametric reliability function estimator if ships cohorts are random?
Kaplan-Meier Reliability Estimator from Nevada table data
Statistical software requires input data by subject: {subject identifier, observation time, failed or censored?}. It is convenient to summarize individual subjects and failure count data collected periodically in a “Nevada” table such as table 1. It is called a Nevada table because it looks like (a mirror image of) Nevada on its side.
Fred published the ships and grouped failure counts in table 1. The Kaplan-Meier reliability estimator is R(t) = P[Life > t] = PRODUCT(1-Deaths(j)/Survivors(j)), j=1,2,…,t. Survivors(j) are the cohort members that have survived to the start of age j. E.g., in January there are 3 failures out of the 3519 shipped in January, so in February there are 3519-3 survivors entering age 2. Deaths(j) are the sums on the table 1 diagonals of age j. E.g., 3+3+8+4+5+6 = 29 failures of age 1, out of 3519+6292+7132+5633+4222+4476 = 31274 ships. So reliability at age 1 is R(1) = 1-29/31274 = 0.9991, etc. (table 2).
Table 1. Ships and failure counts grouped by cohort [Schenkelberg]
Month | Ships | Jan | Feb | Mar | Apr | May | Jun |
Jan | 3519 | 3 | 6 | 3 | 7 | 10 | 3 |
Feb | 6292 | | 3 | 8 | 20 | 35 | 24 |
Mar | 7132 | | | 8 | 13 | 25 | 31 |
Apr | 5633 | | | | 4 | 13 | 6 |
May | 4222 | | | | | 5 | 8 |
Jun | 4476 | | | | | | 6 |
Sums | 31274 | 3 | 10 | 19 | 45 | 88 | 78 |
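The Kaplan-Meier arithmetic described above can be sketched in a few lines of Python. This is a minimal sketch, not Fred's or any package's implementation: the names COHORTS and kaplan_meier are mine, each cohort's failures are listed by age (age 1 is the cohort's first month), and the at-risk count at age j is assumed to be ships minus earlier failures, summed over the cohorts observed at that age. It reproduces the K-M entries for ages 1 and 2 in table 2; later ages depend on the exact counts and the at-risk convention, so small differences from table 2 are possible.

```python
# Table 1 as one row per ships cohort: (ships, failures at age 1, 2, ...).
# Age 1 is the cohort's first month in the field.
COHORTS = [
    (3519, [3, 6, 3, 7, 10, 3]),   # Jan
    (6292, [3, 8, 20, 35, 24]),    # Feb
    (7132, [8, 13, 25, 31]),       # Mar
    (5633, [4, 13, 6]),            # Apr
    (4222, [5, 8]),                # May
    (4476, [6]),                   # Jun
]


def kaplan_meier(cohorts):
    """Return [R(1), R(2), ...] from grouped cohort failure counts."""
    max_age = max(len(fails) for _, fails in cohorts)
    reliability = []
    r = 1.0
    for age in range(1, max_age + 1):
        deaths = 0
        at_risk = 0
        for ships, fails in cohorts:
            if len(fails) >= age:  # cohort has been observed at this age
                deaths += fails[age - 1]
                at_risk += ships - sum(fails[:age - 1])
        r *= 1.0 - deaths / at_risk
        reliability.append(r)
    return reliability


if __name__ == "__main__":
    for age, r in enumerate(kaplan_meier(COHORTS), start=1):
        print(f"R({age}) = {r:.6f}")
```

Listing failures by age rather than by calendar month means the diagonal sums of the Nevada table become simple column sums.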
The Kaplan-Meier estimator is a maximum likelihood estimator. The likelihood function with deaths d(j), survivors n(j), and actuarial failure rates p(j) conditional on survival to age j periods up to the oldest failure at age t is
∏[Binomial(d(j), n(j), p(j))], j=1, 2 ,…,t.
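As a check on the maximum likelihood claim, here is a short derivation sketch. It assumes the deaths d(j) at each age are independent binomial counts given the survivors n(j), as in the likelihood above; the hat notation for estimates is mine.

```latex
\begin{aligned}
\ln L &= \sum_{j=1}^{t}\left[\ln\binom{n(j)}{d(j)} + d(j)\ln p(j)
         + \bigl(n(j)-d(j)\bigr)\ln\bigl(1-p(j)\bigr)\right] \\
\frac{\partial \ln L}{\partial p(j)} &= \frac{d(j)}{p(j)} - \frac{n(j)-d(j)}{1-p(j)} = 0
  \;\Longrightarrow\; \hat p(j) = \frac{d(j)}{n(j)} \\
\hat R(t) &= \prod_{j=1}^{t}\bigl(1-\hat p(j)\bigr)
           = \prod_{j=1}^{t}\left(1-\frac{d(j)}{n(j)}\right)
\end{aligned}
```

Each p(j) appears in only one term of the sum, so the ages can be maximized one at a time.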
The Kaplan-Meier estimator is also a least-squares estimator according to the Martingale central limit theorem [Wellner]. Improvements to the Kaplan-Meier estimator use the censored or suspended lifetimes and the censoring time distribution to fake failure data from censored lifetimes [Khan et al., Vinzamuri et al.]. That doesn’t account for random ships cohorts.
Random Ships Cohorts?
I learned actuarial methods for gas turbine engine management while working for the US Air Force Logistics Command. AFM 66-1 assumes failure counts have Poisson distributions. That avoids the problem of estimating the variances of estimates of reliability functions, actuarial rates, or actuarial forecasts. (Poisson mean equals variance.) The Poisson assumption could be attributable to the fact that the outputs (bottom row “Sums” of table 1) of an M/G/infinity self-service system have Poisson distributions. The “M” in M/G/infinity means inputs have Poisson distribution(s); the outputs are then Poisson regardless of the lifetime distribution “G(t)”. Ships cohorts and the bottom row sums in table 1 are statistically sufficient to make nonparametric maximum likelihood and least-squares estimates of reliability and actuarial rate functions. [George and Agrawal, Mirasol]
The Kaplan-Meier estimator uses the grouped-by-cohort failure counts, not the bottom row sums of failure counts. What is the maximum likelihood estimator if ships cohorts have a stationary Poisson distribution? The likelihood function is
∏Binomial[d(j), n(j), p(j)]*Poisson[s(j),λ], j=1,2,…,t. (n(j)=survivors out of cohorts s(1), s(2),…etc.)
The mean of the cohort sizes s(j) in table 1 column 2 is 5212, the variance is 1,883,617, and the skew is 0.253. The Poisson distribution has equal mean and variance, but here the variance is about 361 times the mean. Fred’s ships are unlikely to have a stationary Poisson distribution.
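The mean-versus-variance comparison is easy to reproduce. The sketch below uses only Python's standard library; SHIPS and dispersion_index are names I made up for illustration, and the variance is the sample variance (divisor n-1), which matches the 1,883,617 quoted above.

```python
from statistics import mean, variance

SHIPS = [3519, 6292, 7132, 5633, 4222, 4476]  # table 1, column 2


def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 for Poisson counts."""
    return variance(counts) / mean(counts)


if __name__ == "__main__":
    print(f"mean       = {mean(SHIPS):.3f}")
    print(f"variance   = {variance(SHIPS):.0f}")
    print(f"dispersion = {dispersion_index(SHIPS):.1f}")
```

A dispersion index in the hundreds, rather than near 1, is what argues against a single stationary Poisson rate for all six cohorts.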
Individual months’ ships could have Poisson distributions with means equal to reported ships, as in nonstationary Poisson processes Poisson[λ(j)]. Barry Nelson and Larry Leemis observed that it’s hard to disprove with a single ships observation s(j) for each cohort. Then the likelihood function is
∏Binomial[d(j), n(j), p(j)]*Poisson[s(j),λ(j)].
For example, suppose product or part lifetimes can be regarded as the service times in a self-service system, represented by an M(t)/G/Infinity system with nonstationary Poisson[λ(t)] input cohorts. The distribution of M(t)/G/infinity output in the first period, Deaths(1), is Poisson(λ(1)G(1)) where G(1) = p = 1‑R(1) = d(1)/s(1). The maximum likelihood estimator with random cohorts is the same.
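The Poisson output claim for the first period can be illustrated by simulation. This is a sketch under stated assumptions, not a model of table 1: the rate lam=50 and first-period failure probability g1=0.1 are hypothetical values chosen so that a simple standard-library Poisson sampler (Knuth's multiplication method, suitable only for small rates) is adequate. If first-period deaths are Poisson(λG(1)), their simulated mean and variance should both be near 50 × 0.1 = 5.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def first_period_deaths(rng, lam, g1):
    """Poisson(lam) cohort; each unit fails in period 1 with probability g1."""
    cohort = poisson_draw(rng, lam)
    return sum(1 for _ in range(cohort) if rng.random() < g1)


def simulate(lam=50.0, g1=0.1, runs=20000, seed=1):
    """Return the sample mean and sample variance of simulated first-period deaths."""
    rng = random.Random(seed)
    deaths = [first_period_deaths(rng, lam, g1) for _ in range(runs)]
    m = sum(deaths) / runs
    v = sum((d - m) ** 2 for d in deaths) / (runs - 1)
    return m, v


if __name__ == "__main__":
    m, v = simulate()
    print(f"mean = {m:.3f}, variance = {v:.3f}, lam*G(1) = 5.000")
```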
Maximum Likelihood Estimators with Random Ships Cohorts?
The maximum likelihood estimator of R(1) is 1-d(1)/s(1) regardless of whether cohort 1 is random. Mathematica wouldn’t solve for the formula for the maximum likelihood estimator of R(2), so I made spreadsheets with alternative likelihood functions. Excel’s Solver maximizes the likelihoods as functions of the cohort parameters and the reliability function’s decrements p(j)=R(j-1)-R(j). One alternative treats ships cohorts as normally distributed:
∏Binomial[d(j), n(j), p(j)]*Normal[s(j), mean, stdev], j=1,2,…,t.
The Poisson() and Normal() reliability estimators in table 2 agree! But they disagree with the Kaplan-Meier estimator. The Kaplan-Meier estimator seems biased high compared to when ships cohorts are random. The Poisson(λ) maximum likelihood estimator of λ is 5212.319, close to the average ships 5212.333. (The maximum likelihood estimator of the rate of a stationary Poisson process is the average of cohort sizes.) The Poisson(λ(t)) estimator uses the original ships cohorts as Poisson rate parameters for each cohort. The Normal mean and standard deviation estimates are 5212 and 1253 (variance estimate is 1,569,677 vs. variance of ships cohorts 1,883,617). Both the Poisson[λ(t)] and the Normal[.] log likelihoods are greater (less negative) than the Poisson(λ) log likelihood. They are all more negative than the Kaplan-Meier log likelihood, which doesn’t account for random cohort sizes at all.
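The contribution of the ships distribution to each log likelihood can be computed directly. The sketch below is my own illustration, not the spreadsheet: ships_loglik and its helper names are hypothetical, the stationary Poisson rate is set to the average ships, the nonstationary rates are set to the observed ships, and the Normal parameters are the maximum likelihood estimates (variance with divisor n). Subtracting the K-M log likelihood in table 2 from the other three columns leaves about -935.34, -31.10, and -51.31, and the ships terms computed here come out close to those values. That suggests, but does not prove, that each total in table 2 is roughly a binomial term plus a ships term.

```python
import math

SHIPS = [3519, 6292, 7132, 5633, 4222, 4476]  # table 1, column 2


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of count k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normal_logpdf(x, mu, sigma):
    """Log of the Normal density at x."""
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu) / sigma) ** 2


def ships_loglik(ships):
    """Log likelihood of the ships cohorts under three assumed distributions."""
    n = len(ships)
    mu = sum(ships) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in ships) / n)  # ML estimate, divisor n
    return {
        "Poisson(lambda)": sum(poisson_logpmf(s, mu) for s in ships),
        "Poisson(lambda(t))": sum(poisson_logpmf(s, s) for s in ships),
        "Normal": sum(normal_logpdf(s, mu, sigma) for s in ships),
    }


if __name__ == "__main__":
    for name, value in ships_loglik(SHIPS).items():
        print(f"{name:20s} {value:10.3f}")
```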
Figure 1 shows the disagreement in reliability function estimators is small. However, the bias of the Kaplan-Meier estimator is worse if reliability is worse than in table 1! The Kullback-Leibler divergence from the Kaplan-Meier estimator is 0.000459, positive. This indicates information gained from the maximum likelihood estimator with random cohorts instead of the Kaplan-Meier estimator. Furthermore, the asymptotic Greenwood variance of the Kaplan-Meier estimator fails for finite sample sizes. What is the variance of the reliability estimator with random ships cohorts? Wait for the next article or send data to pstlarry@yahoo.com to find out.
Table 2. Compare the reliability estimates of Kaplan-Meier (K-M) vs. Poisson vs. Normal ships.
Age, Months | K-M R(t) | Poisson(λ) | Poisson(λ(t)) | Normal |
1 | 0.999073 | 0.999073 | 0.999073 | 0.999073 |
2 | 0.997282 | 0.997280 | 0.997280 | 0.997280 |
3 | 0.995268 | 0.994882 | 0.994882 | 0.994882 |
4 | 0.990942 | 0.990551 | 0.990551 | 0.990551 |
5 | 0.987468 | 0.987055 | 0.987055 | 0.987055 |
6 | 0.986620 | 0.986195 | 0.986195 | 0.986195 |
LogLikelihood | -55.974 | -991.314 | -87.076 | -107.287 |
Figure 1. The maximum likelihood estimators agree pretty well. The Kaplan-Meier (KM R(t)) estimator is biased high compared with estimators that account for random cohorts.
References
AFM 66-1, “Maintenance Management, Vol. 1,” Washington, DC, August 1972
Greenwood, M. “The Errors of Sampling of the Survivorship Tables,” Reports on Public Health and Statistical Subjects, no. 33, London: HMSO. Appendix 1, 1926
E. L. Kaplan and Paul Meier, “Nonparametric Estimation from Incomplete Observations,” Jour. Amer. Statist. Assn., Vol. 53, pp. 457–481, 1958
Habib Nawaz Khan, Qamruz Zaman, Fatima Azmi, Gulap Shahzada, and Mihajlo Jakovljevic, “Methods for Improving the Variance Estimator of the Kaplan–Meier Survival Function, When There Is No, Moderate and Heavy Censoring-Applied in Oncological Datasets,” Front. Public Health, Sec. Health Economics, Volume 10, https://doi.org/10.3389/fpubh.2022.793648, May 2022
Fred Schenkelberg, “Nevada Charts to Gather Data,” Accendo Reliability
L. L. George and A. Agrawal, “Estimation of a Hidden Service Distribution of an M/G/∞ Service System,” Naval Research Logistics Quarterly, vol. 20, pp. 549-555, https://doi.org/10.1002%2Fnav.3800200314, 1973
Noel M. Mirasol, “The Output of an M/G/Infinity Queuing System Is Poisson,” Operations Research, Vol. 11, No. 2, pp. 282-284, Mar.-Apr. 1963
Barry L. Nelson and Lawrence M. Leemis, “The Ease of Fitting but Futility of Testing a Nonstationary Poisson Processes From One Sample Path,” Proceedings of the 2020 Winter Simulation Conference, IEEE, 2020
Bhanukiran Vinzamuri, Yan Li and Chandan K. Reddy, “Calibrated Survival Analysis using Regularized Inverse Covariance Estimation for Right Censored Data.” IEEE Trans. on Knowledge and Data Engineering, http://ieeeexploree.ieee.org/Xplore
Jon A. Wellner, “Notes on Greenwood’s Variance Estimator for the Kaplan-Meier Estimator,” Greenwood.pdf (washington.edu), Jan. 2010
Larry George says
I should have cited Olli S. Miettinen, “Survival Analysis: Up from Kaplan–Meier–Greenwood,” Eur. J. Epidemiol., Vol. 23, pp. 585–592, DOI 10.1007/s10654-008-9278-7, 2008 [Citation added after publication.] The author recognized that ships could be random (Poisson). The article was provoked by observed errors in Greenwood’s variance estimator when censoring dominates. The article uses a Kaplan-Meier type estimator. Let me know if you want the real nonparametric max. likelihood reliability estimator and its variance.
In the likelihood function, ∏Binomial[d(j), n(j), p(j)]*Normal[mean, stdev], the normal mean and variance are the parameters of the ships (cohorts) distribution, if you believe ships have normal distribution.