Variance of the Kaplan-Meier Estimator?

The well-known variance of the Kaplan-Meier reliability function estimator [Greenwood, Wikipedia] can drastically under- or overestimate variance. The covariances between Kaplan-Meier reliability estimates at different ages are usually ignored or neglected. Variance errors and covariance neglect bias the variance of actuarial demand forecasts. Imagine what errors and neglect do to confidence bands on reliability functions.

The Kaplan-Meier reliability function estimator is used when time-to-failure data are censored and grouped by start period cohorts. References about its variance-covariance matrix are scarce. “Survival and Event History Analysis…” page 91 lists a slight difference from the Greenwood variance for the Nelson-Aalen estimator of the cumulative failure rate function [Aalen et al.]. Wayne Nelson sent me citations of his paper and an earlier paper by Odd Aalen [Nelson]. They show the cumulative failure rate function estimates are independent. However, the Kaplan-Meier reliability and actuarial failure rate function estimates are not. I use Mathematica to compute the Cramer-Rao bound on the variance-covariance matrix for small numbers of cohorts. See the reference by Paul Tune for computation methods that avoid inversion of the Fisher information matrix.

Pointwise confidence limits on reliability estimates use the Greenwood variance formula [Sawyer, Freedman]. There are legitimate confidence bands on the Kaplan-Meier reliability function estimator [Hall and Wellner], based on the covariance function of the estimator’s asymptotic multivariate normal distribution. The confidence bands are a function of the unknown underlying distributions of failure and censoring times. “Programs for the calculation of the bands are available from the authors.” [In C, https://sites.stat.washington.edu/jaw/RESEARCH/SOFTWARE/software.list.html/] I need the variance-covariance matrix of the actuarial failure rates corresponding to the Kaplan-Meier estimator, in order to compute the variance and standard deviation of actuarial failure forecasts for logistics, spares stock levels, availability, etc.

In a previous article I used the Cramer-Rao bound on the variance-covariance matrix to compute the standard deviations of actuarial failure forecasts from ships and returns counts [“ESG and Reliability?” George]. I used Mathematica to do the same for the first example in tables 1 and 2, because the Greenwood variance is a Cramer-Rao bound for maximum likelihood estimators.

Greenwood Variance vs. Empirical Variance?

The Greenwood variance bounds the variance of reliability function estimates under some regularity conditions (positive definite matrices and existence of the Jacobian matrix). The variance of a maximum likelihood estimator converges asymptotically to the Cramer-Rao lower bound under some conditions. What happens if the sample is too small for asymptotic results to apply? If your product is reliable, then failures are scarce and the Kaplan-Meier reliability function estimate is far from asymptotic; if data are censored then the reliability function estimate is even farther from asymptotic [Khan et al.]. The following examples show empirical standard deviation estimates for comparison with Greenwood standard deviations.

The Kaplan-Meier estimator uses periodic ships (cohorts) and grouped, censored failures from each cohort. Tables 1 and 3 show typical data in “Nevada” tables, so-called because the grouped failure data look like Nevada on its side.

The ships and grouped failures in table 1 give the same reliability estimates from each cohort as the Kaplan-Meier estimate (table 2). The standard deviations (table 2) and covariances of cohort reliability estimates are zero, because the reliability estimates didn’t vary.

Table 1. Stupid example of ships (cohorts) and grouped failures

Period	Ships	1	2	3
1	100	5	10	15
2	100		5	10
3	100			5

Table 2. Kaplan-Meier reliability estimate and standard deviation estimates. Residuals are the differences between cohort reliability estimates and the K-M reliability estimate.

Age	K-M Rel	Greenwood Stdev	Empirical Stdev	Residuals Stdev
1	0.95	0.0126	0	0
2	0.85	0.024	0	0
3	0.70	0.403	0	0
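
For readers who want to recompute the K-M and Greenwood columns, here is a minimal Python sketch of the textbook product-limit estimate and Greenwood's formula, Var[R(t)] ≈ R(t)^2*∑d(j)/(n(j)*(n(j)‑d(j))), applied to the counts in table 1. The function name km_greenwood and the list layout are illustrative assumptions, not taken from any of the cited programs. Units still in service at the end of each cohort's record are treated as censored.

```python
import math

# Nevada table from table 1: ships per cohort and grouped failures by period.
ships = [100, 100, 100]
failures = [
    [5, 10, 15],  # cohort shipped in period 1: failures at ages 1, 2, 3
    [5, 10],      # cohort shipped in period 2: failures at ages 1, 2
    [5],          # cohort shipped in period 3: failures at age 1
]

def km_greenwood(ships, failures):
    """Return (reliability, greenwood_sd) lists for ages 1, 2, ..."""
    max_age = max(len(row) for row in failures)
    reliability, greenwood_sd = [], []
    r, g_sum = 1.0, 0.0
    for age in range(max_age):
        at_risk = deaths = 0
        for n, row in zip(ships, failures):
            if len(row) > age:                 # cohort is observed at this age
                at_risk += n - sum(row[:age])  # survivors entering the interval
                deaths += row[age]
        r *= 1.0 - deaths / at_risk                       # product-limit step
        g_sum += deaths / (at_risk * (at_risk - deaths))  # Greenwood sum
        reliability.append(r)
        greenwood_sd.append(r * math.sqrt(g_sum))
    return reliability, greenwood_sd

rel, sd = km_greenwood(ships, failures)
for age, (r, s) in enumerate(zip(rel, sd), start=1):
    print(f"age {age}: R = {r:.4f}, Greenwood sd = {s:.4f}")
```

The reliabilities it prints match the K-M column of table 2 (0.95, 0.85, 0.70).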

Table 3. Ships and grouped failures from “Weibull Analysis of Perplexing Field Data,” by James McLinn

Week	Ships	1	2	3	4	5	6	7	8
1	20	1	0	1	0	1	0	1	0
2	50		1	0	1	0	1	0	1
3	70			1	0	1	0	1	0
4	100				1	0	1	0	1
5	100					1	0	1	0
6	100						1	0	1
7	120							1	0
8	120								1

Estimate the cohort empirical reliability functions recursively as R(t) = R(t‑1)*(1‑deaths(t)/Survivors(t‑1)); t=1,2,…, starting from R(0) = 1.

Table 4. Kaplan-Meier reliability estimate and empirical reliability estimates from each cohort in table 3.

Age, week	K-M Rel.	1	2	3	4	5	6	7	8
1	0.989	0.950	0.950	0.897	0.897	0.842	0.842	0.783	0.783
2	0.989	0.980	0.980	0.960	0.960	0.939	0.939	0.917	0.917
3	0.977	0.986	0.986	0.971	0.971	0.957	0.957	0.942
4	0.977	0.990	0.990	0.980	0.980	0.970	0.970
5	0.962	0.990	0.990	0.980	0.980	0.970
6	0.962	0.990	0.990	0.980	0.980
7	0.940	0.992	0.992	0.983
8	0.940	0.992	0.992
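
The cohort recursion given with table 4 can be sketched in a few lines of Python. The input here is the three cohorts of table 1 rather than table 3, because those estimates are easy to check by hand. The function name cohort_reliability is a hypothetical helper, and the sketch assumes each cohort's survivors are simply its ships minus its earlier failures.

```python
import statistics

def cohort_reliability(ships, failures):
    """Product-limit reliability for one cohort:
    R(t) = R(t-1) * (1 - deaths(t) / survivors(t-1)), with R(0) = 1."""
    survivors, r, out = ships, 1.0, []
    for deaths in failures:
        r *= 1.0 - deaths / survivors
        survivors -= deaths
        out.append(r)
    return out

# (ships, grouped failures by age) for the three cohorts of table 1
cohorts = [(100, [5, 10, 15]), (100, [5, 10]), (100, [5])]
estimates = [cohort_reliability(n, f) for n, f in cohorts]

# Empirical standard deviation across the cohorts observed at each age
for age in range(3):
    at_age = [r[age] for r in estimates if len(r) > age]
    print(age + 1, [round(v, 4) for v in at_age], statistics.pstdev(at_age))
```

All three cohorts give the same estimates at every age they share, so the standard deviation across cohorts is zero, as table 2 reports.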

Estimate the sample variance-covariance matrix by treating each cohort’s grouped failure counts as independent random samples. Compute the variance-covariance matrix from table 4 columns. There are two alternative centers for the squared deviations: the averages of the cohort estimates at each age, or the Kaplan-Meier estimates from all cohorts (residuals). Neither is exactly the mean, because the average has sample variability and the Kaplan-Meier estimator is only asymptotically unbiased.
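
Here is one way the computation might look in Python, as a sketch under stated assumptions rather than the spreadsheet actually used. The cohort estimates and the K-M reference values below are made-up numbers for illustration. The choice to use, for each pair of ages, only the cohorts observed at both ages, and to divide by that number of cohorts, is also an assumption.

```python
def cov_matrix(cohort_rel, center=None):
    """Variance-covariance matrix of cohort reliability estimates.

    cohort_rel[c][a] is cohort c's estimate at age a+1; later cohorts are shorter.
    center[a] is the reference value at age a+1 (defaults to the column average).
    """
    max_age = max(len(r) for r in cohort_rel)
    if center is None:
        center = []
        for a in range(max_age):
            column = [r[a] for r in cohort_rel if len(r) > a]
            center.append(sum(column) / len(column))
    cov = [[0.0] * max_age for _ in range(max_age)]
    for a in range(max_age):
        for b in range(max_age):
            # cohorts observed at both ages
            pairs = [(r[a], r[b]) for r in cohort_rel if len(r) > max(a, b)]
            cov[a][b] = sum((x - center[a]) * (y - center[b])
                            for x, y in pairs) / len(pairs)
    return cov

# Hypothetical cohort estimates (not from the tables) and hypothetical K-M values
cohort_rel = [[0.95, 0.90, 0.80], [0.97, 0.91], [0.93]]
km = [0.95, 0.90, 0.80]

cov_about_average = cov_matrix(cohort_rel)        # variance from the averages
cov_about_km = cov_matrix(cohort_rel, center=km)  # variance from K-M (residuals)
print(cov_about_average[0][0], cov_about_km[1][1])
```

With only one cohort at the oldest age, the last diagonal entry rests on a single observation, which is the small-sample weakness discussed in the recommendations.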

Table 5. Variance-covariance matrix of the empirical reliability function estimates from each cohort. Covariances are not negligible!

Age, Weeks	1	2	3	4	5	6	7	8
1	0.0002
2	0.0002	0.0002
3	0.0004	0.0004	0.0008
4	0.0004	0.0004	0.0009	0.0009
5	0.0007	0.0007	0.0015	0.0015	0.0023
6	0.0008	0.0008	0.0016	0.0016	0.0025	0.0025
7	0.0011	0.0011	0.0023	0.0023	0.0035	0.0035	0.0049
8	0.0010	0.0010	0.0021	0.0021	0.0033	0.0033	0.0045	0.0045

Note: the Excel VAR() function does not compute what you may think it should, ∑(observations‑average)^2/Sample size! VAR() computes ∑(observations‑average)^2/(n‑1). If you want the variance with divisor n, use the VARP() function. I also checked the COVAR() function; it computes what you would expect, ∑(x(k)‑average(x))*(y(k)‑average(y))/Sample size, a single sum over the paired observations k.
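
The same divisor distinction exists outside Excel. As a small illustration with Python's standard statistics module, variance() divides by n‑1 and pvariance() divides by n. The data values and the helper covar_population are illustrative assumptions.

```python
import statistics

# Illustrative reliability estimates at two ages from four cohorts
x = [0.950, 0.980, 0.986, 0.990]
y = [0.897, 0.960, 0.971, 0.980]
n = len(x)

var_sample = statistics.variance(x)       # divides by n - 1, like Excel VAR()
var_population = statistics.pvariance(x)  # divides by n, like Excel VARP()

def covar_population(u, v):
    """Population covariance with divisor n, the form Excel COVAR() uses."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

print(var_sample, var_population, covar_population(x, y))
```

The two variances differ by the factor (n‑1)/n, which matters when there are only a handful of cohorts.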

Figure 1. Greenwood and jackknife standard deviation (“JKstdev”) estimates.

Figure 2. Compare alternative reliability function variance estimators. SSE variance is variance from the Kaplan-Meier estimator.

The cohort ships and grouped failure counts in table 6 appear to have been simulated from the same distribution for every cohort. The ships and returns process seems suspiciously stationary. The reliability estimates from each cohort coincide in the broom chart (figure 3).

Table 6. Data from https://www.weibull.com/hotwire/issue119/relbasics119.htm “Predicting Warranty Returns in Weibull++7”

Month	Sales	1	2	3	4	5	6	7	8	9
Mar-10	1623	1	3	5	7	9	11	12	15	17
Apr-10	3723		2	7	11	17	20	25	30	33
May-10	1319			0	3	4	6	7	9	10
Jun-10	3600				2	6	12	15	20	25
Jul-10	3298					2	6	10	14	19
Aug-10	1333						0	3	4	6
Sep-10	1584							0	3	5
Oct-10	4508								2	9
Nov-10	4463									2

Figure 3. Broom chart: the reliability estimates from each cohort are the same (superimposed), just for shorter durations.

Figure 4. Greenwood standard deviation is much smaller than the empirical standard deviation estimates. The simulated standard deviation estimates are close to Greenwood.

So What?

The actuarial forecast (or hindcast) of failures in period t is ∑a(s)*n(t-s); s=1,2,…,t, where n(t-s) is the installed base of age t-s, and a(s) is the actuarial failure rate conditional on survival to age s. The actuarial forecast variance is

∑Var[a(s)]*n(t-s)^2+∑∑Covar[a(s),a(u)]*n(t-s)*n(t-u); s≠u; s,u=1,2,…,t.

The first term depends on Var[a(s)], so using Greenwood’s variance could bias the forecast variance. The covariances are also needed; the second term could be significant.
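
The two terms can be computed separately, as in this minimal Python sketch. The function name forecast_variance is hypothetical, and weights[i] stands for whichever installed-base count multiplies the i-th actuarial rate in the forecast. The example numbers are the week 1 to 3 entries of table 8 and the first three ships counts of table 3; pairing them in this order is an assumption made because it reproduces the week 3 row of table 9.

```python
import math

def forecast_variance(weights, cov):
    """Split Var[sum_i weights[i] * a(i)] into variance and covariance terms."""
    k = len(weights)
    var_term = sum(weights[i] ** 2 * cov[i][i] for i in range(k))
    cov_term = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(k) for j in range(k) if i != j)
    return var_term, cov_term

# Variance-covariance of actuarial rates at ages 1..3 (from table 8)
cov = [
    [0.000164, 0.0, 0.000214],
    [0.0,      0.0, 0.0],
    [0.000214, 0.0, 0.000241],
]
weights = [20, 50, 70]  # ships counts from table 3 (assumed pairing)

var_term, cov_term = forecast_variance(weights, cov)
stdev = math.sqrt(var_term + cov_term)
print(round(var_term, 2), round(cov_term, 2), round(stdev, 2))
```

In this example the covariance term is about half the size of the variance term, so dropping it would understate the forecast standard deviation.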

The covariances of the Nelson-Aalen cumulative failure rate function estimates are zero [Wellner, Nelson, Aalen], but actuarial failure rates could have covariance. Neglecting actuarial rate covariances under- or overestimates the variance of actuarial forecasts (bar charts in figures 2 and 4). The jackknife estimators of the “survival function integrals” may interest you [Azarang et al.], because the survival function integral is the MTBF, if lifetime data are uncensored. If data are censored, extrapolation is required. Bootstrap resampling of the Kaplan-Meier estimator yields data for estimation of the variance-covariance matrix, but does not capture the variation between cohorts.

Table 7. Variance-covariance matrix of Kaplan-Meier reliability estimates from table 1 data

Age	1	2	Act. hindcast	Stdev
1	0.00016355	0.00000477	5	1.11E-14
2	0.00000477	0.0017588	14.5	0
1	NA()	NA()	22.325	NA()

Table 8. Variance-covariance matrix of actuarial failure rate function from table 4 cohort reliability estimates from Jim McLinn’s data

Week	1	2	3	4	5	6	7	8
1	0.000164
2	0	0
3	0.000214	0	0.000241
4	0	0	0	0
5	0.000292	0	0.000331	0	0.000378
6	0	0	0	0	0	0
7	0.000377	0	0.000432	0	0.000498	0	0.00058
8	0	0	0	0	0	0	0	0

Table 9. Hindcasts and their variances. The last column is the square root of hindcast variances, the standard deviation of the actuarial hindcast. A hindcast is a forecast for a period in the past

Week	Ships	Act. rate	Hindcast	Observed	∑VAR…	∑∑COVAR…	Stdev
1	20	0.01	0.22	1	0.07		0.26
2	50	0.00	0.54	1	0.07	0.00	0.26
3	70	0.01	0.63	2	1.25	0.60	1.36
4	100	0.00	1.27	2	1.25	0.00	1.12
5	100	0.02	1.52	3	5.03	5.22	3.20
6	100	0.00	1.52	3	5.03	0.00	2.24
7	120	0.02	2.29	4	13.39	21.02	5.87
8	120	0.00	2.75	4	13.39	0.00	3.66

Each cohort input to the Kaplan-Meier estimator includes grouped failures at ages up to the oldest cohort’s age. Use the empirical reliability estimates from each cohort to compute an estimate of the variance-covariance matrix. They’re independent samples even though they are from cohorts of different sizes, with different numbers of grouped failure counts, with different maximum ages. If you want to forecast future returns, extrapolate the ships and actuarial failure rates. I use regression, which gives me some indications of standard deviations to plug into table 4, the reliability function estimates for each cohort.

Recommendations?

Don’t believe the Greenwood variance of the Kaplan-Meier reliability function estimate. You may argue that the empirical reliability estimates from each cohort are not of the same size, because each successive cohort is one age-interval shorter. I agree with that and that the variance-covariance estimates of the oldest units may suffer from small failure counts. I am not referring to the cohort sizes (ships) but to the numbers of reliability estimates in each cohort and to small failure counts of reliable products. Perhaps I should use weighted variance-covariance estimates [Khan et al.]. Perhaps I should derive weights that minimize the distance of the empirical variance-covariance matrix from the Cramer-Rao bound. I don’t know. Help?

Can I help you with the variance of actuarial forecasts? Need to set spares inventories? Confidence bands on reliability function estimates? Reliability or survival function estimation without life data? Suppose all the available data were the ships counts and the sums of failure counts in each column of the Nevada table. Those are population data required by GAAP from revenue and service costs. Kaplan-Meier requires lifetime data. Ships and returns counts are population data. Is your Kaplan-Meier data a sample? How much does it cost to track lifetimes? How many errors?

You may ask, “How are you going to estimate the variance-covariance of the nonparametric estimator from ships and returns counts? You used the Cramer-Rao bound on the variance-covariance matrix in a previous article” [“ESG and Reliability?” George]. Send me some field reliability ships and returns counts data and you’ll see.

References

Aalen, O. O., “Nonparametric Inference for a Family of Counting Processes,” Annals of Statistics, Vol. 6, pp. 701-726, 1978

Odd O Aalen, Ørnulf Borgan, and Håkon K. Gjessing, Survival and Event History Analysis, A Process Point of View, Springer, 2008

Leyla Azarang, Jacobo de Uña-Álvarez, and Winfried Stute, “The Jackknife Estimate of Covariance of Two Kaplan–Meier Integrals with Covariables,” Statistics, Vol. 49, No. 5, pp. 1005-1025, DOI:10.1080/02331888.2014.960871, 2015

D. A. Freedman, “Greenwood’s Formula,” https://www.stat.berkeley.edu/~freedman/greenwd.pdf

Larry George, “ESG and Reliability?” may appear in “Weekly Update,” www.accendoreliability.com, 2023

Greenwood, M., “The Natural Duration of Cancer,” Reports on Public Health and Medical Subjects, Vol. 33, pp. 1–26, His Majesty’s Stationery Office, London, 1926

W. J. Hall and Jon A. Wellner, “Confidence Bands for a Survival Curve from Censored Data,” Biometrika, Vol. 67, No. 1, pp. 133-143, April 1980

Habib Nawaz Khan, Qamruz Zaman, Fatima Azmi, Gulap Shahzada, and Mihajlo Jakovljevic, “Methods for Improving the Variance Estimator of the Kaplan–Meier Survival Function, When There Is No, Moderate and Heavy Censoring-Applied in Oncological Datasets,” Frontiers in Public Health, May 2022

James McLinn, “Weibull Analysis of Perplexing Field Data,” ARSymposium, 2010

Wayne Nelson, “Theory and Applications of Hazard Plotting for Censored Failure Data,” Technometrics, Vol. 42, No. 1, February 2000

S. Sawyer, “The Greenwood and Exponential Greenwood Confidence Intervals in Survival Analysis,” September 4, 2003

Paul Tune, “Computing Constrained Cramer-Rao Bounds,” IEEE Transactions on Signal Processing, Vol. 60, No. 10, pp. 5543-5548, doi: 10.1109/TSP.2012.2204258, Oct. 2012

Bhanukiran Vinzamuri, Yan Li, and Chandan K. Reddy, “Calibrated Survival Analysis using Regularized Inverse Covariance Estimation for Right Censored Data,” IEEE Transactions on Knowledge and Data Engineering, DOI:10.1109/TKDE.2017.2719028, June 2017

Jon A. Wellner, “Notes on Greenwood’s Variance Estimator for the Kaplan-Meier Estimator,” Univ. of Washington, January, 2010

Comments

Larry George says
March 11, 2023 at 1:01 PM
Stupid me. I computed the Greenwood standard deviation wrong in Table 2. Kaplan-Meier reliability differs from cohort reliabilities too. Cohort reliabilities do not vary.
Age KM Rel Greenwood Cohort Reliabilities
1 0.95 0.0126 0.95
2 0.85 0.024 0.8947
3 0.70 0.403 0.8235
Greenwood standard deviations should have been
1 0.0689
2 0.11091
3 0.11878
