Kaplan-Meier Estimator for Renewal Processes?

The New-Products manager asked me, “Your actuarial failure rate estimates (from vehicle registrations, bills-of-materials, and automotive aftermarket store sales) are for dead-forever parts with at most one failure. What if auto parts could be renewed or replaced more than once?” Chagrined, I wrote a spreadsheet program to estimate actuarial rates for renewal processes, without life data. But what estimator from grouped, cohort renewal counts corresponds to the Kaplan-Meier estimator from grouped, cohort failure counts?

The Kaplan-Meier estimator is for dead-forever failure counts!

The Kaplan-Meier reliability estimator is well known and widely taught [Kaplan and Meier]. It’s in reliability books [Gertsbakh chapter 5 and Aalen et al. chapter 3] and software [SAS, JMP, XLSTAT, R, SPSS,…]. It’s the nonparametric maximum likelihood estimator (npmle) of the reliability function from grouped ages at first failures and survivors’ ages (right-censored ages at first failures). The Nelson-Aalen estimator is the npmle of the cumulative hazard function, A(t) (sum of actuarial rates up to age t), from the same data [Aalen et al.]. Those estimators are NOT FOR RENEWAL COUNTS! Plugging grouped renewal counts into the Kaplan-Meier estimator does not estimate reliability. Thanks to Wayne Nelson for asking me about reliability estimation from renewal process counts.
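For dead-forever data, the Kaplan-Meier and Nelson-Aalen computations can be sketched in a few lines of Python. This is a minimal sketch, not the code in any of the packages listed above. The function name kaplan_meier_grouped and the counts are hypothetical, and it assumes that censoring is already reflected in the number at risk entering each age interval.

```python
def kaplan_meier_grouped(at_risk, failures):
    """Return (reliability, cumulative_hazard) lists by age interval.

    at_risk[j]  = units entering age interval j+1 unfailed and uncensored
    failures[j] = first failures observed in age interval j+1
    """
    reliability, cum_hazard = [], []
    r, a = 1.0, 0.0
    for n, d in zip(at_risk, failures):
        if n <= 0 or not (0 <= d <= n):
            raise ValueError("need at_risk > 0 and 0 <= failures <= at_risk")
        rate = d / n          # actuarial (discrete hazard) rate at this age
        r *= 1.0 - rate       # Kaplan-Meier product-limit step
        a += rate             # Nelson-Aalen cumulative sum of rates
        reliability.append(r)
        cum_hazard.append(a)
    return reliability, cum_hazard


# Hypothetical dead-forever counts (not taken from any table in this article)
at_risk = [100, 80, 50]
failures = [10, 8, 5]
R, A = kaplan_meier_grouped(at_risk, failures)
print("R(t):", [round(x, 3) for x in R])   # R(t): [0.9, 0.81, 0.729]
print("A(t):", [round(x, 3) for x in A])   # A(t): [0.1, 0.2, 0.3]
```

The sketch rejects any interval with more failures than units at risk, which is exactly what renewal counts can produce.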

Table 1 shows simulated grouped, censored failure counts. It is a so-called “Nevada” table because the period-grouped failure counts look like a mirror image of Nevada on its side. “Ships” cohorts for periods 1-6 are in column 2, and failure counts in periods 1-6 are in rows 2-7. Notice there were more failures than ships? Where did they all come from? Renewal processes.

Table 1. Simulated input data for Kaplan-Meier estimator. “Ships” cohorts are in column 2 and body of the table contains grouped failure counts from each cohort in each age interval 1-6.

Period	Ships	1	2	3	4	5	6	Total
1	6	10	2	3	4	1	6	26
2	6		12	0	2	2	2	18
3	6			5	1	0	0	6
4	6				11	4	2	17
5	6					9	1	10
6	6						1	1
Sums	36	10	14	8	18	16	12	78

For dead-forever products or parts, the Kaplan-Meier estimator from grouped failure counts by ships cohort has less uncertainty than estimators from ships and returns counts [George, July 2017]. But the Kaplan-Meier estimator is not a reliability estimator for renewal process ships cohorts and grouped renewal counts! There is no reliability estimator from grouped-by-cohort renewal counts in a Nevada table.
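A quick check on Table 1 illustrates the problem. The sketch below re-enters Table 1 (None marks periods before a cohort shipped), reproduces its totals, and computes the first Kaplan-Meier factor 1 - d/n for each cohort, naively treating ships as the number at risk. The variable names are mine.

```python
# Table 1: rows are ships cohorts 1-6, columns are periods 1-6
ships = [6, 6, 6, 6, 6, 6]
nevada = [
    [10, 2, 3, 4, 1, 6],
    [None, 12, 0, 2, 2, 2],
    [None, None, 5, 1, 0, 0],
    [None, None, None, 11, 4, 2],
    [None, None, None, None, 9, 1],
    [None, None, None, None, None, 1],
]

row_totals = [sum(c for c in row if c is not None) for row in nevada]
col_sums = [sum(row[t] for row in nevada if row[t] is not None)
            for t in range(len(ships))]
print("row totals: ", row_totals)    # [26, 18, 6, 17, 10, 1]
print("column sums:", col_sums)      # [10, 14, 8, 18, 16, 12]
print(sum(row_totals), "counts from", sum(ships), "ships")   # 78 counts from 36 ships

# First-period counts sit on the diagonal; naive factor is 1 - d/n with n = ships
first_factors = [1 - nevada[i][i] / ships[i] for i in range(len(ships))]
print("naive first factors:", [round(f, 3) for f in first_factors])
```

Four of the six cohorts have more first-period counts than ships, so their naive factors are negative and cannot be survival probabilities.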

The Kaplan-Meier estimator from grouped renewal counts produces an estimate of 1‑F*(t), where F*(t) is the convolution of an unknown number of lifetimes, each with the underlying reliability function 1-F(t). F*(t) could be useful for estimating the renewal function M(t) = E[N(t)], where N(t) is the number of failures up to time t.

The table 1 “Ships” cohorts and the “Sums” bottom row contain statistically sufficient data to estimate nonparametric reliability functions, even for renewal processes [George, 2021]. Why not use the least-squares estimator from ships and returns counts [George, 2017]? Some question the principle of maximum likelihood [Berkson] compared with minimum chi-square estimation (least squares), especially when the cost of error is proportional to the square of the error (variance). The asymptotic (Greenwood) variance-covariance of the Kaplan-Meier estimator fails for finite sample sizes [George, March 2023].

Alternative Grouped Renewal Counts by Cohort?

What if grouped failure counts were renewals or recurrent event counts? What if you didn’t know how many previous renewals, failures, or recurrent events had occurred for each cohort member (product or part)? Maybe:

1. One day they dump all the current period’s failed items on your desk and ask for their reliability. Their serial numbers identify which cohort they came from, but there is no record of whether this period’s failed item is the first, second, or ??? renewal.
2. Every period they record all the failed items in each age interval by cohort, dump them on your desk, and ask for their reliability.

In case 1, you could fill out the Nevada table for the current period but would have no idea how many failures there had been in previous periods (aka “current status data” [George, April 2021]). In case 2, the period failure counts probably included all renewals. If you’re lucky, renewals indicated which cohort they came from and you have lifetime data for each reported failure. I.e., if a cohort 1 part fails at age 6, its failure (renewal) is recorded as a member of cohort 1 in period 6. You have grouped lifetime data and can use well-known reliability estimators for renewal processes [Guédon and Cocozza-Thivent, Lin (chapter 3), Rice et al., Vardi].

Tables 1 and 2 are an example of case 2 data: table 2 contains the simulated renewal times, and table 1 is the Nevada table made from them.

Table 2. TTFF (time to first failure), TTFF+TBF1 (plus the first time between failures), TTFF+TBF1+TBF2, etc. for simulated Weibull Exp[-(t/eta)^beta] renewal processes [Yannaros] with eta=1, beta=0.5, mean = eta*Γ(1+1/beta) = 2, for six cohorts of size 6 in 6 successive periods. These data were used to produce Nevada table 1.

Unit\Failure	1	2	3	4	5	6
1	0.0	3.0	3.2	15.3	16.1	17.2
1	0.2	0.3	0.9	0.9	1.0	4.6
1	0.0	23.7	23.7	23.9	24.1	24.1
1	0.8	2.4	2.5	3.4	4.3	4.3
1	0.0	0.2	3.6	3.7	3.7	4.5
1	1.1	1.1	1.3	27.6	30.5	31.0
2		4.6	5.5	6.0	6.1	6.1
2		8.5	8.5	8.8	9.0	10.6
2		2.0	2.7	2.9	3.4	3.5
2		0.0	0.1	0.1	3.5	3.5
2		0.9	1.0	1.4	5.6	18.9
2		0.7	8.2	8.3	11.4	11.4
3			2.2	4.2	4.2	4.3
Etc.					0.1	0.3
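A simulation like the one behind tables 1 and 2 can be sketched with Python's standard library. This is not the spreadsheet simulation used for this article. It assumes each cohort ships at the start of its period, periods have length 1 in the same time units as eta, and every failure is renewed immediately. The function names and the seed are arbitrary, so the counts will not match table 1.

```python
import random


def renewal_times(eta, beta, horizon, rng):
    """Cumulative renewal times (TTFF, TTFF+TBF1, ...) that fall within horizon."""
    times, t = [], 0.0
    while True:
        t += rng.weibullvariate(eta, beta)   # scale eta, shape beta
        if t > horizon:
            return times
        times.append(t)


def nevada_table(n_periods, cohort_size, eta, beta, seed=1):
    """Renewal counts by cohort (rows) and calendar period (columns)."""
    rng = random.Random(seed)
    table = [[None] * n_periods for _ in range(n_periods)]
    for cohort in range(n_periods):          # cohort index 0 ships in period 1
        horizon = n_periods - cohort         # periods of observation for this cohort
        counts = [0] * n_periods
        for _ in range(cohort_size):
            for age in renewal_times(eta, beta, horizon, rng):
                # Age interval -> calendar period; clamp an age exactly at horizon
                period = cohort + min(int(age), horizon - 1)
                counts[period] += 1
        for p in range(cohort, n_periods):
            table[cohort][p] = counts[p]
    return table


for row in nevada_table(n_periods=6, cohort_size=6, eta=1.0, beta=0.5):
    print(row)
```

Keeping the individual renewal times would give the lifetime data of table 2; the Nevada table keeps only the counts.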

Renewal Process Forecasting

An actuarial forecast for period t is Σa(t-s)*n(s), s=0,1,2,…,t, where n(s) is the installed base shipped in period s (age t-s in period t) and a(t-s) is the actuarial failure rate for a product or part of age t-s. The actuarial forecast is an estimate of the mean demand in period t. Actuarial rates a(t-s) are discrete failure probabilities, conditional on survival to age t-s.
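The actuarial forecast is a sum of products of rate and installed base over ages, as this short sketch shows. The reliability values and installed base are hypothetical, the function names are mine, and the rate convention a(j) = (R(j-1) - R(j))/R(j-1) with R(0) = 1 is an assumption about how ages are indexed.

```python
def actuarial_rates(reliability):
    """Discrete rates a(j) = (R(j-1) - R(j)) / R(j-1), j = 1, 2, ..., with R(0) = 1."""
    rates, previous = [], 1.0
    for r in reliability:
        rates.append((previous - r) / previous if previous > 0 else 0.0)
        previous = r
    return rates


def actuarial_forecast(rates, installed_base_by_age):
    """Expected failures in the period: sum over ages of a(age) * units of that age."""
    return sum(a * n for a, n in zip(rates, installed_base_by_age))


reliability = [0.9, 0.72, 0.504]       # hypothetical R(1), R(2), R(3)
installed_base = [50, 40, 30]          # hypothetical units at risk at ages 1, 2, 3
rates = actuarial_rates(reliability)   # approximately [0.1, 0.2, 0.3]
print(round(actuarial_forecast(rates, installed_base), 2))   # 22.0
```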

The actuarial hindcast for a renewal process is Σd(t-s)*n(s), where n(s) is the installed base shipped in period s and d(t-s) is the actuarial demand rate at age t-s, d(t) = M(t)-M(t-1), from the renewal function M(t) = E[N(t)]. (A hindcast is a forecast for renewals that have already happened.) To forecast future renewals, extrapolate the demand rate function d(t) (and the installed base if unknown).
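The demand rates and hindcast can be sketched as follows. This assumes a discrete model with at most one renewal per unit per period, so the demand rates satisfy the discrete renewal equation d(t) = Σf(s)*d(t-s), s=1,…,t, with d(0) = 1, where f(s) is the probability that a time between renewals is s periods. The function names are mine, the geometric f is hypothetical, and the rates used for the hindcast are the first row of table 3 divided by 6 ships.

```python
def demand_rates(pmf):
    """Discrete renewal equation: d(t) = sum_{s=1..t} f(s) * d(t-s), d(0) = 1."""
    d = [1.0]
    for t in range(1, len(pmf) + 1):
        d.append(sum(pmf[s - 1] * d[t - s] for s in range(1, t + 1)))
    return d[1:]                      # d(1), d(2), ...


def hindcast(d, ships):
    """Nevada-shaped table: cohort s contributes d(age) * ships[s] in period t."""
    n = len(ships)
    table = [[None] * n for _ in range(n)]
    for s in range(n):
        for t in range(s, n):
            table[s][t] = d[t - s] * ships[s]   # d[0] is the first-period rate
    return table


# Geometric times between renewals give a constant demand rate
print(demand_rates([0.5, 0.25, 0.125]))         # [0.5, 0.5, 0.5]

# Demand rates read from the first row of table 3, per ship
d_table3 = [x / 6 for x in (4.56, 3.47, 2.64, 2.00, 2.28, 2.99)]
table = hindcast(d_table3, [6] * 6)
column_sums = [sum(row[t] for row in table if row[t] is not None) for t in range(6)]
print([round(c, 2) for c in column_sums])
```

The column sums agree with the bottom row of table 3 up to rounding of the displayed entries.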

Estimation of the underlying reliability function

Do you want to estimate the underlying reliability function 1-F(t) or actuarial failure rate function, a(t), for diagnostics, resource allocation to improvements, identifying the RCM failure rate function, etc.? You could use the least squares estimator from the installed base ships cohorts and the sums of counts in the bottom row of table 1 [George, July 2021]. That estimator ignores the information in the body of the Nevada table. Why not use the grouped-by-cohort renewal counts from renewal processes if you have been tracking them by cohort (table 1)?

Table 3. Hindcasts Σd(t-s)*n(s) of the renewal counts in table 1.

Period	Ships	1	2	3	4	5	6
1	6	4.56	3.47	2.64	2.00	2.28	2.99
2	6		4.56	3.47	2.64	2.00	2.28
3	6			4.56	3.47	2.64	2.00
4	6				4.56	3.47	2.64
5	6					4.56	3.47
6	6						4.56
Sums	36	4.56	8.03	10.66	12.67	14.95	17.94

The demand rates d(t) for the actuarial hindcasts are computed to minimize the sum of squared differences (SSE) between the entries in Nevada tables 1 and 3 (SUMXMY2() function in Excel). Alternatively, minimize the chi-square objective function

Σ(observed-expected)²/expected.

Excel Solver minimizes the objectives as functions of the underlying nonparametric distribution of times between renewals. Figure 1 shows the nonparametric least-squares and chi-square reliability estimates and the Weibull function from which the Nevada table 1 data were simulated.
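The two objective functions are easy to evaluate outside Excel. The sketch below computes both for the counts in table 1 against hindcasts built from the first row of table 3 divided by 6 ships. It evaluates the objectives only; it does not do the minimization, and the function name is mine.

```python
ships = [6] * 6
table1 = [
    [10, 2, 3, 4, 1, 6],
    [None, 12, 0, 2, 2, 2],
    [None, None, 5, 1, 0, 0],
    [None, None, None, 11, 4, 2],
    [None, None, None, None, 9, 1],
    [None, None, None, None, None, 1],
]
# Demand rates per ship, read from the first row of table 3
d = [x / 6 for x in (4.56, 3.47, 2.64, 2.00, 2.28, 2.99)]


def objectives(observed, d, ships):
    """Return (SSE, chi-square) between observed counts and hindcasts d(age)*ships."""
    sse = chisq = 0.0
    for s, row in enumerate(observed):
        for t in range(s, len(row)):
            expected = d[t - s] * ships[s]          # hindcast for cohort s, period t
            squared_error = (row[t] - expected) ** 2
            sse += squared_error                     # like SUMXMY2() over the table body
            if expected > 0:
                chisq += squared_error / expected    # (observed-expected)^2/expected
    return sse, chisq


sse, chisq = objectives(table1, d, ships)
print(f"SSE = {sse:.2f}, chi-square = {chisq:.2f}")
```

A general-purpose optimizer could then vary the distribution of times between renewals, recompute d, and minimize either value, subject to nonnegative probabilities that sum to at most 1.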

Figure 1. Nonparametric least-squares R(t) SSE, Weibull W(t), and R(t) chiSq reliability functions

Conclusions and Recommendations?

I was surprised to find an apparently unsolved problem in reliability statistics. The likelihood function involves the unknown conditional survival probability of units that didn’t get renewed in each period. It was too difficult for me to derive. So I used least-squares.

The nonparametric least-squares reliability estimate doesn’t reproduce the Weibull reliability function from which the data were simulated, perhaps due to the small sample size of 36 units. Nevertheless, it does minimize the sum of squared differences between observed renewal counts and hindcasts, which indicates promise for forecasts based on the nonparametric estimate when costs are proportional to squared error. Weibull reliability is a convenient but often unrealistic assumption.

The nonparametric least-squares reliability estimate from ships cohorts and superposed renewal counts (ships and bottom row of table 1) puts all failure probability mass at age 1; reliability is 0.2216 for all ages 1,2,…,6. The estimate from grouped renewal counts by cohort (Nevada table) fits Weibull better, because it uses more information [George, 2018]. Is the extra data worthwhile? The choices are 1. lifetime data, 2. grouped renewal counts by cohort, or 3. ships cohorts and superposed renewal counts by period. It depends on data costs; GAAP requires ships cohorts and superposed renewal counts by accounting period.

If you would like the renewal process simulation, the least-squares, or the chi-square reliability estimation workbook, let me know in a comment.

References

Odd O. Aalen, Ørnulf Borgan, and Håkon K. Gjessing, Survival and Event History Analysis…, Springer, 2008

Joseph Berkson, “Minimum Chi-Square, not Maximum Likelihood!” Ann. Statist., Vol. 8(3), pp. 457-487, https://doi.org/10.1214/aos/1176345003, May 1980

Ilya B Gertsbakh, Models of Preventive Maintenance, North-Holland, 1977

Yann Guédon and Christiane Cocozza-Thivent, “Nonparametric Estimation of Renewal Processes from Count Data”, Canadian Journal of Statistics, Vol. 31(2), pp. 191-223, doi:10.2307/3316067, hal-00827464, 2003

E. L. Kaplan and Paul Meier, “Nonparametric Estimation from Incomplete Observations”, Jour. Amer. Statist. Assn., Vol. 53, pp. 457–481, 1958

Binshan Lin, “Estimation of the Renewal Function.” LSU Historical Dissertations and Theses. 4656. https://repository.lsu.edu/gradschool_disstheses/4656, 1988

Wayne Nelson, Applied Life Data Analysis, Wiley, New York, 1982

John D. Rice, Robert L. Strawderman, and Brent A. Johnson, “Regularity of a Renewal Process Estimated from Binary Data”, Biometrics, Vol. 74(2), pp. 566–574, doi:10.1111/biom.12768, June 2018

Y. Vardi, “Nonparametric Estimation in Renewal Processes”, The Annals of Statistics, Vol. 10, No. 3, pp. 772-785, 1982

Nikos Yannaros, “Weibull Renewal Processes”, Ann. Inst. Statist. Math., Vol. 46, No. 4, pp. 641-648, 1994

References by George

“Random-Tandem Queues and Reliability Estimation Without Life Data,” https://sites.google.com/site/fieldreliability/random-tandem-queues-and-reliability-estimation-without-life-data/, Dec. 2018

“Reliability from Current Status Data,” Weekly Update, Accendo Reliability, April 2021

“Actuarial Forecasts, Least Squares Reliability, and Martingales,” Weekly Update, Accendo Reliability (accendoreliability.com), June 2021

“Renewal Process Estimation, Without Life Data”, Weekly Update, https://accendoreliability.com/renewal-process-estimation-without-life-data/#more-443057/, July 2021

“Covariance of the Kaplan-Meier Estimators?” Weekly Update, Accendo Reliability, March 2023