“It is the policy of my Administration to respond to the coronavirus disease 2019 (COVID-19) pandemic through effective approaches guided by the best available science and data” [Biden Executive order, 2021]. That epidemic inspired the simultaneous nonparametric estimation of survival functions from case to recovery and case to death, without lifetime data (figure 1)!
Why not do the same for multiple-failure-mode data? This article shows nonparametric, multiple-failure-mode, maximum likelihood reliability estimation in a spreadsheet. Data are system first-failure times and the corresponding failure modes that caused the first system failures (table 1). However those data are dependent. I will explain the likelihood function, lnL, and how to find the maximum likelihood reliability estimates for all failure modes simultaneously.
Some articles and statistical software use Kaplan-Meier estimators for each failure mode [Dignam et al., ReliaSoft-ReliaWiki, Minitab, Others?]. Unfortunately, the Kaplan-Meier estimator is for independent, identically distributed, perhaps censored data [Mailman School of Public Health, Prentice et al., Yizeng He et al., Thernau et al. 2024, Lin et al.]. Using Kaplan-Meier results in reliability estimates that are biased low.
Simultaneous estimates could be produced by constrained least squares; e.g., COVID-19 “survival” function estimation in two modes: recovery and death, from periodic case, recovery, and death counts (figure 1) [George 2021]. Why not maximize the multiple-mode likelihood function constrained by the percentages of failures in each mode [Prentice et al., George 2025]? Table 1 shows some multiple-mode, first-failure time data for example.
Table 1. Multiple-Mode, first-failure-time data (first three columns) [Rao, Meeker et al.] and likelihoods
Kilometers | Fail Mode | Censored? | L=Likelihood | lnL |
6700 | Mode1 | Failed | 0.01681 | -4.0857 |
6950 | Censored | Censored | 0.98319 | -0.017 |
7820 | Censored | Censored | 0.98319 | -0.017 |
8790 | Censored | Censored | 0.98319 | -0.017 |
9120 | Mode2 | Failed | 0.01836 | -3.9978 |
9660 | Censored | Censored | 0.96483 | -0.0358 |
9820 | Censored | Censored | 0.96483 | -0.0358 |
11310 | Censored | Censored | 0.96483 | -0.0358 |
11690 | Censored | Censored | 0.96483 | -0.0358 |
11850 | Censored | Censored | 0.96483 | -0.0358 |
11880 | Censored | Censored | 0.96483 | -0.0358 |
12140 | Censored | Censored | 0.96483 | -0.0358 |
12200 | Mode1 | Failed | 0.02032 | -3.8961 |
12870 | Censored | Censored | 0.94451 | -0.0571 |
13150 | Mode2 | Failed | 0.02168 | -3.8312 |
13330 | Censored | Censored | 0.92283 | -0.0803 |
13470 | Censored | Censored | 0.92283 | -0.0803 |
14040 | Censored | Censored | 0.92283 | -0.0803 |
14300 | Mode1 | Failed | 0.02225 | -3.8056 |
17520 | Mode1 | Failed | 0.02225 | -3.8056 |
17540 | Censored | Censored | 0.87833 | -0.1297 |
17890 | Censored | Censored | 0.87833 | -0.1297 |
18450 | Censored | Censored | 0.87833 | -0.1297 |
18960 | Censored | Censored | 0.87833 | -0.1297 |
18980 | Censored | Censored | 0.87833 | -0.1297 |
19410 | Censored | Censored | 0.87833 | -0.1297 |
20100 | Mode2 | Failed | 0.02783 | -3.5817 |
20100 | Censored | Censored | 0.85051 | -0.1619 |
20150 | Censored | Censored | 0.85051 | -0.1619 |
20320 | Censored | Censored | 0.85051 | -0.1619 |
20900 | Mode2 | Failed | 0.03086 | -3.4783 |
22700 | Mode1 | Failed | 0.02878 | -3.5482 |
23490 | Censored | Censored | 0.79087 | -0.2346 |
26510 | Mode1 | Failed | 0.02987 | -3.5107 |
27410 | Censored | Censored | 0.761 | -0.2731 |
27490 | Mode1 | Failed | 0.03108 | -3.4712 |
27890 | Censored | Censored | 0.72992 | -0.3148 |
lnL-> | -44.013 |
Spreadsheet Implementation
You could copy this article’s tables into your own spreadsheet, or you could ask for my spreadsheet with cell formulas. Some work is required to reproduce the formulas that generate the numbers in tables 1-4. I hope the following explanation explains how to do it.
The dependence among failure modes is represented by the proportions of failures in the 38 independent random samples in which each failure mode is the first failure, table 2. Kaplan-Meier estimates by failure mode do not yield these proportions. Some people have tried to use the failure mode as a factor in Kaplan-Meier, proportional hazards, reliability estimates [Agrawal et al.], using statistical software [e.g., Thernau “Survival” R-package].
Table 2. Failure proportions and counts out of 38 in table 1: 27 were both censored
Mode | Proportion |
Mode1 | 0.18421 |
Mode2 | 0.10526 |
Failure Counts | |
Mode1 | 7 |
Mode2 | 4 |
Total | 38 |
For each observation time t, the likelihoods in table 1 consists of terms such as p(t;mode)*(1-F(t;other mode) if failure at age t or (1-F(t;mode)*(1-F(t;other mode). The p(t;.) is the discrete probability of failure at age t, and F(t;.) is the discrete cumulative distribution function of time or kilometers to failure. Reliability is (1-F(t;mode)). Likelihood L is the product of all of the terms. The product could cause underflow, so maximize the sum of the logarithms of the likelihoods. For more detailed explanation about inappropriate and appropriate likelihood function(s) for multiple mode failure data, please refer to https://accendoreliability.com/statistical-software-problem/#more-582018/.
Table 3 contains estimates of p(t;mode), and table 4 contains the corresponding reliability estimates R(t;mode)=1-F(t;mode) computed from table 3. Tables 3 and 4 are used in table 1 to construct the likelihoods row by row, manually, using either p(t;mode)*(1-F(t;other mode) if failure at age t or 1-F(t;mode1)*(1-F(t; mode2) depending on censoring. Nested IF(IF()) statements could do the same, less clearly. Table 3 could be copied anywhere convenient in a spreadsheet. Table 4 should be copied so its rows line up with table 1 for convenience.
Table 3. Discrete probabilities of failures at ages (km) t: p(t;mode). He “KM p-h ” p(t;mode) estimates are derived from the Kaplan-Meier proportional hazards (modes) estimates
Km | p(t;mode1) | p(t;mode2) | KM p-h 1 | KM p-h 2 |
6700 | 0.01681 | 0 | 0.0263 | |
9120 | 0 | 0.01867 | 0.0286 | |
12200 | 0.02071 | 0 | 0.0364 | |
13150 | 0 | 0.02253 | 0.0379 | |
14300 | 0.0232 | 0 | 0.0435 | |
17520 | 0.0232 | 0 | 0.0435 | |
20100 | 0 | 0.03038 | 0.0653 | |
20900 | 0 | 0.03369 | 0.0898 | |
22700 | 0.03216 | 0 | 0.0899 | |
26510 | 0.03339 | 0 | 0.1077 | |
27490 | 0.03474 | 0 | 0.1437 | |
Sums | 0.18421 | 0.10526 | 0.491 | 0.2216 |
Proportions | 0.18421 | 0.10526 |
Table 4 is built from table 3 p(t;mode) as R(t;mode)=1-∑p(s;mode), s=1,2,…,t, at failure times t in each mode. F(t;mode) is constant in between modal failure times. KM1 and KM2 are the Kaplan-Meier reliability estimates for each failure mode. Notice that reliability R(28100 km;mode1)=0.81579=1-0.18421 and R(28100 km;mode2)=0.89474=1-0.10526 agrees with the failure proportions in table 2. The Kaplan-Meier estimates KM1 and KM2, at 28100 km disagree.
Table 4. R(t;mode)=1-F(t;mode) estimates, from the sums, F(t;mode)=∑p(s;mode) s=1,2,…,t.
Km | R(t;Mode1) | R(t;Mode2) | KM1 | KM2 |
0 | 1 | 1 | 1 | 1 |
6700 | 0.98319 | 1 | 0.97368 | 1 |
6950 | 0.98319 | 1 | 0.97368 | 1 |
7820 | 0.98319 | 1 | 0.97368 | 1 |
8790 | 0.98319 | 1 | 0.97368 | 1 |
9120 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
9660 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
9820 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
11310 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
11690 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
11850 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
11880 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
12140 | 0.98319 | 0.98133 | 0.97368 | 0.97059 |
12200 | 0.96248 | 0.98133 | 0.93623 | 0.97059 |
12870 | 0.96248 | 0.98133 | 0.93623 | 0.97059 |
13150 | 0.96248 | 0.9588 | 0.93623 | 0.93015 |
13330 | 0.96248 | 0.9588 | 0.93623 | 0.93015 |
13470 | 0.96248 | 0.9588 | 0.93623 | 0.93015 |
14040 | 0.96248 | 0.9588 | 0.93623 | 0.93015 |
14300 | 0.93928 | 0.9588 | 0.88942 | 0.93015 |
17520 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
17540 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
17890 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
18450 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
18960 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
18980 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
19410 | 0.91608 | 0.9588 | 0.84261 | 0.93015 |
20100 | 0.91608 | 0.92842 | 0.84261 | 0.85263 |
20100 | 0.91608 | 0.92842 | 0.84261 | 0.85263 |
20150 | 0.91608 | 0.92842 | 0.84261 | 0.85263 |
20320 | 0.91608 | 0.92842 | 0.84261 | 0.85263 |
20900 | 0.91608 | 0.89474 | 0.84261 | 0.74606 |
22700 | 0.88391 | 0.89474 | 0.72224 | 0.74606 |
23490 | 0.88391 | 0.89474 | 0.72224 | 0.74606 |
26510 | 0.85053 | 0.89474 | 0.57779 | 0.74606 |
27410 | 0.85053 | 0.89474 | 0.57779 | 0.74606 |
27490 | 0.81579 | 0.89474 | 0.38519 | 0.74606 |
27890 | 0.81579 | 0.89474 | 0.38519 | 0.74606 |
28100 | 0.81579 | 0.89474 | 0.38519 | 0.74606 |
Figure 2 shows the bias in the Kaplan-Meier reliability proportional-hazards estimates (KMPH) compared with the maximum likelihood estimates. The lines are the maximum likelihood reliability estimates, and the dots are the Kaplan-Meier proportional hazards estimates.
Figure 3 shows the nonparametric maximum likelihood discrete failure rate function estimates, at times (kilometers) of failures by failure mode. The proportional hazards assumptions seems reasonable, even if the resulting reliability estimates In Figure 1 are biased. That bias is because independent, Kaplan-Meier, proportional hazards estimates do not account for the dependence in proportions of modal failures.
Maximum Likelihood estimation
Use Excel Solver to maximize lnL (below table 1) as a function of the p(t;mode) values in table 3. Set a constraint that the sums of failure mode probabilities,∑p(s;mode), equal the proportions in table 2. I tried doing maximization on table 4 directly, but Excel complained about too many variables.
Recommendations
Don’t use the Kaplan-Meier reliability function estimates for multiple-mode first-failure times, because they don’t account for observed proportional dependence among the failure modes. They are biased low because, for the older failure times, each Kaplan-Meier reliability estimate are multiplied by (1-1/survivors)->0 as the number of survivors decreases.
Doubt statistical software the allows specifying parametric distributions for each failure mode. They may use parameter estimates in each failure mode separately without accounting for dependence. I failed to make the maximum likelihood estimation work for Weibull reliability functions on another data set with the proportionality constraint.
One paper describes using a bivariate Weibull distribution estimate, analogous to the Marshall-Olkin bivariate exponential distribution. That accounts for simultaneous failures, if any [Agrawal et al.].
Why not do multiple-mode reliability estimation, without lifetime data? Joe Biden says so [Executive Order]! Even with multiple-mode data? Generally Accepted Accounting Principles requires statistically sufficient data, and it’s population data! It is reasonable to count failures by failure mode; e.g., product sales, BoMs, and simultaneous spare parts’ sales or demands; Corona Virus case, recovery, and death counts [George 2025].
References
Aakash Agrawal, Debanjan Mitra, and Ayon Ganguly, “A Model for Censored Reliability Data with Two Dependent Failure Modes and Prediction of Future Failures,” arXiv:2206.12892v1 [stat.ME], 26 Jun 2022
Joseph Biden, “Executive Order on Ensuring a Data-Driven Response to COVID-19 and Future High-Consequence Public Health Threats,” Jan. 21, 2021
James J Dignam, Qiang Zhang, Maria N, Kocherginsky, “The Use and Interpretation of Competing Risks Regression Models,” Clin Cancer Res.;18(8):2301–2308. doi: 10.1158/1078-0432.CCR-11-2097, Jan. 2012
L. L. George, “COVID-19 Survival Analysis,” https://sites.google.com/site/fieldreliability/corona-virus-survival-analysis/ Jan. 2021
L. L. George, ”Statistical Software Problem,” https://accendoreliability.com/statistical-software-problem/#more-582018/, Jan. 2025
Yizeng He, Kwang Woo Ahn , Ruta Brazauskas, “A Review of Competing Risks Data Analysis,” https://www.mcw.edu/-/media/MCW/Departments/Biostatistics/TR070.pdf/
Guixian Lin, Ying So, Gordon Johnston, “Analyzing Survival Data with Competing Risks Using SAS® Software,” SAS Global Forum 2012
Meeker, William Q. and Luis A. Escobar, “Degradation Data, Models, and Data Analysis,” 2003, http://www.public.iastate.edu/~wqmeeker/stat533stuff/psnups/chapter13_psnup.pdf
E. L. Kaplan and P. Meier, ”Nonparametric Estimator From Incomplete Observations,” J. Amer. Statist. Assn., Vol. 53, pp. 457-481, 1958
Mailman School of Public Health, “Competing Risk Analysis,” Columbia U. Irving Medical Center, https://www.publichealth.columbia.edu/research/population-health-methods/competing-risk-analysis#/
Ross L. Prentice, J. D. Kalbfleisch, A. V. Peterson, N. Flournoy, V. T. Farewell, and N. Breslow, “The Analysis of Failure Time Data in the Presence of Competing Risks,” Biometrics, Vol. 34, pp. 541-554, 1978
Ross L. Prentice and J. D. Kalbfleisch, “Hazard Rate Models with Covariates,” Biometrics, Vol. 34, No. 1, Perspectives in Biometry, pp. 25-39, March 1979
Reliasoft-Reliawiki, “Competing Failure Modes (CFM) Analysis,” https://www.reliawiki.com/index.php/Competing_Failure_Modes_Analysis/, Sept. 2023
Terry Thernau, “Package ‘survival,’” https://github.com/therneau/survival/, Dec. 2024
Terry Therneau, Cynthia Crowson, and Elizabeth Atkinson, “Multi-state models and competing risks,” https://cran.r-project.org/web/packages/survival/vignettes/compete.pdf/, December 17, 2024
Leave a Reply