Transient Markov Model of Multiple Failure Modes

Last Verified September 3, 2022

COVID-19 Case Fatality Rate (CFR) is easy to estimate: CFR=deaths/cases. Regression forecasts of COVID-19 cases and deaths are easy but complicated by variants and nonlinearity. Epidemiologists use SIR models (Susceptible, Infectious, and “Removed”) to estimate Ro. These are baseball statistics. Reliability people need age-specific reliability and failure-rate function estimates, by failure mode, to diagnose problems, recommend spares, plan maintenance, do risk analyses, etc. Markov models use actuarial transition rates.

Transient Markov process models use age-specific (actuarial) failure rate estimates, by failure mode, if there are several failure modes. Reliability people should use age-specific failure rate estimates to estimate age-specific availability, mission reliability, and to recommend spares, plan maintenance and do risk analyses. Epidemiologists should use them to estimate actuarial death and recovery rates, as “survival” functions of time since infection, to estimate Ro, to make forecasts, and to evaluate treatments and interventions.

Epidemiologists characterize epidemics with the basic reproduction number Ro, the expected number of infections resulting from a single case. SIR epidemic models (Susceptible, Infectious, and “Removed” [recovered or dead]) are used to make forecasts and estimate Ro, the expected number of infections resulting from a single case, Ro = b/g. SIR models also represent computer-virus (in)security. SIR epidemic models describe the evolution over time of a population S(t)+I(t)+R(t) of S(t) susceptible, ill or infectious I(t), and removed R(t), with constant transition rates between states: susceptible->infectious (rate b) and infectious->removed (rate g), dead or recovered and immune. SIR models show the effects of vaccination and other infection controls. [Yaesoubi and Cohen, Romeu]

SIR models describe the epidemic with three differential equations for the numbers in each state: S(t), I(t), and R(t). The solutions to the differential equations have been obtained. [Kawai and others] [https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SIR_model]

Three SIR differential equations describe a continuous-time Markov process, because the constant transition rates imply the memoryless property of exponential distributions. It is easy to approximate continuous time Markov processes with Markov chains and vice-versa. However, solving differential equation models is not easy [George 1973], even if transition rates are constant. Transient Markov chains can represent or approximate processes with age-dependent transition rates, such as COVID-19 or systems or parts with age-specific failure rates.

Periodic case, recovery, and death counts are statistically sufficient to make nonparametric estimates of joint survival functions and actuarial death and recovery transition rates. Actuarial death or recovery rates are analogous to age-specific failure rates of systems with more than one failure mode. This is known in reliability as a “competing-risk” model, https://www.itl.nist.gov/div898/handbook/apr/section1/apr181.htm. The NIST article assumes independent failure modes, so that failure-mode rates could be added to estimate system failure rates, as in MTBF prediction.

Periodic ships (installed base), and failure counts by failure mode are statistically sufficient to make nonparametric estimates of joint reliability and actuarial failure rate functions. This article shows how to simultaneously estimate competing mode actuarial death and recovery rates, without life data and without assuming independence. You could apply these estimates to a transient Markov chain approximation of SIR model of multiple, competing-risk failure modes, for estimating age-dependent availability, mission reliability, and plan maintenance. This article also shows that the competing risks of COVID-19 death or recovery are not independent. Do your products’ parts’ have independent competing-risk failure modes?

Simultaneous Estimation of Survival Functions Conditional on Failure Modes

The SIR model could describe counts as nonstationary Poisson cases (infections) split proportional to CFR (Case Fatality Rate = deaths/cases) into two conditional M(t)/G/Infinity self-service systems with service time distribution G(t). It is difficult to challenge the assumption of nonstationary Poisson infections from case, infections, or cohort counts, because each cohort is a sample of size one. The outputs of M(t)/G/Infinity self-service systems are Poisson [George and Agrawal] even if split into several failure modes. (G stands for the service time distributions G(t|dead) or G(t|recovery).

Output(t)|death ~Poisson[l(t; Dead)*G(t; Dead)]
Output(t)|recovered ~Poisson[l(t; Recover)*G(t; Recover)]
l(t; Dead)+l(t; Recovered) = l(t) cohort infected case counts
and l(t; Dead)/l(t) = CFR

Simultaneously estimate CFR and the service-time distributions G(t|Dead) and G(t|Recover) to maximize log likelihood of the product PPoisson[l(t)*CFR*G(t; Dead);D(t)]*PPoisson[l(t)*(1-CFR)*G(t; Recover); R(t)], where D(t) is the number of deaths, R(t) is the number of recoveries in time interval t, and the l(t) are the observed cohort infected case counts; t=1,2,…,end of data. The least-squares estimates minimize the sums of squared errors between observed deaths and recoveries and actuarial forecasts (hindcasts) as functions of the joint survival functions for each failure mode.

Corona virus has two “failure” modes: death or recovery. Table 1 shows typical epidemic data: periodic, cumulative case, death, and recovery counts. The data are from Democratic Republic of Congo in 2020. These data are statistically sufficient to make simultaneous nonparametric estimates of “survival” functions, conditional on death or recovery. The simultaneous survival function estimates are not the same as independent survival function estimates!

Table 1. Some DR Congo cumulative case, death and recovery counts

Week starting	Cases	Deaths	Recovered
3/11/2020	5	0	0
3/18/2020	11	0	0
3/25/2020	45	2	0
4/1/2020	109	8	0
4/8/2020	183	20	10
4/15/2020	254	21	11
4/22/2020	359	25	24

I use both least squares and maximum likelihood to simultaneously estimate survival functions and actuarial recovery and death rates conditional on death or recovery, from case, death, and recovery counts. The corresponding conditional actuarial rates (conditional on time and on failure mode) should be plugged into Markov chain SIR transition matrixes [McKinley et al.], because independently estimated survival function are NOT the same as the simultaneously estimated survival functions! [Huber and Pons]

Tables 2 shows the results of alternative estimation assumptions: independent (NIST) vs. dependent. It shows that dependence (columns 2 and 3) matters. The dependent “Recovery” survival function estimate captures information from later months that is not evident in the independent survival function estimate in column 5, which is influenced by 0 recoveries reported in March 2020. That may have been reporting error.

Table 2. Simultaneous “survival” (reliability) function maximum likelihood estimates conditional on death and recovery compared with independent maximum likelihood “survival” function estimates, “Dead (indep)” and “Recovery (indep)”. The second “Recovery” and third “Dead” columns are the simultaneous, conditional survival function estimates. I don’t trust the “Recovery (indep)” estimate because there were no deaths reported in March 2020.

Weeks	Recovery	Dead	Dead (indep.)	Recovery (indep)
0	1.0000	1.0000	1.0000	1.0000
1	0.1168	0.4512	1.0000	1.0000
2	0.1168	0.1764	1.0000	1.0000
3	0.1168	0.1293	0.2495	1.0000
4	0.0377	0.0914	0.2495	1.0000
5	0.0377	0.0625	0.2495	0.86871
6	0.0377	0.0418	0.2495	0.86871
7	0.0377	0.0280	0.2495	0.86871
8	0.0377	0.0214	0.2495	0.86871

The “survival” function and actuarial transition rates estimated simultaneously from observed case, recovery and death counts are dependent, because they differ from the independently estimated survival functions. This requires alternative methods to represent dependence [Lagerås]. Not only are actuarial transition rate estimates dependent, using case, recovery, and death counts and other factors to estimate transition actuarial rates captures dependence on other time related factors in hospital infections [Cooper and Lipsitch].

Tables 3 and figure 1 show the results of alternative estimation methods: least squares vs. maximum likelihood. The results of the methods differ, and I don’t have explanation. Your choice should depend on what you want to use the estimates for and on the uncertainties induced by the methods on your results.

Table 3. Simultaneous “survival” (reliability) function least squares (nplse) and maximum likelihood (npmle) estimates conditional on death and recovery.

Weeks	Recovery nplse	Dead nplse	Recovery npmle	Dead npmle
0	1.0000	1.0000	1.0000	1.0000
1	1.0000	1.0000	0.4512	0.1168
2	0.5183	0.4478	0.1764	0.1168
3	0.5183	0.4478	0.1293	0.1168
4	0.5183	0.0000	0.0914	0.0377
5	0.0000	0.0000	0.0625	0.0377
6	0.0000	0.0000	0.0418	0.0377
7	0.0000	0.0000	0.0280	0.0377
8	0.0000	0.0000	0.0214	0.0377

Figure 1. Simultaneous conditional survival function estimates by least squares (left two lines) and maximum likelihood (right two lines).

Figure 1 is busy but it shows disagreement of simultaneous nonparametric least squares (nplse) and maximum likelihood (npmle) estimates. Compare the red and orange survival function estimates, S(t|recovered) = 1‑G(t|recovered); the red nplse indicates longer time to recovery than the orange npmle. Compare S(t|Dead) = 1‑G(t|Dead); the dark blue nplse crosses the light blue npmle. Furthermore, neither of the simultaneous S(t|Dead) estimates agrees with the marginal grey line S(t|Dead) maximum likelihood estimate, although the simultaneous and marginal maximum likelihood S(t|Dead) estimate are proportional, as they should be.

Figures 2 and 3 compare alternative estimates. Figure 2 compares the “survival” function estimates of time from case to death for simultaneous estimates of survival functions conditional on death by least squares and maximum likelihood vs. the independent survival function maximum likelihood estimate from case and death counts alone. The differences are due to assuming independence of failure modes. Figure 3 compares joint survival function estimates by least squares and maximum likelihood. The differences are due to methods.

Figure 2 shows the standardized joint and independent “survival” (reliability) function estimates for time from case to death.

Figure 3 shows the differences between the standardized least squares (nplse) and maximum likelihood joint survival function estimates.

In a future article on uncertainty, I will try to deal with the estimation-method-induced uncertainty. Which method to use depends on application of results. For example, using the simultaneous least squares actuarial-rate estimates in transient Markov forecasts. Using maximum likelihood actuarial rate estimates in transient Markov chains means that forecasts will also be maximum likelihood forecasts.

Transient Markov Model of Competing-Risks

There are several R-program packages for Markov chains. [Ferguson et al., Araújo et al.] They do not explicitly deal with age-dependent transition probabilities. Your Markov models should include actuarial transition rates, because products have infant mortality, wearout, and attrition or retirement. Imbedded software may be updated. Other events also change failure rates and Markov transition rates. Transient Markov processes with age-dependent transition rates expand the state space and call for tedious computations, especially if there are multiple failure modes. There are alternatives. [Bucholz and Sanders, Marquez et al.] I am working on them.

Previous article, https://accendoreliability.com/markov-approximation-to-standby-system-reliability/, shows how to expand a simple availability Markov chain into a transient Markov chain incorporating age-dependent (actuarial) transition (failure) rate estimates. If there is more than one failure mode, then the (actuarial) transition (failure) rates, by failure mode, should be simultaneously estimated and incorporated into a transient Markov chain.

Suggestions?

This article shows how to estimate actuarial recovery and death rates simultaneously from case, recovery, and death counts. Estimate P[Time to death > s|Death] and P[Time to recovery > t|Recover] simultaneously. (The estimates may be dependent, even if the random variables Time-to-Death and Time-to Recovery are conditionally independent.}

Check assumptions. Are failure modes statistically independent? Are failure rates by failure mode dependent on age? Does your Markov availability model deal with age-dependent transition rates? Does your transient Markov availability represent dependent actuarial transient rate estimates? Reality isn’t easy to represent.

References

E. Almaraz and A. Gómez-Corral, “Markov-modulated interactions in SIR epidemics,” The 5-th Workshop Quantum Days in Bilbao, July 2015

Artur Araújo, Luís Meira-Machado, Javier Roca-Pardiňas, “TPmsm: Estimation of the Transition Probabilities in 3-State Models,” Journal of Statistical Software, Volume 62, Issue 4, December 2014

Peter Buchholz and William H. Sanders, “Approximate Computation of Transient Results for Large Markov Chains,” First International Conference on the Quantitative Evaluation of Systems, QEST 2004. Proceedings, pp. 126-135, doi: 10.1109/QEST.2004.1348027

Ben Cooper and Marc Lipsitch, “The Analysis of Hospital Infection Data Using Hidden Markov Models,” Biostatistics, 5, 2, pp. 223–237, 2004

Nicole Ferguson, Somnath Datta, Guy Brock “msSurv: An R Package for Nonparametric Estimation of Multistate Models,” Journal of Statistical Software, Volume 50, Issue 14, September 2012

George, L. L. “Diffusion Approximation to Two Channel, Poisson-Exponential Service Systems with Dependence,,” PhD Thesis, UC Berkeley, 1973

George, L. L. and Avinash Agrawal “Estimation of a Hidden Service Distribution of an M/G/¥ Service System,” Naval Research Logistics Quarterly, pp. 549-555, Vol. 20, No. 3, September 1973

Catherine Huber and Odile Pons, “Independent Competing Risks Versus a General Semi-Markov Model. Application to Heart Transplant Data,” Lifetime Data Analysis, 2004

Reiichiro Kawai, “Nonnegative Compartment Dynamical System Modelling with Stochastic Differential Equations,” Applied Mathematical Modelling, Volume 36, Issue 12, pp. 6291-6300, December 2012

David Marquez, Martin Neil, and Norman Fenton, “Improved Reliability modelling using Bayesian Networks and dynamic discretization,” Reliability Engineering & System Safety, Volume 95, Issue 4, pp. 412-425, April 2010

Trevelyan McKinley∗ Alex R. Cook , “Inference in Epidemic Models without Likelihoods,” The International Journal of Biostatistics, Volume 5, Issue 1 Article 24, 2009

Jorge Luis Romeu, “A Markov Model to Assess Covid-19 Vaccine Herd Immunization Patterns,” https://www.researchgate.net/profile/Jorge_Romeu, December 17, 2020

Reza Yaesoubi and Ted Cohen “Generalized Markov Models of Infectious Disease Spread: A Novel Framework for Developing Dynamic Health Policies,” Eur J Oper Res., 215(3): pp. 679–687. doi:10.1016/j.ejor.2011.07.016, December 16, 2011

Comments

Larry George says
September 11, 2022 at 9:44 PM
I broke the system for submitting articles: sorry my Greek is not very good. lambda(t) is the Poisson input rate to an M(t)/G/infinity self service system. Article containing the following…
Output(t)|death ~Poisson[l(t; Dead)*G(t; Dead)]
Output(t)|recovered ~Poisson[l(t; Recover)*G(t; Recover)]
l(t; Dead)+l(t; Recovered) = l(t) cohort infected case counts
and l(t; Dead)/l(t) = CFR
Should have been…
Output(t)|death ~Poisson[lambda(t; Dead)*G(t; Dead)]
Output(t)|recovered ~Poisson[lambda(t; Recover)*G(t; Recover)]
lambda(t; Dead)+lambda(t; Recovered) = lambda(t) cohort infected case counts
and lambda(t; Dead)/lambda(t) = CFR
And
PPoisson[l(t)*CFR*G(t; Dead);D(t)]*PPoisson[l(t)*(1-CFR)*G(t; Recover); R(t)],
should have been
PRODUCT[Poisson[lambda(t)*CFR*G(t; Dead); D(t)]*PRODUCT[Poisson[lambda(t)*(1-CFR)*G(t; Recover); R(t)]
Let me know if you want a copy of the original *.docx or *.pdf article or the workbook that does the competing failure mode estimation.

Simultaneous Estimation of Survival Functions Conditional on Failure Modes

Transient Markov Model of Competing-Risks

Suggestions?

References

About Larry George

Comments

Leave a Reply Cancel reply