A computer company tiger team held a meeting to decide how to fix their laser printer ghosting problem. Bearings seized in the squirrel-cage cooling fan for the fuser bar. The fan bearing was above the fuser bar, which baked the bearing. A fix decision was made, voted on, and accepted. Party time. I asked, “How do you verify the fix?” Boo!
This is an example of using current status life data. I checked the status of every laser-printer fan in company headquarters: operating or failed? Date of manufacture was encoded in the printer serial number, so I estimated the fan’s age-specific failure rate function, before the fix. Premature wearout was evident. Could I observe repaired or new printers at a later time and test the hypothesis that the problem had been fixed? Yes.
Current status data arise when study subjects are observed at various ages and the survival time of interest is known only to be either less than or greater than the observation time [Jewell and Kalbfleisch, Zhang et al, Jewell and Van der Laan].
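A standard result is that the nonparametric maximum likelihood estimate of the cumulative failure probability from current status data is the isotonic (monotone nondecreasing) regression of the failed/operating indicators on age at inspection, which the pool-adjacent-violators algorithm computes. The following is a minimal sketch of that idea, not necessarily the computation used for the printers or the surveys. The function name and the eight observations are hypothetical.

```python
def npmle_current_status(ages, failed):
    """Isotonic (pool-adjacent-violators) estimate of P[Life <= age].

    ages   -- age of each unit when its status was checked
    failed -- 1 if the unit had already failed at that age, else 0
    Returns a list of (age, estimated cumulative failure probability).
    """
    # Tally failures and inspections at each distinct inspection age.
    groups = {}
    for a, d in zip(ages, failed):
        f, n = groups.get(a, (0, 0))
        groups[a] = (f + d, n + 1)

    # Each block is [failures, inspections, ages covered by the block].
    blocks = []
    for a in sorted(groups):
        f, n = groups[a]
        blocks.append([f, n, [a]])
        # Merge backward while the previous block's proportion exceeds
        # the last block's proportion (cross-multiplied to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            f2, n2, ages2 = blocks.pop()
            blocks[-1][0] += f2
            blocks[-1][1] += n2
            blocks[-1][2] += ages2

    return [(a, f / n) for f, n, block_ages in blocks for a in block_ages]


# Hypothetical data: two units inspected at each of ages 1 to 4.
example_ages = [1, 1, 2, 2, 3, 3, 4, 4]
example_failed = [0, 0, 1, 0, 0, 0, 1, 1]
example_cdf = npmle_current_status(example_ages, example_failed)
for age, prob in example_cdf:
    print(age, prob)
```

In the hypothetical data the raw proportions failed by age are 0, 1/2, 0, and 1. The estimate pools ages 2 and 3 so that the cumulative probability never decreases with age.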
Survival analysis of Afghan refugees in Pakistani refugee camps
The International Rescue Committee (www.IRC.org) surveyed two Afghan refugee camps in Pakistan before and after humanitarian aid. The objective of the project was to compare “U5” mortality (P[Life ≤ 5 years]) at “baseline” (2007) and “endline” (2010).
The survey data came from 2007, before humanitarian aid, and from 2010, after it. Because some ages at death were inadmissible, I used both nonparametric maximum likelihood (Kaplan-Meier) on the admissible ages and least squares to estimate the age-specific survival functions (aka reliability functions).
Conclusion: Maximum likelihood and least squares survival function estimates agreed tolerably. The refugees’ U5 estimates were 10% at baseline and 4% at endline. Pakistan population U5 was 8.7% in 2009 (Wikipedia). Refugees’ infant mortality at age one month was almost 4% at baseline and endline. Pakistan population infant mortality was 6.7% in 2010 (World Bank).
Details
The endline U5 estimate’s standard deviation was less than 0.5%. The baseline estimates are for provinces 1, 2, and both. The endline estimates are for province 2 only, because province 1 was inaccessible. “Current status data” were used for both baseline and endline estimates. Ages at deaths also were used for endline estimates. Alternative endline U5 estimates agree.
The current status U5 estimates are proportions, deaths/births. Statisticians will object that such data are “censored”; i.e., not all live births were five years old at the surveys. The estimation methods indicate that these ratios are U5 estimates despite censoring.
The current status baseline standard error estimates of U5 are 1.09%, 0.93%, and 0.71% for provinces 1, 2, and both, and the endline standard error estimate is 0.42%. The standard error estimates are SQRT(U5(1-U5)/births). The standard deviation of the endline U5 estimate from ages at deaths is 0.46%. Only 74 ages at deaths were admissible.
Table 1. Births and deaths, U5 estimates, and their standard error estimates.
Year, Province | Births | Deaths | U5 | Std. Error Est. |
2007, 1 | 847 | ~95 | 11.2% | 1.09% |
2007, 2 | 1076 | ~112 | 10.4% | 0.93% |
2007, both | 1923 | ~208 | 10.8% | 0.71% |
2010, 2 (current status) | 2133 | 85 | 4.0% | 0.42% |
2010, 2 (ages at death) | 2133 | 74 | 4.1% | 0.46% |
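The current status standard errors in Table 1 can be recomputed from the births and U5 columns with the formula SQRT(U5(1-U5)/births). The sketch below does that, and also forms a rough z-score for the province 2 baseline-versus-endline difference. The z-score is an illustration only: it assumes independent samples and a normal approximation, and it is not a test reported in the study. Function and variable names are hypothetical.

```python
from math import sqrt


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n births."""
    return sqrt(p * (1.0 - p) / n)


def difference_z(p1, n1, p2, n2):
    """z-score for p1 - p2, assuming independent samples (an assumption)."""
    return (p1 - p2) / sqrt(binomial_se(p1, n1) ** 2 + binomial_se(p2, n2) ** 2)


# (label, births, U5) copied from the current status rows of Table 1.
table1 = [
    ("2007, 1", 847, 0.112),
    ("2007, 2", 1076, 0.104),
    ("2007, both", 1923, 0.108),
    ("2010, 2", 2133, 0.040),
]

for label, births, u5 in table1:
    print(f"{label}: standard error = {100 * binomial_se(u5, births):.2f}%")

# Province 2, baseline versus endline.
z_province2 = difference_z(0.104, 1076, 0.040, 2133)
print(f"Province 2 baseline minus endline, z = {z_province2:.1f}")
```

Because the U5 inputs are rounded to one decimal place, the recomputed standard errors may differ from the tabulated ones in the last digit.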
The apparent reduction in U5 appears to be primarily a reduction in deaths after infant mortality in the first year. Table 2 and figure 1 show province 2 estimates of the cumulative distribution of age at death. Province 2 infant mortality by the first month is about the same for both baseline and endline. The difference becomes apparent only after the first month; i.e., deaths after the first month are fewer in the endline data. Pakistan countrywide infant mortality (1 year) is 6.7%.
Table 2. Cumulative probability distribution estimates of age at death (current status).
Age at death, months | Baseline (MLE, LSE) | Endline (MLE, LSE) |
1 | 4.1%, 2.4% | 3.8%, 2.7% |
6 | 7.8%, 7.9% | 3.8%, 3.2% |
12 | 8.4%, 9.6% | 3.8%, 3.2% |
24 | 10.4%, 10.4% | 3.9%, 3.4% |
36 | 10.4%, 10.4% | 3.9%, 3.9% |
There are two sets of cumulative probability distribution estimates: “MLE” and “LSE” denote maximum likelihood and least squares estimates. They don’t always agree, because the methods differ. The more appropriate estimates depend on your objectives, costs, and risks.
The MLE and LSE cumulative probability distribution estimates agree for ages greater than 3 years, because they were constrained to be less than or equal to the sample proportion U5. That MLE and LSE cumulative probability distribution estimates achieve their upper bounds justifies the U5 ratio estimates in table 1, even though not all births were five years old at the times of the surveys. You can see from figure 1 that most deaths occurred before age two.
The endline survey data supported two cumulative probability distribution estimates: one from current status data and one from age-at-death data. The latter is the Kaplan-Meier estimator. There were fewer valid ages at deaths, because some ages at deaths were after the survey. Figure 2 shows these alternative estimators. Neither is right or wrong. The sample uncertainty in the Kaplan-Meier estimator is quantified by Greenwood’s standard deviation estimate. Figure 2 shows that, according to the age-at-death data, infant mortality extends to approximately one year of age.
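For readers who have not computed them, here is a minimal sketch of the Kaplan-Meier estimator with Greenwood's standard deviation estimate. The function name and the six observations are hypothetical, not the survey data. The cumulative probability of death by a given age is one minus the survival estimate, with the same standard deviation.

```python
from math import sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate with Greenwood standard deviation.

    times  -- age at death or at censoring for each subject
    events -- 1 if the death was observed, 0 if the subject was censored
    Returns a list of (age, survival, greenwood_sd), one per death age.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone who died or was censored at age t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            # Greenwood's term is undefined when everyone at risk dies;
            # survival is 0 there, so the reported deviation is 0.
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            result.append((t, survival, survival * sqrt(greenwood_sum)))
        n_at_risk -= removed
    return result


# Hypothetical ages (months) and death indicators for six children.
example_km = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for age, surv, sd in example_km:
    print(f"age {age}: P[death by age] = {1 - surv:.3f}, std. dev. = {sd:.3f}")
```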
Applications
Simultaneous nonparametric estimation of COVID-19 survival functions conditional on death or on recovery, from case, death, and recovery counts. This is equivalent to reliability estimation by failure mode, without life data.
Suppose you had only the inventory counts and whether units were good or failed at two or more times?
Would you like to verify the effects of changes in design or process?
References
Zhigang Zhang, Jianguo Sun, Liuquan Sun, “Statistical analysis of current status data with informative observation times”, Stat Med, 2005 May 15;24(9):1399-407. doi: 10.1002/sim.2001
Jianguo Sun & John D. Kalbfleisch (1993) The Analysis of Current Status Data on Point Processes, Journal of the American Statistical Association, 88:424, 1449-1454, DOI: 10.1080/01621459.1993.10476432
“Survival Estimation from Current Status Data,” (Afghan U5 and infant mortality in Pakistan refugee camps before and after humanitarian aid) Joint Statist. Meeting, ASA, San Diego, July 2012 (co-author Hsin-Yi (Cindy) Weng)
Nicholas P. Jewell and Mark J. van der Laan. “Current Status Data: Review, Recent Developments and Open Problems”, U.C. Berkeley Division of Biostatistics Working Paper Series, 2002, paper 113