Introduction
I am currently reading the book Survival Analysis: Techniques for Censored and Truncated Data, Second Edition (John P. Klein and Melvin L. Moescheberger). Although the techniques presented in this book focus on applications in biology and medicine, the same statistical tools can also be applied to disciplines ranging from engineering to economics and demography. I have a background in mechanical engineering and am interested in applying survival modeling concepts to data from reliability engineering, manufacturing and quality assurance. This article is the first of, hopefully, many articles that I intend to write as I finish reading different chapters from the book.
The data set(s) that will be analysed are the ones that have been used as examples in another book: Statistical Methods for Reliability Data, Second Edition (William Q. Meeker, Luis A. Escobar, Francis G. Pascual). Both the books I mentioned are excellent resources for anyone who is interested in learning more about this topic.
In this article, we will analyze vehicle shock absorber failure time data Failure time data is also known as survival data, life data, event-time data or reliability data, depending on the field of study. and estimate a few basic survival quantities. The data contains failure times (in kilometers driven) and the mode of failure, first reported by O’Connor (1985) O’Connor, P. D. T. (1985). Practical Reliability Engineering. Wiley. [54, 610]. We will ignore the mode of failure for now and will only consider whether a failure occurred or not, i.e., censored. In a future article, I plan to use the different failure modes to discuss competing risks for time-to-failure data.
Kaplan Meier Curve
Before we dive into the data and the packages used for the analysis in R, here is a brief introduction to a few basic survival concepts and their mathematical expressions. You don’t strictly need to know these formulas to conduct an analysis using packages in R. This would be of interest to those who prefer to peek a little under the hood to see some of the package’s internal workings.
Survival function $-S(t)-$Cumulative distribution function (CDF) $-F(t) = 1 – S(t)-$ is defined as the probability that a unit’s time to failure ($-T-$) is greater than some time ($-t-$)Time is measured in kilometers in this example.
$$ S\left(t\right)=Pr\left(T>t\right) $$
A non-parametric estimate of the survival function is given by the product-limit estimator, proposed by Kaplan and Meir (1958). Kaplan, E. L. and Meier, P. Nonparametric Estimation from Incomplete Observations. *Journal of the American Statistical Association* 53 (1958): 457–481. It is given by:
$$\hat{S}(t) =1 \quad \text{ if }t< t_{1}\\$$
and
$$\hat{S}(t) =\prod_{t_{i} \le t}[1 – \frac{d_{i}}{Y_{i}}] \quad \text{ if }t_{1}\le t\\$$
where $-\hat{S}(t)-$ is an estimate of the survival function $-S(t)-$, $-d_{i}-$ is the number of events at time $-t_{i}-$ and $-Y_{i}-$ is the number at risk of experiencing the event at time $-t_{i}-$.
The variance of the estimate of the survival function is given by Greenwood’s formula:
$$\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_{i} \le t} \frac{d_{i}}{Y_{i}(Y_{i} – d_{i})}$$
The hazard function It is often called failure rate in Reliability is another basic quantity of interest in reliability studies and will be introduced in a later article.
Motivating example: Shock Absorber Data
In this example, we are interested in finding a non-parametric estimate of the survival function in order to gauge the probability that a shock absorber will survive a certain distance before failing.
ShockAbsorber <- read_csv("Data/ShockAbsorber.csv")
knitr::kable((ShockAbsorber), caption = "Table 1. Shock Absorber Data", align = rep('c', 3))
Table 1. Shock Absorber Data
Kilometers | Failure Mode | Censoring Indicator |
---|---|---|
6700 | Mode1 | Failed |
6950 | Censored | Censored |
7820 | Censored | Censored |
8790 | Censored | Censored |
9120 | Mode2 | Failed |
9660 | Censored | Censored |
9820 | Censored | Censored |
11310 | Censored | Censored |
11690 | Censored | Censored |
11850 | Censored | Censored |
11880 | Censored | Censored |
12140 | Censored | Censored |
12200 | Mode1 | Failed |
12870 | Censored | Censored |
13150 | Mode2 | Failed |
13330 | Censored | Censored |
13470 | Censored | Censored |
14040 | Censored | Censored |
14300 | Mode1 | Failed |
17520 | Mode1 | Failed |
17540 | Censored | Censored |
17890 | Censored | Censored |
18450 | Censored | Censored |
18960 | Censored | Censored |
18980 | Censored | Censored |
19410 | Censored | Censored |
20100 | Mode2 | Failed |
20100 | Censored | Censored |
20150 | Censored | Censored |
20320 | Censored | Censored |
20900 | Mode2 | Failed |
22700 | Mode1 | Failed |
23490 | Censored | Censored |
26510 | Mode1 | Failed |
27410 | Censored | Censored |
27490 | Mode1 | Failed |
27890 | Censored | Censored |
28100 | Censored | Censored |
The two different modes of failures in this example will be ignored for now and we will only consider whether a shock absorber failed or not. This is indicated by the “censoring indicator” column in the above table. All the censored observations in the above table are considered to be “right” censored Other types of censoring are “left” and “interval” censoring.. This means that failure has not yet occurred and would be expected to occur some time in the future (in terms of kilometers) had we continued our observation. The concept of censoring differentiates survival analysis from other types of analysis in statistics. Information contained in the statement “the shock absorber survived at least 6950 km” is different to the information in the statement “the shock absorber failed at 6950 km”. Taking censoring into account allows us to incorporate the correct information into our analysis.
The survival package is the cornerstone of the survival analysis ecosystem in R and we will use the same for our analysis. The author of the package, Dr. Terry Therneau, started work on this package in 1985 and the package has evolved ever since. There are other packages that build on the survival package for functions that are not available within this package. For example, we will use the survMisc package later in this article when we build confidence bands for the survival function.
Failure is coded as “1” and censored is coded as “0” as per convention. The survival package also follows this convention and will automatically detect a “1” as an event (failure).
ShockAbsorber <- ShockAbsorber %>%
dplyr::mutate(Event = case_when(
`Censoring Indicator` == "Failed" ~ 1,
`Censoring Indicator` == "Censored" ~ 0
))
Next, we build a simple Kaplan Meir survival curve using the survfit Ignore the arguments stype, ctype and conf.type for now. We will discuss them some other time function from the survival package.
fit_1 <- survival::survfit(
survival::Surv(
time = ShockAbsorber$Kilometers,
event = ShockAbsorber$Event) ~ 1 #Right hand formula for a single curve
, stype = 1 #Survival type 1 means direct estimate of survival curve. Type 2 would have been exp(-H(t))
, ctype = 1 #Cumulative Hazard type 1 is Nelson-Aalen. Type 2 is Fleming-Harrington correction for tied events.
, conf.type = "arcsin"
, conf.int = 0.9
, data = ShockAbsorber)
survminer::ggsurvplot(fit_1,
risk.table = TRUE,
title = "Kaplan-Meir curve",
xlab = "Kilometers")
The Kaplan Meir curve above is a step function, with steps at every failure time (distance in this case). There is no step when we have censored observations, which are indicated by the short “|” marks on the curve.
The summary function gives us the survival probability, standard errors Square root of the variance formula above and the confidence intervals at all the failure times.
summary(fit_1)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers,
## event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1,
## ctype = 1, conf.type = "arcsin", conf.int = 0.9)
##
## time n.risk n.event survival std.err lower 90% CI upper 90% CI
## 6700 38 1 0.974 0.0260 0.9147 0.999
## 9120 34 1 0.945 0.0378 0.8671 0.990
## 12200 26 1 0.909 0.0509 0.8089 0.974
## 13150 24 1 0.871 0.0613 0.7549 0.954
## 14300 20 1 0.827 0.0720 0.6948 0.928
## 17520 19 1 0.784 0.0803 0.6394 0.899
## 20100 12 1 0.718 0.0966 0.5493 0.861
## 20900 8 1 0.629 0.1192 0.4275 0.809
## 22700 7 1 0.539 0.1317 0.3253 0.745
## 26510 5 1 0.431 0.1428 0.2125 0.665
## 27490 3 1 0.287 0.1511 0.0824 0.555
The argument times in the summary function can be used to obtain survival probability estimates at specific times, as shown below.
summary(fit_1, times = c(7500, 19000, 26000))
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers,
## event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1,
## ctype = 1, conf.type = "arcsin", conf.int = 0.9)
##
## time n.risk n.event survival std.err lower 90% CI upper 90% CI
## 7500 36 1 0.974 0.0260 0.915 0.999
## 19000 13 5 0.784 0.0803 0.639 0.899
## 26000 5 3 0.539 0.1317 0.325 0.745
Notice that the n.event column in the above summary is the number of failures between two time points. For eg: there are 3 failures in the data between 19,000 km and 26,000 km.
The probability that a shock absorber will last at least 19,000 km is 78.4%. It is always a good idea to present an estimate with a confidence interval, which is discussed next.
Pointwise Confidence Intervals
The simplest confidence interval for an estimate of the survival probability at a given time $-(t_{0})-$ is the linear confidence interval, given by
$$\hat{S}(t_{0})\pm Z_{1-\frac{\alpha}{2}}\sigma_{s}(t_{0})\hat{S}(t_{0})$$
where $-\sigma_{s}^2(t) = \hat{V}[\hat{S}(t)]/\hat{S}^2(t)-$ and $-Z_{1-\frac{\alpha}{2}}-$ is the $-(1-\frac{\alpha}{2})-$ percentile of the standard normal distribution.
Although the linear confidence interval has a simple form, the normal distribution assumption used in constructing this interval is seldom valid when the sample size is small. In addition, the performance of this interval near the ends i.e $-\hat{S}(t)-$ close to 0 or 1 can sometimes give us CI values outside the [0,1] range, which does not make sense for a probability value. Hence, transformations are used to get better confidence intervals.
Different transformations like the log, log-log, logit, arcsin are available in the survival package by using the conf.type argument The confidence intervals obtained through different transformations may not be symmetric like the linear CI. The formulas for the confidence intervals from the different transformations can be found in any survival modeling textbook I did consider including the formulas here for educational purposes, but the Latex code for typing the formulas would have taken me a very long time.. As per the text in the survival package documentation by Dr. Therneau: “Which of the choices of log, log-log, or logit is ”best” depends on the details of any particular simulation study”. I have used the “arcsin” transformation for constructing our confidence intervals since its coverage probability tends to be a bit more conservative i.e. greater than $-(1-\alpha)-$ for small sample sizes, as mentioned in the book.
The probability that a shock absorber will last at least 19,000 km is 78.4% with a 90% confidence interval (63.9%, 89.9%).
The Kaplan-Meir curve plotted above shows the confidence region around the curve. It can be seen that the interval width increases with time (distance) since the number of observations (number at risk) decrease with time, thereby increasing the standard error.
Confidence bands
The confidence intervals obtained above are only valid at a particular point $-(t_{0})-$. If we want a confidence interval for the survival function over a range of values of t, we will need to construct confidence bands. These confidence bands ensure that the statistical uncertainty is simultaneously quantified over a range of time and not just at a single point. In other words, the confidence band guarantees, with a given confidence, that the survival function falls within the band for all t in some interval. I see this as being somewhat similar to the correction of the significance value $-\alpha-$ for multiple comparison tests.
It is important to point out that although the Kaplan-Meir plot above shows the confidence region that looks like a band, it is based on point wise confidence intervals and is not a confidence band.
We will now construct a 90% confidence band for the range 10,000 km to 20,000 km.
The package survMisc, contains functions to build a confidence band. This package requires a “ten” object, ten stands for time, event(s) and number at risk. which is easily obtained as follows
ten_ShockAbsorber <- survMisc::ten(
survival::Surv(time = ShockAbsorber$Kilometers,
event = ShockAbsorber$Event)
)
There are a couple of different methods to obtain confidence bands. We will build a band based on the approach suggested by Nair (1984). Nair, V. N. Confidence Bands for Survival Functions with Censored Data: A Comparative Study. Technometrics 14 (1984): 265–275. This type of band is called Equal Probability or EP bands and are proportional to the pointwise confidence intervals.
survMisc::ci(ten_ShockAbsorber,
CI = "0.9", #As the function currently relies on lookup tables, currently only 90%, 95% (the default) and 99% are supported.
how = "nair", #Method to create confidence band
trans = "asi", #Transformation to use. Note that "log" in the survMisc package corresponds to the log transformation in Klein-Moeschberger
tL = 10000,
tU = 20000)
## t S Sv SCV cg lower upper
## 1: 11000 0.95 0.0014 0.0016 1 0.82 1
## 2: 12000 0.95 0.0014 0.0016 1 0.82 1
## 3: 12000 0.95 0.0014 0.0016 1 0.82 1
## 4: 12000 0.95 0.0014 0.0016 1 0.82 1
## 5: 12000 0.95 0.0014 0.0016 1 0.82 1
## 6: 12000 0.91 0.0026 0.0031 1 0.75 0.99
## 7: 13000 0.91 0.0026 0.0031 1 0.75 0.99
## 8: 13000 0.87 0.0038 0.005 1 0.69 0.98
## 9: 13000 0.87 0.0038 0.005 1 0.69 0.98
## 10: 13000 0.87 0.0038 0.005 1 0.69 0.98
## 11: 14000 0.87 0.0038 0.005 1 0.69 0.98
## 12: 14000 0.83 0.0052 0.0076 1 0.63 0.96
## 13: 18000 0.78 0.0065 0.011 1 0.57 0.94
## 14: 18000 0.78 0.0065 0.011 1 0.57 0.94
## 15: 18000 0.78 0.0065 0.011 1 0.57 0.94
## 16: 18000 0.78 0.0065 0.011 1 0.57 0.94
## 17: 19000 0.78 0.0065 0.011 1 0.57 0.94
## 18: 19000 0.78 0.0065 0.011 1 0.57 0.94
## 19: 19000 0.78 0.0065 0.011 1 0.57 0.94
## t S Sv SCV cg lower upper
The confidence band is expected to be wider than the pointwise confidence intervals. We have used the same arc sine transformation here as well. The lower and upper values of the 90% Note that the calculation of the confidence bands depend on lookup tables inside the survMisc package. These tables have only been constructed for the 90%, 95% and 99% confidence levels, due to which these 3 values are the only permissible values in the CI argument to the survMisc function. confidence band at 19,000 km is (51%, 91%) whereas the point wise confidence interval at 19,000 km, which we found previously, is (61.5%, 88.5%).
Mean Survival Time
The mean time to failure of shock absorbers can also be easily calculated using the survival function. Before we directly apply the appropriate function and find the mean, there is one small detail that needs to be clarified.
The mean time to failure is a function of the survival function. To be more precise, the mean time to failure is the total area under the survival curve. If the last observation is an “event” (i.e. “failure”), the Kaplan-Meir curve will have a step and the survival probability will drop to 0 at this point. But if the last observation is a censored observation, like we see in our shock absorber example, there is no step and the curve is open at this end. We do not know when the last observation experienced failure. Clearly, this will have an impact on the area under the survival curve. Hence, we are only able to calculate the “restricted” mean, where our calculation of the mean is restricted to a certain time $-\tau-$. Efron’s tail correction suggests treating the last observation as an “event” in case it is censored.
The mean time to failure of the shock absorbers, restricting our calculation of the mean to the highest observation (28,100 km) is obtained as follows:
print(fit_1, rmean = 28100)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers,
## event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1,
## ctype = 1, conf.type = "arcsin", conf.int = 0.9)
##
## n events rmean* se(rmean) median 0.9LCL 0.9UCL
## [1,] 38 11 22875 1283 26510 20900 NA
## * restricted mean with upper limit = 28100
The mean time to failure (restricted) is 22,875 km. Suppose we have some knowledge (information) from outside the sample data that suggests that shock absorbers are known to have lasted until 30,000 km, we can change $-\tau-$ to 30,000 km and then calculate the restricted mean, as follows.
print(fit_1, rmean = 30000)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers,
## event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1,
## ctype = 1, conf.type = "arcsin", conf.int = 0.9)
##
## n events rmean* se(rmean) median 0.9LCL 0.9UCL
## [1,] 38 11 23421 1443 26510 20900 NA
## * restricted mean with upper limit = 30000
We will stick to the information in the data and calculate the restricted mean with $-\tau-$ as the max distance from the data i.e. 28,100 km.
The lower and upper interval limits shown in the output are for the median. The confidence limits for the mean can be calculated as follows:
$$\hat{\mu}_{\tau} \pm Z_{1-\frac{\alpha}{2}} \sqrt{\hat{V}[\hat{\mu}_{\tau}]}$$
where $-\sqrt{\hat{V}[\hat{\mu}_{\tau}]}-$ is the standard error of the restricted mean given in the output of the print function and $-Z_{1-\frac{\alpha}{2}}-$ is the $-(1-\frac{\alpha}{2})-$ percentile of the standard normal distribution. The confidence interval of the restricted mean with $-\tau-$ taken as 28,100 km is thus given by:
22875 - ((1283)*qnorm(0.95))
## [1] 20764.65
22875 + ((1283)*qnorm(0.95))
## [1] 24985.35
The restricted mean time to failure ($-\tau-$ = 28,100 km) is 22,875 km with a 90% confidence interval (20764.65, 24985.35)
End
I hope you enjoyed reading this article! Any comments or suggestions or if you find any errors and are keen to help correct the error, please write to me at shishir909@gmail.com, or leave a comment below.
Leave a Reply