Introduction

I am currently reading the book Survival Analysis: Techniques for Censored and Truncated Data, Second Edition (John P. Klein and Melvin L. Moescheberger). Although the techniques presented in this book focus on applications in biology and medicine, the same statistical tools can also be applied to disciplines ranging from engineering to economics and demography. I have a background in mechanical engineering and am interested in applying survival modeling concepts to data from reliability engineering, manufacturing and quality assurance. This article is the first of, hopefully, many articles that I intend to write as I finish reading different chapters from the book.

The data set(s) that will be analysed are the ones that have been used as examples in another book: Statistical Methods for Reliability Data, Second Edition (William Q. Meeker, Luis A. Escobar, Francis G. Pascual). Both the books I mentioned are excellent resources for anyone who is interested in learning more about this topic.

In this article, we will analyze vehicle shock absorber failure time data Failure time data is also known as survival data, life data, event-time data or reliability data, depending on the field of study. and estimate a few basic survival quantities. The data contains failure times (in kilometers driven) and the mode of failure, first reported by O’Connor (1985) O’Connor, P. D. T. (1985). Practical Reliability Engineering. Wiley. [54, 610]. We will ignore the mode of failure for now and will only consider whether a failure occurred or not, i.e., censored. In a future article, I plan to use the different failure modes to discuss competing risks for time-to-failure data.

Kaplan Meier Curve

Before we dive into the data and the packages used for the analysis in R, here is a brief introduction to a few basic survival concepts and their mathematical expressions. You don’t strictly need to know these formulas to conduct an analysis using packages in R. This would be of interest to those who prefer to peek a little under the hood to see some of the package’s internal workings.

In this article, we will analyze vehicle shock absorber failure time data Failure time data is also known as survival data, life data, event-time data or reliability data, depending on the field of study. and estimate a few basic survival quantities. The data contains failure times (in kilometers driven) and the mode of failure, first reported by O’Connor (1985) O’Connor, P. D. T. (1985). Practical Reliability Engineering. Wiley. [54, 610]. We will ignore the mode of failure for now and will only consider whether a failure occurred or not, i.e., censored. In a future article, I plan to use the different failure modes to discuss competing risks for time-to-failure data.

$$ displaystyle Sleft(tright)=Prleft(T>tright) $$

 

Before we dive into the data and the packages used for the analysis in R, here is a brief introduction to a few basic survival concepts and their mathematical expressions. You don’t strictly need to know these formulas to conduct an analysis using packages in R. This would be of interest to those who prefer to peek a little under the hood to see some of the package’s internal workings.

In this article, we will analyze vehicle shock absorber failure time data Failure time data is also known as survival data, life data, event-time data or reliability data, depending on the field of study. and estimate a few basic survival quantities. The data contains failure times (in kilometers driven) and the mode of failure, first reported by O’Connor (1985) O’Connor, P. D. T. (1985). Practical Reliability Engineering. Wiley. [54, 610]. We will ignore the mode of failure for now and will only consider whether a failure occurred or not, i.e., censored. In a future article, I plan to use the different failure modes to discuss competing risks for time-to-failure data.

$$ displaystyle Sleft(tright)=Prleft(T>tright) $$

[S(t) = Pr(T > t)]

A non-parametric estimate of the survival function is given by the product-limit estimator, proposed by Kaplan and Meir (1958) Kaplan, E. L. and Meier, P. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association 53 (1958): 457–481.. It is given by:

[hat{S}(t) =1 quad text{ if }t< t_{1}\]

and

[hat{S}(t) =prod_{t_{i} le t}[1 – frac{d_{i}}{Y_{i}}] quad text{ if }t_{1}le t\]

where (hat{S}(t)) is an estimate of the survival function (S(t)), (d_{i}) is the number of events at time (t_{i}) and (Y_{i}) is the number at risk of experiencing the event at time (t_{i}).

The variance of the estimate of the survival function is given by Greenwood’s formula:

[hat{V}[hat{S}(t)] = hat{S}(t)^2 sum_{t_{i} le t} frac{d_{i}}{Y_{i}(Y_{i} – d_{i})} ]

The hazard function

It is often called failure rate in Reliability. is another basic quantity of interest in reliability studies, and will be introduced in a later article.

Motivating example: Shock Absorber Data

In this example, we are interested in finding a non-parametric estimate of the survival function in order to gauge the probability that a shock absorber will survive a certain distance before failing.

ShockAbsorber <- read_csv("Data/ShockAbsorber.csv")
knitr::kable((ShockAbsorber), caption = "Table 1. Shock Absorber Data", align = rep('c', 3))

Table 1. Shock Absorber Data

Kilometers Failure Mode Censoring Indicator
6700 Mode1 Failed
6950 Censored Censored
7820 Censored Censored
8790 Censored Censored
9120 Mode2 Failed
9660 Censored Censored
9820 Censored Censored
11310 Censored Censored
11690 Censored Censored
11850 Censored Censored
11880 Censored Censored
12140 Censored Censored
12200 Mode1 Failed
12870 Censored Censored
13150 Mode2 Failed
13330 Censored Censored
13470 Censored Censored
14040 Censored Censored
14300 Mode1 Failed
17520 Mode1 Failed
17540 Censored Censored
17890 Censored Censored
18450 Censored Censored
18960 Censored Censored
18980 Censored Censored
19410 Censored Censored
20100 Mode2 Failed
20100 Censored Censored
20150 Censored Censored
20320 Censored Censored
20900 Mode2 Failed
22700 Mode1 Failed
23490 Censored Censored
26510 Mode1 Failed
27410 Censored Censored
27490 Mode1 Failed
27890 Censored Censored
28100 Censored Censored

The two different modes of failures in this example will be ignored for now and we will only consider whether a shock absorber failed or not. This is indicated by the “censoring indicator” column in the above table. All the censored observations in the above table are considered to be “right” censored77 Other types of censoring are “left” and “interval” censoring.. This means that failure has not yet occurred and would be expected to occur some time in the future (in terms of kilometers) had we continued our observation. The concept of censoring differentiates survival analysis from other types of analysis in statistics. Information contained in the statement “the shock absorber survived at least 6950 km” is different to the information in the statement “the shock absorber failed at 6950 km”. Taking censoring into account allows us to incorporate the correct information into our analysis.

The survival package is the cornerstone of the survival analysis ecosystem in R and we will use the same for our analysis. The author of the package, Dr. Terry Therneau, started work on this package in 1985 and the package has evolved ever since. There are other packages that build on the survival package for functions that are not available within this package. For example, we will use the survMisc package later in this article when we build confidence bands for the survival function. Failure is coded as “1” and censored is coded as “0” as per convention. The survival package also follows this convention and will automatically detect a “1” as an event (failure).

ShockAbsorber <- ShockAbsorber %>%
  dplyr::mutate(Event = case_when(
    `Censoring Indicator` == "Failed" ~ 1,
    `Censoring Indicator` == "Censored" ~ 0
  ))

Next, we build a simple Kaplan Meir survival curve using the survfit88 Ignore the arguments stype, ctype and conf.type for now. We will discuss them some other timefunction from the survival package.

fit_1 <- survival::survfit(
  survival::Surv(
    time = ShockAbsorber$Kilometers, 
    event = ShockAbsorber$Event) ~ 1 #Right hand formula for a single curve
  , stype = 1 #Survival type 1 means direct estimate of survival curve. Type 2 would have been exp(-H(t))
  , ctype = 1 #Cumulative Hazard type 1 is Nelson-Aalen. Type 2 is Fleming-Harrington correction for tied events.
  , conf.type = "arcsin"
  , conf.int = 0.9
  , data = ShockAbsorber) 

survminer::ggsurvplot(fit_1,
                      risk.table = TRUE,
                      title = "Kaplan-Meir curve",
                      xlab = "Kilometers")

The Kaplan Meir curve above is a step function, with steps at every failure time (distance in this case). There is no step when we have censored observations, which are indicated by the short “|” marks on the curve.

The summary function gives us the survival probability, standard errors99 Square root of the variance formula aboveand the confidence intervals at all the failure times.

summary(fit_1)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers, 
##     event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1, 
##     ctype = 1, conf.type = "arcsin", conf.int = 0.9)
## 
##   time n.risk n.event survival std.err lower 90% CI upper 90% CI
##   6700     38       1    0.974  0.0260       0.9147        0.999
##   9120     34       1    0.945  0.0378       0.8671        0.990
##  12200     26       1    0.909  0.0509       0.8089        0.974
##  13150     24       1    0.871  0.0613       0.7549        0.954
##  14300     20       1    0.827  0.0720       0.6948        0.928
##  17520     19       1    0.784  0.0803       0.6394        0.899
##  20100     12       1    0.718  0.0966       0.5493        0.861
##  20900      8       1    0.629  0.1192       0.4275        0.809
##  22700      7       1    0.539  0.1317       0.3253        0.745
##  26510      5       1    0.431  0.1428       0.2125        0.665
##  27490      3       1    0.287  0.1511       0.0824        0.555

The argument times in the summary function can be used to obtain survival probability estimates at specific times, as shown below.

summary(fit_1, times = c(7500, 19000, 26000))
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers, 
##     event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1, 
##     ctype = 1, conf.type = "arcsin", conf.int = 0.9)
## 
##   time n.risk n.event survival std.err lower 90% CI upper 90% CI
##   7500     36       1    0.974  0.0260        0.915        0.999
##  19000     13       5    0.784  0.0803        0.639        0.899
##  26000      5       3    0.539  0.1317        0.325        0.745

Notice that the n.event column in the above summary is the number of failures between two time points. For eg: there are 3 failures in the data between 19,000 km and 26,000 km.

The probability that a shock absorber will last at least 19,000 km is 78.4%. It is always a good idea to present an estimate with a confidence interval, which is discussed next.

Pointwise Confidence Intervals

The simplest confidence interval for an estimate of the survival probability at a given time (𝑡0) is the linear confidence interval, given by

𝑆̂(𝑡0)±𝑍1−𝛼2𝜎𝑠(𝑡0)𝑆̂(𝑡0)

where 𝜎2𝑠(𝑡)=𝑉̂[𝑆̂(𝑡)]/𝑆̂2(𝑡) and 𝑍1−𝛼2 is the (1−𝛼2) percentile of the standard normal distribution.

Although the linear confidence interval has a simple form, the normal distribution assumption used in constructing this interval is seldom valid when the sample size is small. In addition, the performance of this interval near the ends i.e 𝑆̂(𝑡) close to 0 or 1 can sometimes give us CI values outside the [0,1] range, which does not make sense for a probability value. Hence, transformations are used to get better confidence intervals.

Different transformations like the loglog-loglogitarcsin are available in the survival package by using the conf.type argument1010 The confidence intervals obtained through different transformations may not be symmetric like the linear CI. The formulas for the confidence intervals from the different transformations can be found in any survival modeling textbook1111 I did consider including the formulas here for educational purposes, but the Latex code for typing the formulas would have taken me a very long time.. As per the text in the survivalpackage documentation by Dr. Therneau: “Which of the choices of log, log-log, or logit is”best” depends on the details of any particular simulation study”. I have used the “arcsin” transformation for constructing our confidence intervals since its coverage probability tends to be a bit more conservative i.e. greater than (1−𝛼) for small sample sizes, as mentioned in the book.

The probability that a shock absorber will last at least 19,000 km is 78.4% with a 90% confidence interval (63.9%, 89.9%).

The Kaplan-Meir curve plotted above shows the confidence region around the curve. It can be seen that the interval width increases with time (distance) since the number of observations (number at risk) decrease with time, thereby increasing the standard error.

Confidence bands

The confidence intervals obtained above are only valid at a particular point (𝑡0). If we want a confidence interval for the survival function over a range of values of t, we will need to construct confidence bands. These confidence bands ensure that the statistical uncertainty is simultaneously quantified over a range of time and not just at a single point.1212 In other words, the confidence band guarantees, with a given confidence, that the survival function falls within the band for all tin some interval. I see this as being somewhat similar to the correction of the significance value 𝛼 for multiple comparison tests.

It is important to point out that although the Kaplan-Meir plot above shows the confidence region that looks like a band, it is based on point wise confidence intervals and is not a confidence band.

We will now construct a 90% confidence band for the range 10,000 km to 20,000 km.

The package survMisc, contains functions to build a confidence band. This package requires a “ten” object1313 ten stands for time, event(s) and number at risk., which is easily obtained as follows

ten_ShockAbsorber <- survMisc::ten(
  survival::Surv(time = ShockAbsorber$Kilometers,
                 event = ShockAbsorber$Event)
  )

There are a couple of different methods to obtain confidence bands. We will build a band based on the approach suggested by Nair (1984)1414 Nair, V. N. Confidence Bands for Survival Functions with Censored Data: A Comparative Study. Technometrics 14 (1984): 265–275.. This type of band is called Equal Probability or EP bands and are proportional to the pointwise confidence intervals.

survMisc::ci(ten_ShockAbsorber,
   CI = "0.9", #As the function currently relies on lookup tables, currently only 90%, 95% (the default) and 99% are supported.
   how = "nair", #Method to create confidence band
   trans = "asi", #Transformation to use. Note that "log" in the survMisc package corresponds to the log transformation in Klein-Moeschberger
   tL = 10000,
   tU = 20000)
##         t    S     Sv    SCV cg lower upper
##  1: 11000 0.95 0.0014 0.0016  1  0.82     1
##  2: 12000 0.95 0.0014 0.0016  1  0.82     1
##  3: 12000 0.95 0.0014 0.0016  1  0.82     1
##  4: 12000 0.95 0.0014 0.0016  1  0.82     1
##  5: 12000 0.95 0.0014 0.0016  1  0.82     1
##  6: 12000 0.91 0.0026 0.0031  1  0.75  0.99
##  7: 13000 0.91 0.0026 0.0031  1  0.75  0.99
##  8: 13000 0.87 0.0038  0.005  1  0.69  0.98
##  9: 13000 0.87 0.0038  0.005  1  0.69  0.98
## 10: 13000 0.87 0.0038  0.005  1  0.69  0.98
## 11: 14000 0.87 0.0038  0.005  1  0.69  0.98
## 12: 14000 0.83 0.0052 0.0076  1  0.63  0.96
## 13: 18000 0.78 0.0065  0.011  1  0.57  0.94
## 14: 18000 0.78 0.0065  0.011  1  0.57  0.94
## 15: 18000 0.78 0.0065  0.011  1  0.57  0.94
## 16: 18000 0.78 0.0065  0.011  1  0.57  0.94
## 17: 19000 0.78 0.0065  0.011  1  0.57  0.94
## 18: 19000 0.78 0.0065  0.011  1  0.57  0.94
## 19: 19000 0.78 0.0065  0.011  1  0.57  0.94
##         t    S     Sv    SCV cg lower upper

The confidence band is expected to be wider than the pointwise confidence intervals. We have used the same arc sine transformation here as well. The lower and upper values of the 90%1515 Note that the calculation of the confidence bands depend on lookup tables inside the survMisc package. These tables have only been constructed for the 90%, 95% and 99% confidence levels, due to which these 3 values are the only permissible values in the CI argument to the survMisc function. confidence band at 19,000 km is (51%, 91%) whereas the point wise confidence interval at 19,000 km, which we found previously, is (61.5%, 88.5%).

Mean Survival Time

The mean time to failure of shock absorbers can also be easily calculated using the survival function. Before we directly apply the appropriate function and find the mean, there is one small detail that needs to be clarified.

The mean time to failure is a function of the survival function. To be more precise, the mean time to failure is the total area under the survival curve. If the last observation is an “event” (i.e. “failure”), the Kaplan-Meir curve will have a step and the survival probability will drop to 0 at this point. But if the last observation is a censored observation, like we see in our shock absorber example, there is no step and the curve is open at this end. We do not know when the last observation experienced failure. Clearly, this will have an impact on the area under the survival curve. Hence, we are only able to calculate the “restricted” mean, where our calculation of the mean is restricted to a certain time 𝜏. Efron’s tail correction suggests treating the last observation as an “event” in case it is censored.

The mean time to failure of the shock absobers, restricting our calculation of the mean to the highest observation (28,100 km) is obtained as follows:

print(fit_1, rmean = 28100)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers, 
##     event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1, 
##     ctype = 1, conf.type = "arcsin", conf.int = 0.9)
## 
##       n events rmean* se(rmean) median 0.9LCL 0.9UCL
## [1,] 38     11  22875      1283  26510  20900     NA
##     * restricted mean with upper limit =  28100

The mean time to failure (restricted) is 22,875 km. Suppose we have some knowledge (information) from outside the sample data that suggests that shock absorbers are known to have lasted until 30,000 km, we can change 𝜏 to 30,000 km and then calculate the restricted mean, as follows.

print(fit_1, rmean = 30000)
## Call: survfit(formula = survival::Surv(time = ShockAbsorber$Kilometers, 
##     event = ShockAbsorber$Event) ~ 1, data = ShockAbsorber, stype = 1, 
##     ctype = 1, conf.type = "arcsin", conf.int = 0.9)
## 
##       n events rmean* se(rmean) median 0.9LCL 0.9UCL
## [1,] 38     11  23421      1443  26510  20900     NA
##     * restricted mean with upper limit =  30000

We will stick to the information in the data and calculate the restricted mean with 𝜏 as the max distance from the data i.e. 28,100 km.

The lower and upper interval limits shown in the output are for the median. The confidence limits for the mean can be calculated as follows:

𝜇̂𝜏±𝑍1−𝛼2𝑉̂[𝜇̂𝜏]‾‾‾‾‾‾√

where 𝑉̂[𝜇̂𝜏]‾‾‾‾‾‾√ is the standard error of the restricted mean given in the output of the print function and 𝑍1−𝛼2 is the (1−𝛼2) percentile of the standard normal distribution. The confidence interval of the restricted mean with 𝜏 taken as 28,100 km is thus given by:

22875 - ((1283)*qnorm(0.95))
## [1] 20764.65
22875 + ((1283)*qnorm(0.95))
## [1] 24985.35

The restricted mean time to failure (𝜏 = 28,100 km) is 22,875 km with a 90% confidence interval (20764.65, 24985.35)

End

I hope you enjoyed reading this article! Any comments or suggestions or if you find any errors and are keen to help correct the error, please write to me at shishir909@gmail.com.