In failure time analysis the response variable Y is a failure time. Measures of explained variation and predictive accuracy may be defined as above, but when censoring occurs, as is usual in survival analysis, estimation becomes more complicated: the explained residual variation cannot be determined, because the loss corresponding to a censored failure time is unavailable. Since the estimated explained variation has not been accepted as an estimator of the explained variation, other estimation methods are needed. Graf et al. [GSSS99] and Schemper and Henderson [SH00] have proposed estimators of the explained variation in the failure time model. Graf et al. base their estimator on inverse probability weighting of the available losses, whereas Schemper and Henderson use the proposed regression model to determine a loss for the unavailable failure times as well. Both estimators can be considered generalizations of the explained residual variation and coincide with it when there is no censoring. We do not describe the definitions of these estimators in detail here. Graf et al. consider a model consisting of a single parameter, θ ∈ Θ = {θ}, and estimate the predictive accuracy W_θ of this model under the true distribution of (Z, Y), while Schemper and Henderson focus on the Cox regression model. However, their estimators generalize without difficulty to the above setting. Here we briefly discuss the choice of the variable of interest and the properties of the estimation procedures proposed by these authors.
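To make the model-based idea concrete, the following Python sketch handles a single individual with the survival status V = I(Y < t) as variable of interest and quadratic loss. It is a simplified illustration in the spirit of Schemper and Henderson, not their estimator: when V is observed the usual loss is used, and when the individual is censored at c < t the loss is replaced by its expectation under the model's conditional distribution of Y given Y > c. The function name and the `surv` argument (the model's survival function for the individual's covariates) are hypothetical, as is the exponential model in the usage lines.

```python
import math


def expected_status_loss(pred, obs_time, died, t, surv):
    """Quadratic loss for V = I(Y < t), filling in censored cases via the model.

    Hypothetical helper (illustration only):
    pred: predicted value for V, e.g. a predicted P(Y < t | Z)
    obs_time: observed time min(Y, C)
    died: True if obs_time is an observed failure, False if censored
    surv: model survival function s -> P(Y > s | Z)
    """
    if died or obs_time >= t:
        # V is known: failure before t gives V = 1, otherwise Y >= t and V = 0
        v = 1.0 if (died and obs_time < t) else 0.0
        return (v - pred) ** 2
    # Censored at c < t: V is unknown, so average the loss over the model's
    # conditional distribution, q = P(Y < t | Y > c, Z)
    q = (surv(obs_time) - surv(t)) / surv(obs_time)
    return q * (1.0 - pred) ** 2 + (1.0 - q) * pred ** 2


def exp_surv(s):
    # Illustrative exponential model with rate 0.5 (an assumption)
    return math.exp(-0.5 * s)


# Censored at c = 1 with t = 2 and prediction 0: the loss equals q = 1 - exp(-0.5)
print(round(expected_status_loss(0.0, 1.0, False, 2.0, exp_surv), 4))  # 0.3935
```

For the memoryless exponential model, q depends only on t - c; for other models the fill-in loss depends on the whole fitted survival curve, which is why such an estimator inherits the model's assumptions.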

As noted in several papers on explained variation and predictive accuracy in failure time models, e.g. Korn and Simon [KS90] and Henderson [Hen95], the variable of interest is not necessarily the failure time Y. This is due to the nature of failure time data. Medical considerations may suggest other variables of interest, but the censoring mechanism may also influence the choice: if the individuals are followed only until time point t, it is not relevant to focus on how much longer they will survive. In many cases the actual failure time matters for patients who are expected to die soon, whereas for long-term survivors the actual failure time is of less interest than the fact that they will live for a long time. If long-term survivors are defined as the individuals surviving a specified time point t, this leads to the failure time censored at time point t as the variable of interest, i.e. V = min{Y, t}. A prediction v = t corresponds to the long-term prediction 'survival greater than or equal to t' and should be considered successful if the individual survives time point t, so no loss should be incurred in this case. When the predictions take values in the same range as the variable of interest, i.e. in the interval [0, t], standard loss functions such as quadratic and absolute loss have this property, since both vanish when the prediction equals the value of V. See Henderson [Hen95] for a more elaborate discussion of the choice of loss functions.
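A small Python sketch, with hypothetical function names and made-up numbers, shows why quadratic and absolute loss assign zero loss to the long-term prediction v = t for an individual who survives t, while still penalizing the same prediction for an early failure.

```python
def truncated_time(y, t):
    """Variable of interest V = min{Y, t}: the failure time censored at t."""
    return min(y, t)


def quadratic_loss(v, pred):
    return (v - pred) ** 2


def absolute_loss(v, pred):
    return abs(v - pred)


t = 5.0
# A long-term survivor (Y = 12) given the long-term prediction v = t
print(quadratic_loss(truncated_time(12.0, t), t))  # 0.0
# An early failure (Y = 2) given the same long-term prediction
print(quadratic_loss(truncated_time(2.0, t), t))   # 9.0
print(absolute_loss(truncated_time(2.0, t), t))    # 3.0
```

Because V never exceeds t, how long beyond t the survivor actually lives has no influence on the loss.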

In some cases the focus is on whether an individual is alive at a specified time point t, for instance when a patient who survives this time point can be considered cured. The question is then whether the patient will be cured or not, and the variable of interest becomes the survival status at time point t, i.e. V = I(Y < t), where I(·) denotes the indicator function. In this case the variable of interest is binary and the standard loss functions are applicable.
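A minimal sketch of the survival status V = I(Y < t) under quadratic loss, where each prediction is a number in [0, 1], such as a predicted probability of failure before t. For a binary variable with probabilistic predictions, the average quadratic loss is the familiar Brier score. The data and function names are made up for illustration, and all failure times are assumed fully observed.

```python
def status(y, t):
    """Survival status at t: V = I(Y < t), i.e. 1 if failure before t."""
    return 1 if y < t else 0


def mean_quadratic_loss(times, preds, t):
    """Average quadratic loss of the predictions for V (fully observed data)."""
    losses = [(status(y, t) - p) ** 2 for y, p in zip(times, preds)]
    return sum(losses) / len(losses)


times = [1.0, 3.0, 7.0, 9.0]   # hypothetical uncensored failure times
preds = [0.8, 0.6, 0.3, 0.1]   # hypothetical predicted P(Y < t)
print(round(mean_quadratic_loss(times, preds, t=5.0), 6))  # 0.075
```

If the predictions are themselves 0/1, quadratic and absolute loss coincide and simply count misclassifications.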

A possible generalization of the survival status at time point t as the variable of interest is obtained by considering the survival status as a process, i.e. (I(Y < s) : s ∈ [0, t]). The prediction is then also a process, and a loss can be determined by averaging the loss at each time point in [0, t]. The average may be taken by integrating over [0, t] with respect to Lebesgue measure or another suitable measure (see e.g. Graf and Schumacher [GS95], Graf et al. [GSSS99] or Schemper and Henderson [SH00]). Explained variation and predictive accuracy can then be defined in the same manner as above. However, it is not straightforward to prove consistency of the estimators when the losses of the survival status process are integrated.
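The averaged loss over the status process can be approximated numerically. The sketch below assumes a fully observed failure time Y and averages the quadratic loss of the process I(Y < s) against a predicted process with respect to Lebesgue measure on [0, t], using the midpoint rule on a grid; another measure would correspond to replacing the uniform cell weights. The exponential prediction with rate 0.2 and all names are illustrative assumptions.

```python
import math
from typing import Callable


def integrated_status_loss(y: float, pred: Callable[[float], float], t: float,
                           n_grid: int = 1000) -> float:
    """Average quadratic loss of the status process (I(Y < s) : s in [0, t]).

    y: a fully observed failure time (any value >= t for a survivor)
    pred: predicted process s -> predicted value of I(Y < s), e.g. P(Y < s | Z)
    The loss is averaged w.r.t. Lebesgue measure on [0, t] (normalised by t),
    approximated by the midpoint rule with n_grid equal cells.
    """
    h = t / n_grid
    total = 0.0
    for k in range(n_grid):
        s = (k + 0.5) * h            # midpoint of the k-th cell
        v = 1.0 if y < s else 0.0    # value of the status process at time s
        total += (v - pred(s)) ** 2 * h
    return total / t


def exp_pred(s: float) -> float:
    # Illustrative exponential prediction P(Y < s) with rate 0.2 (an assumption)
    return 1.0 - math.exp(-0.2 * s)


print(integrated_status_loss(10.0, exp_pred, t=5.0))  # survivor beyond t
```

For a survivor the integrand is (1 - exp(-0.2 s))^2, which can be integrated in closed form, so this case is easy to check against the numerical result.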

Note that for these variables of interest the loss becomes available for some of the censored failure times: if a failure time is censored after time point t, the failure time censored at t is min{Y, t} = t and the survival status is I(Y < t) = 0, so both are known even though Y itself is not.
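Whether V is observable for a given individual can be decided from the observed time min(Y, C) and the failure indicator alone. A sketch with hypothetical function names and made-up data, returning `None` when V is unknown:

```python
from typing import Optional


def observed_truncated_time(obs_time: float, died: bool, t: float) -> Optional[float]:
    """Return V = min{Y, t} if the data determine it, otherwise None."""
    if died or obs_time >= t:
        return min(obs_time, t)  # observed failure, or censored at/after t so Y > t
    return None                  # censored before t: V unknown


def observed_status(obs_time: float, died: bool, t: float) -> Optional[int]:
    """Return V = I(Y < t) if the data determine it, otherwise None."""
    if died:
        return 1 if obs_time < t else 0
    if obs_time >= t:
        return 0                 # censored at or after t, so Y > t
    return None                  # censored before t: V unknown


data = [(2.0, True), (4.0, False), (8.0, False)]  # (min(Y, C), failure indicator)
print([observed_truncated_time(x, d, 5.0) for x, d in data])  # [2.0, None, 5.0]
print([observed_status(x, d, 5.0) for x, d in data])          # [1, None, 0]
```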

The estimator proposed by Graf et al. is constructed using inverse probability weighting, where the weights are based on the Kaplan-Meier estimator of the censoring distribution. Their estimator therefore resembles the explained residual variation in the sense that it estimates the predictive accuracy in case of model misspecification. The estimator of Schemper and Henderson is instead based on the proposed model; in case of model misspecification it can be interpreted neither as the predictive accuracy nor as the explained variation of the least false model, but rather estimates a quantity in between these two measures.
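The inverse probability weighting idea can be sketched for V = I(Y < t) and quadratic loss, in the style of the Brier score estimator of Graf et al. Individuals whose status at t is known are weighted by the inverse of the Kaplan-Meier estimate of the censoring survivor function G(s) = P(C > s), evaluated just before the failure time for failures before t and at t for individuals still at risk at t; individuals censored before t get weight zero. The sketch assumes censoring independent of failure times and covariates, ignores ties at exactly t, and requires G(t) > 0; the function names are hypothetical, and it estimates only the expected loss, not a full explained-variation measure.

```python
from typing import List, Sequence, Tuple


def km_censoring(times: Sequence[float], died: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of G(s) = P(C > s), treating censorings as the events.

    Returns the jump points as (time, G just after that time), sorted by time.
    """
    steps = []
    g = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= u)
        n_cens = sum(1 for x, d in zip(times, died) if x == u and not d)
        if n_cens:
            g *= 1.0 - n_cens / at_risk
            steps.append((u, g))
    return steps


def g_at(steps: Sequence[Tuple[float, float]], s: float, left: bool = False) -> float:
    """Evaluate G(s), or the left limit G(s-) when left=True."""
    g = 1.0
    for u, val in steps:
        if u < s or (u == s and not left):
            g = val
        else:
            break
    return g


def ipcw_status_loss(times, died, preds, t: float) -> float:
    """IPCW estimate of the expected quadratic loss for V = I(Y < t).

    times: observed times min(Y, C); died: True for observed failures;
    preds: predicted values for V, e.g. P(Y < t | Z_i). Assumes G(t) > 0.
    """
    steps = km_censoring(times, died)
    g_t = g_at(steps, t)
    total = 0.0
    for x, d, p in zip(times, died, preds):
        if d and x < t:        # failure before t observed: V = 1
            total += (1.0 - p) ** 2 / g_at(steps, x, left=True)
        elif x >= t:           # known to survive until t: V = 0
            total += p ** 2 / g_t
        # censored before t: V unknown, weight zero
    return total / len(times)


times = [1.0, 2.0, 3.0, 6.0, 8.0]
died = [True, False, True, True, False]
print(round(ipcw_status_loss(times, died, [0.0] * 5, t=5.0), 4))  # 0.4667
```

With no censoring every weight equals one and the estimate reduces to the ordinary mean quadratic loss, in line with the remark that these estimators coincide with the uncensored quantities in that case.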

We have compared the three estimation procedures available for survival data in simulation studies: the estimated explained variation, the estimator based on Graf et al. and the estimator based on Schemper and Henderson. We studied the exponential failure time model, using survival status as the variable of interest and a quadratic loss function. We first considered the case where the model is correctly specified, in order to study efficiency. Here the estimator of Graf et al. turned out to be the least efficient, whereas the estimated explained variation was the most efficient estimator of the explained variation. When we misspecified the model (by leaving out covariates while still using an exponential model), we did not succeed in finding examples where the quantities estimated by the three estimators differed appreciably.
