When uncoarsened data are considered, two estimators of the explained variation are available. Traditionally, the explained residual variation is preferred because it can be interpreted as an estimator of predictive accuracy even when the model is misspecified. However, we have not been able to demonstrate considerable differences between the quantities estimated by the two estimators, and we therefore do not regard misspecification as being as serious a problem as parts of the literature in this area suggest.
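To make the two estimators concrete, the following minimal sketch computes both for a linear regression under quadratic loss. It is an illustration under stated assumptions rather than the procedure used in our examples: the function name `explained_variation_estimates`, the simulated data and the choice of a one-covariate linear model are all hypothetical. The model based estimate plugs the fitted parameters into the ratio of explained to total variance, whereas the explained residual variation compares observed squared prediction errors with and without the covariate.

```python
import numpy as np

def explained_variation_estimates(z, y):
    """Return (model_based, residual_based) for the working model y = a + b*z + error.

    Both estimates use quadratic loss. The one-covariate linear model is an
    assumption made for illustration only.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Ordinary least squares fit with an intercept.
    X = np.column_stack([np.ones(n), z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)

    # Model based estimate: plug the fitted parameters into
    # Var(b*Z) / (Var(b*Z) + sigma^2), using the empirical variance of z.
    sigma2 = rss / (n - 2)
    signal = beta[1] ** 2 * np.var(z)
    model_based = signal / (signal + sigma2)

    # Explained residual variation: one minus the ratio of observed squared
    # prediction errors with and without the covariate.
    residual_based = 1.0 - rss / float(np.sum((y - y.mean()) ** 2))
    return model_based, residual_based

# Hypothetical data in which the linear working model is misspecified
# (the true regression function contains a quadratic term).
rng = np.random.default_rng(0)
z = rng.normal(size=500)
y = 1.0 + 2.0 * z + 0.5 * z**2 + rng.normal(size=500)
model_based, residual_based = explained_variation_estimates(z, y)
print(f"model based: {model_based:.3f}, residual based: {residual_based:.3f}")
```

In this particular model the two quantities differ only through the degrees-of-freedom correction in the residual variance, so the sketch shows how they are computed rather than how they behave in general.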
When considering survival data, the explained residual variation is undefined due to censoring, and other estimation procedures have therefore been proposed by Graf et al. [GSSS99] and Schemper and Henderson [SH00]. When choosing one of the three estimators in preference to the other two, efficiency must be weighed against the potential bias of the estimators under model misspecification. Since the model based estimators have a higher efficiency, and their bias is not necessarily large under model misspecification, we are tempted to prefer the model based approaches. Ease of computation might also influence the choice of estimator. The estimated explained variation is, at least when quadratic loss is used, very simple to determine. The estimator based on the method of Graf et al. is also rather simple to calculate, whereas the estimator based on the method of Schemper and Henderson requires somewhat more programming.
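As an indication of how little programming the approach of Graf et al. needs, the sketch below computes an inverse-probability-of-censoring weighted squared prediction error at a fixed time point t and the resulting measure of explained residual variation. It is a sketch under assumptions, not a reference implementation: the function names `km_survival`, `brier_score` and `explained_residual_variation` are hypothetical, the predicted survival probabilities `surv_pred_t` are assumed to come from some fitted model, and the handling of ties and of left limits in the censoring distribution is simplified.

```python
import numpy as np

def km_survival(times, events):
    """Return a function t -> Kaplan-Meier estimate of P(T > t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    uniq = np.unique(times[events])
    at_risk = np.array([(times >= u).sum() for u in uniq])
    deaths = np.array([((times == u) & events).sum() for u in uniq])
    steps = np.concatenate([[1.0], np.cumprod(1.0 - deaths / at_risk)])

    def surv(t):
        # Number of distinct event times <= t selects the current step.
        return steps[np.searchsorted(uniq, t, side="right")]

    return surv

def brier_score(times, events, surv_pred_t, t):
    """Censoring-weighted squared error of predicted P(T > t) at time t."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    pred = np.broadcast_to(np.asarray(surv_pred_t, dtype=float), times.shape)

    # Kaplan-Meier estimate of the censoring distribution
    # (censorings play the role of events).
    cens_surv = km_survival(times, ~events)

    died = (times <= t) & events   # status at t known: event before t
    alive = times > t              # status at t known: still at risk
    contrib = np.zeros(len(times)) # censored before t: weight zero
    contrib[died] = pred[died] ** 2 / cens_surv(times[died])
    contrib[alive] = (1.0 - pred[alive]) ** 2 / cens_surv(t)
    return float(contrib.mean())

def explained_residual_variation(times, events, surv_pred_t, t):
    """One minus the ratio of model to covariate-free prediction error."""
    marginal = km_survival(times, events)(t)
    return 1.0 - (brier_score(times, events, surv_pred_t, t)
                  / brier_score(times, events, marginal, t))
```

A call such as `explained_residual_variation(times, events, surv_pred_t, t=2.5)` then compares the model predictions with the covariate-free Kaplan-Meier prediction at that time point; integrating over t, as is often done in practice, is left out of the sketch.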
We have studied the issue of misspecification through several examples. It is possible, however, that there exist examples in which the bias in misspecified models is more pronounced than in ours.