$$pl_h(W) = \log \mathcal{L}_\lambda(W) - h \int \lambda''^2(u)\,du \qquad (6)$$

where $\log \mathcal{L}_\lambda(W)$ is the partial log-likelihood (in the sense of section 2.3) and $h$ is a positive smoothing parameter that controls the tradeoff between the fit to the data and the smoothness of the function. Maximization of (6) over the desired class of functions defines the maximum penalized likelihood estimator (MPLE) $\hat\lambda^h$. The solution is then approximated on a basis of splines. The main advantage of the penalized likelihood approach over the kernel smoothing method is that there is no edge problem; the drawback is that it is more computationally demanding. The method of likelihood cross-validation (LCV) may be used to select $h$. To circumvent the computational burden of LCV, a one-step Newton-Raphson expansion has been proposed by O'Sullivan [O'S88] and adapted by Joly et al. [JCL98]; we denote this approximation by LCVa.
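To make the spline approximation step concrete, the following is a minimal sketch of how the penalty in (6) becomes a quadratic form once the hazard is expanded on a spline basis. The basis functions $M_j$, their number $m$, the coefficient vector $\theta$ and the matrix $\Omega$ are illustrative assumptions; this section does not specify the basis actually used.

```latex
% Assumed expansion of the hazard on a hypothetical spline basis M_1, ..., M_m
\lambda(t) \approx \sum_{j=1}^{m} \theta_j M_j(t)

% The roughness penalty is then quadratic in the coefficients
\int \lambda''^2(u)\,du = \sum_{j=1}^{m} \sum_{k=1}^{m} \theta_j \theta_k \int M_j''(u)\, M_k''(u)\,du = \theta^\top \Omega\, \theta,
\qquad \Omega_{jk} = \int M_j''(u)\, M_k''(u)\,du

% so that (6) is maximized over the finite-dimensional vector theta
pl_h(W) \approx \log \mathcal{L}_{\theta}(W) - h\, \theta^\top \Omega\, \theta
```

Under this assumption, $\Omega$ depends only on the basis and can be computed once, so each evaluation of the penalized log-likelihood for a new $h$ only rescales the penalty term.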

ELLbboot and ELLiboot are also applicable to select the smoothing parameter for penalized likelihood estimators. Figure 1 displays the penalized likelihood estimate chosen by ELLbboot and the true hazard function for one simulated example. We have compared LCVa, LCV, ELLbboot and ELLiboot to ELL in a short simulation study (penalized likelihood estimators require more computation than kernel estimators). We used the samples of size n = 50 generated in section 4.1. The results of the simulation are summarized in table 4. For penalized likelihood estimators, the differences between LCV, LCVa and ELLbboot were small; ELLiboot seemed to be less satisfactory.

Table 4. Average Kullback-Leibler information $-KL(\hat\lambda^W)$ for the penalized likelihood estimator for each criterion. Standard errors are given in parentheses. Rows: n = 50 with 15% censoring; n = 50 with 25% censoring; n = 50 with 50% censoring.

5 Choosing between stratified and unstratified survival models

5.1 Method

The estimators of ELL can be used to choose between stratified and unstratified survival models. Consider right-censored data as defined in section 2.1 and let $X = (X_1, \dots, X_n)$ be a vector of binary variables (coded 0/1). We denote the observed data by $W = (W_1, \dots, W_n)$ with $W_i = (T_i, \delta_i, X_i)$. We propose to use the ELLbboot or the LCVa criterion to choose between a proportional hazards model and a stratified model. We define by

$$\lambda(t \mid X_i) = \lambda_0(t) \exp(\beta X_i), \qquad i = 1, \dots, n,$$

the proportional hazards model ([COX72]) and by

$$\lambda(t \mid X_i) = \begin{cases} \lambda_0(t) & \text{if } X_i = 0 \\ \lambda_1(t) & \text{if } X_i = 1 \end{cases} \qquad i = 1, \dots, n,$$

the stratified model. To estimate these two models, we may use the penalized likelihood approach. In the proportional hazards regression model, $\hat\lambda_0^h(\cdot)$ and $\hat\beta$ maximize the penalized log-likelihood:

$$pl_h(W) = \log \mathcal{L}_{\lambda_0, \beta}(W) - h \int \lambda_0''^2(u)\,du$$

In the stratified model, $\hat\lambda_0^h(\cdot)$ and $\hat\lambda_1^h(\cdot)$ maximize:

$$pl_h(W) = \log \mathcal{L}_{\lambda_0}(W^0) - h \int \lambda_0''^2(u)\,du + \log \mathcal{L}_{\lambda_1}(W^1) - h \int \lambda_1''^2(u)\,du$$

where $W^0 = (W_1^0, \dots, W_{n_0}^0)$ with $W_i^0 = (T_i, \delta_i, X_i = 0)$ and $W^1 = (W_1^1, \dots, W_{n_1}^1)$ with $W_i^1 = (T_i, \delta_i, X_i = 1)$. Note that we do not estimate $\lambda_0(\cdot)$ and $\lambda_1(\cdot)$ separately on the samples $W^0$ and $W^1$: they are estimated using the same smoothing parameter. Thus the family of estimators $\hat\lambda^h(\cdot \mid \cdot)$ of the proportional hazards model and the family of estimators $\tilde\lambda^h(\cdot \mid \cdot)$ of the stratified model both have just one hyper-parameter $h$. Therefore, we can discriminate between these two models (we return to this theoretical issue in the discussion). The LCVa criterion could be applied to select $h$ in the two models and thus to choose between them. It is appealing to apply, in addition to the condition of the remark of section 3.3, the stronger condition $\sum_{i=1}^{n} X_i' = n_1$. This has the advantage of conditioning on an ancillary statistic (the sample sizes in the strata, which do not carry information) and yields the addition formula (7) below. The conditional criterion is thus:

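where $W' \overset{d}{=} W$, $W' = (W_1', \dots, W_n')$ with $W_i' = (T_i', \delta_i', X_i')$. To calculate $\mathrm{ELL}_c(\hat\lambda^h)$ we use ELLbboot defined in (5), keeping only the bootstrap samples $j$ that satisfy the condition $\sum_{i=1}^{n} X_i^j = n_1$. For the stratified estimator, we note that:

$$\mathrm{ELL}_c(\tilde\lambda^h) = \mathrm{ELL}(\hat\lambda_0^h) + \mathrm{ELL}(\hat\lambda_1^h) \qquad (7)$$

So, in practice, for each $h$ we estimated $\mathrm{ELL}(\hat\lambda_0^h)$ and $\mathrm{ELL}(\hat\lambda_1^h)$ by (5) applied separately to $W^0$ and $W^1$, then computed $\mathrm{ELL}_c(\tilde\lambda^h)$ by (7). To minimize the different selection criteria we use a golden section search.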
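The two computational ingredients of this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: `conditional_bootstrap` and `golden_section_min` are hypothetical names, the search is run on a toy objective rather than on a real criterion, and resampling within each stratum is used as one way to guarantee that every bootstrap sample keeps $n_1$ subjects with covariate value 1.

```python
import math
import random

INV_PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618


def conditional_bootstrap(x, rng):
    """Draw bootstrap indices that keep the observed stratum sizes.

    Sampling with replacement inside each stratum fixes the number of
    resampled subjects with x == 1 at n1, so the condition on the sum
    of the resampled covariates holds for every draw.
    """
    stratum0 = [i for i, xi in enumerate(x) if xi == 0]
    stratum1 = [i for i, xi in enumerate(x) if xi == 1]
    draw0 = [rng.choice(stratum0) for _ in stratum0]
    draw1 = [rng.choice(stratum1) for _ in stratum1]
    return draw0, draw1


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


# Toy usage: a binary covariate and a quadratic stand-in for minus the criterion.
rng = random.Random(2024)
x = [0, 0, 1, 1, 1]
idx0, idx1 = conditional_bootstrap(x, rng)
best_u = golden_section_min(lambda u: (u - 2.0) ** 2, 0.0, 5.0)
```

In an actual analysis the function passed to `golden_section_min` would be minus the estimated criterion as a function of the smoothing parameter (possibly on a log scale), which assumes the criterion is unimodal over the search interval.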

5.2 Example

We analysed data from the Paquid study [LCDBG94], a prospective cohort study of mental and physical aging that evaluates social environment and health status. The Paquid study is based on a large cohort randomly selected in a population of subjects aged 65 years or more, living at home in two departments of southwest France (Gironde and Dordogne). There were 3675 non-demented subjects at entry in the cohort and each subject was visited at most six times between 1988 and 2000; 431 incident cases of dementia were observed during the follow-up. The risk of developing dementia was modeled as a function of age. As prevalent cases of dementia were excluded, data were left-truncated and the truncation variable was the age at entry in the cohort (for more details see Commenges et al. [CLJ+98]). Two explanatory variables were considered: sex (noted S) and educational level (noted E). In the sample, there were 2133 women and 1542 men. Educational level was classified into two categories: no primary school diploma and primary school diploma [LGC+99]. The pattern of observations involved interval censoring and left truncation. It is straightforward to extend the theory described above to that case. For the sake of simplicity, we kept here the survival data framework, treating death as censoring rather than using the more appropriate multistate framework (Commenges, 2002). We were first interested in the effect of sex. The penalized likelihood estimate was used to compare the risk of dementia for men and women with a stratified model (model A) (figure 2), using ELLbboot for choosing the smoothing parameter.

The penalized likelihood estimate using the LCVa criterion was very close to the one obtained with ELLbboot. It appears that women tend to have a lower risk of dementia than men before 78 years and a higher risk above that age, which suggests non-proportional hazards. Indeed, the proportional hazards model (model B) had lower values for both LCVa and ELLbboot than the stratified model (table 5).

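Another important risk factor for dementia is educational level. As the proportional hazards assumption does not hold for sex, we performed several analyses of educational level stratified on sex. We considered three models. The stratified proportional hazards model (model C):

$$\lambda(t \mid S_i, E_i) = \begin{cases} \lambda_0(t) \exp(\beta E_i) & \text{if } S_i = 0 \text{ (women)} \\ \lambda_1(t) \exp(\beta E_i) & \text{if } S_i = 1 \text{ (men)} \end{cases}$$

the proportional hazards model fitted separately for each sex (model D):

$$\lambda(t \mid S_i, E_i) = \begin{cases} \lambda_0(t) \exp(\beta_0 E_i) & \text{if } S_i = 0 \text{ (women)} \\ \lambda_1(t) \exp(\beta_1 E_i) & \text{if } S_i = 1 \text{ (men)} \end{cases}$$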

the model stratified on both sex and educational level (model E):

$$\lambda(t \mid S_i = s, E_i = e) = \lambda_{se}(t), \qquad s, e \in \{0, 1\}$$

Fig. 2. Estimates of the hazard function of dementia for men (solid line) and women (dotted line) chosen by the ELLbboot criterion.

Table 5 presents the results of the different models. The two criteria give the same conclusion: the best model is the stratified proportional hazards model (model C), which has the highest values. Subjects with no primary school diploma have an increased risk of dementia. For this model, the estimated relative risk for educational level is equal to 1.97; the corresponding 95% confidence interval is [1.63; 2.37].
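The selection rule and the reported interval can be checked with a few lines of arithmetic. The sketch below copies the criterion values from Table 5 and picks the model with the highest value within each comparison; it then back-calculates the standard error of the log relative risk under the assumption, not stated in the text, that the interval is a Wald interval built on the log scale. The names `table5` and `best_model` are illustrative.

```python
import math

# Criterion values copied from Table 5 (higher is better).
table5 = {
    "A": {"ELLbboot": -1515.61, "LCVa": -1517.45},
    "B": {"ELLbboot": -1517.71, "LCVa": -1519.92},
    "C": {"ELLbboot": -1492.61, "LCVa": -1496.28},
    "D": {"ELLbboot": -1493.51, "LCVa": -1497.18},
    "E": {"ELLbboot": -1495.48, "LCVa": -1498.42},
}


def best_model(models, criterion):
    """Return the model with the highest value of the given criterion."""
    return max(models, key=lambda m: table5[m][criterion])


for criterion in ("ELLbboot", "LCVa"):
    sex_only = best_model(("A", "B"), criterion)
    with_education = best_model(("C", "D", "E"), criterion)
    print(criterion, sex_only, with_education)

# Assumption: the 95% interval is exp(log_rr -/+ 1.96 * se).
rr, lower, upper = 1.97, 1.63, 2.37
log_rr = math.log(rr)
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
rebuilt = (math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se))
print(round(log_rr, 3), round(se, 3), rebuilt)
```

If the Wald assumption holds, the rebuilt bounds should agree with the published interval up to rounding of the reported figures.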

Table 5. Comparison of the stratified and proportional hazards models according to the ELLbboot and LCVa criteria; A and B: stratified and unstratified models on sex; C, D, E: three models stratified on sex with educational level as a new covariate (see text).

            ELLbboot     LCVa
model A     -1515.61     -1517.45
model B     -1517.71     -1519.92
model C     -1492.61     -1496.28
model D     -1493.51     -1497.18
model E     -1495.48     -1498.42

6 Conclusion

We have presented a general criterion for selection of semi-parametric models from incomplete observations. This theoretical criterion, the expectation of the observed log-likelihood (ELL), performs nearly as well as the optimal KL distance (which is very difficult to estimate in this setting) as soon as there is enough information. We have shown that LCV estimates ELL. LCV and the two proposed bootstrap estimators yield nearly equivalent results; ELLbboot seems the best bootstrap estimator. The approximate version of LCV (for penalized likelihood) also performs very well and thus appears as the method of choice for this problem, due to the short computation time it requires. When no approximation of LCV is available, bootstrap estimators such as ELLbboot are competitive because the amount of computation can be more flexibly tuned than for LCV.

ELL can be used for choosing a model in semi-parametric families. An important example is the choice between stratified and unstratified survival models. We have shown that this can be done using LCV or a bootstrap estimator of ELL in the case where all the models are indexed by a single hyper-parameter. This raises a completely new problem, which is how to compare families of models of different complexities, i.e. indexed by a different number of hyper-parameters. For instance, this problem would arise if we compared a proportional hazards model (one hyper-parameter) to a stratified model with one hyper-parameter for each stratum. We conjecture that there is a principle of parsimony at the hyper-parameter level, similar to that known for the ordinary parameters.


References

[ABGK93] P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical models based on counting processes. Springer-Verlag, New York, 1993.

[Aka73] H. Akaike. Information theory and an extension of the maximum likelihood principle. In B.N. Petrov and F. Csaki, editors, Second International Symposium on Information Theory, pages 267-281, Budapest, 1973. Akademiai kiado.

[Aka74] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716-723, 1974.

[CLJ+98] D. Commenges, L. Letenneur, P. Joly, A. Alioum, and J.F. Dartigues. Modelling age-specific risk: application to dementia. Statistics in Medicine, 17:1973-1988, 1998.



[Com02] D. Commenges. Inference for multistate models from interval-censored data. Statistical Methods in Medical Research, 11:1-16, 2002.

[Cop83] J. B. Copas. Regression, prediction and shrinkage (with discussion). Journal of the Royal Statistical Society B, 45:311-354, 1983.

[COX72] D.R. Cox. Regression models and life tables (with discussion). Journal of the Royal Statistical Society B, 34:187-220, 1972.

[CS98] J. E. Cavanaugh and R. H. Shumway. An Akaike information criterion for model selection in the presence of incomplete data. Journal of Statistical Planning and Inference, 67:45-65, 1998.

[CW79] P. Craven and G. Wahba. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Num. Math., 31:377-403, 1979.

[DeL92] J. DeLeeuw. Introduction to Akaike (1973) information theory and an extension of the maximum likelihood principle. In S. Kotz and N. L. Johnson, editors, Breakthroughs in statistics, volume 1, pages 599-609. Springer-Verlag, London, 1992.

[Fer99] J. D. Fermanian. A new bandwidth selector in hazard estimation. Nonparametric Statistics, 10:137-182, 1999.

[Hal92] P. Hall. The bootstrap and Edgeworth expansion. Springer-Verlag, New York, 1992.

[HST98] C. M. Hurvich, J.S. Simonoff, and C.L. Tsai. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society B, 60:271-293, 1998.

[HT89] C. M. Hurvich and C.L. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297-307, 1989.

[HT90] T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.

[ISK97] M. Ishiguro, Y. Sakamoto, and G. Kitagawa. Bootstrapping log likelihood and EIC, an extension of AIC. Ann. Inst. Statist. Math, 49:411-434, 1997.

[JCL98] P. Joly, D. Commenges, and L. Letenneur. A penalized likelihood approach for arbitrarily censored and truncated data: application to age-specific incidence of dementia. Biometrics, 54:185-194, 1998.

[LC04] B. Liquet and D. Commenges. Estimating the expectation of the log-likelihood with censored data for estimator selection. Lifetime Data Analysis, 10:351-367, 2004.

[LCDBG94] L. Letenneur, D. Commenges, J.F. Dartigues, and P. Barberger-Gateau. Incidence of dementia and Alzheimer's disease in elderly community residents of south-western France. Int. J. Epidemiol., 23:1256-1261, 1994.

[LGC+99] L. Letenneur, V. Gilleron, D. Commenges, C. Helmer, J.M. Orgogozo, and J.F. Dartigues. Are sex and educational level independent predictors of dementia and Alzheimer's disease? Incidence data from the Paquid project. J. Neurol. Neurosurg. Psychiatry, 66:177-183, 1999.

[LSC03] B. Liquet, C. Sakarovitch, and D. Commenges. Bootstrap choice of estimators in non-parametric families: an extension of EIC. Biometrics, 59:172-178, 2003.

[Mal73] C.L. Mallows. Some comments on Cp. Technometrics, 15:661-675, 1973.

[Mil02] A.J. Miller. Subset Selection in Regression (Second Edition). Chapman and Hall, London, 2002.

[MP87] J. S. Marron and W. J. Padgett. Asymptotically optimal bandwidth selection for kernel density estimators from randomly right-censored samples. The Annals of Statistics, 15:1520-1535, 1987.

[O'S88] F. O'Sullivan. Fast computation of fully automated log-density and log-hazard estimators. SIAM J. Sci. Stat. Comput., 9:363-379, 1988.

[RH83] H. Ramlau-Hansen. Smoothing counting process intensities by means of kernel functions. The Annals of Statistics, 11:453-466, 1983.

[Sil86] B.W. Silverman. Density estimation for statistics and data analysis. Chapman and Hall, London, 1986.
