Summary and Discussion

Fig. 5. Scatter plot of change in log10 RNA versus the BJ-PLS latent variable (censored responses are replaced by imputed values)

The partial least squares algorithm has been extended to the proportional hazards model to analyze right-censored data [NR02, PT02]. Although the proportional hazards model is very popular for analyzing right-censored survival data, the accelerated failure time (AFT) model is more interpretable under certain circumstances. The approach described in this paper extends principal component regression (PCR) and partial least squares (PLS) to the AFT model, and compares an exploratory analysis of an HIV data set using these methods with more traditional stepwise regression. Even in the simplest setting of linear regression, model selection and prediction can present difficulties, and those difficulties are amplified in the presence of censored data. Nonparametric estimates of a mean response with censored data are well known to be biased, and estimating the intercept in the AFT model presents the same issues. We have chosen to absorb the intercept as an unmodeled term in the error distribution, treating the covariate effect β'Z_i as the main quantity of interest. Because of the unknown intercept and the incomplete observation of censored responses, we use the leave-two-out cross-validation described in Section 2, which relies only on the predicted covariate effect for cases dropped from the training data set, instead of the usual prediction error sum of squares. Leave-two-out cross-validation suits the primary objective of the analysis, namely grouping subjects according to prognosis. In such an analysis, the error in predicting the difference in response between two subjects should be minimized.
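To see why a criterion built on pairs of held-out subjects avoids the unknown intercept, it may help to write the AFT model out. The symbols α (intercept), ε_i (error term) and Y_i (log-scale response) are notation assumed here for illustration; only the covariate effect β'Z_i is named in the text above.

```latex
\begin{align*}
Y_i &= \alpha + \beta' Z_i + \varepsilon_i
     = \beta' Z_i + \varepsilon_i^{*},
     \qquad \varepsilon_i^{*} = \alpha + \varepsilon_i, \\
Y_i - Y_j &= \beta' (Z_i - Z_j) + (\varepsilon_i - \varepsilon_j).
\end{align*}
```

The intercept cancels in the second line, so the predicted difference β'(Z_i − Z_j) for a pair of subjects left out of the training set can be assessed without ever estimating α.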

Fig. 6. Residuals from the least squares fit of the response variable on the BJ-PLS latent variable (censored residuals are replaced by imputed values)

Principal component analysis performed poorly with this data set. Empirical experience with principal component analysis [WM03] suggests that it often requires a larger number of latent variables than partial least squares to achieve the same prediction error, so it is possible that more than 7 principal components were necessary in this data set. We are reporting elsewhere the results of detailed simulations comparing PCR and PLS. Those simulations also show that PCR in the AFT model with censored data requires more latent variables than PLS when the number of latent variables is chosen by cross-validation.

The proposed BJ-PLS method takes advantage of the fact that every iterative step of the Buckley-James algorithm is an ordinary least squares fit, and replaces that least squares fit with a PLS fit. Since the major computational burden of BJ-PLS lies in the PLS algorithm performed at each iteration, BJ-PLS is expected to scale similarly to PLS, which is known to adapt well numerically to high-dimensional data sets.
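The idea of swapping the least squares step of Buckley-James for a PLS fit can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pls1_fit`, `km_conditional_means`, `bj_pls`) are hypothetical, PLS is a plain single-response NIPALS variant, residuals are assumed to have no ties, the largest residual is treated as uncensored so that the Kaplan-Meier mass sums to one, and the loop uses a fixed iteration cap because Buckley-James iterations are known to oscillate at times rather than converge.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Coefficients of a single-response PLS (NIPALS) fit; X and y must be centered."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                        # latent variable (score vector)
        tt = t @ t
        p = Xk.T @ t / tt                 # X loading
        c = yk @ t / tt                   # regression of y on the score
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def km_conditional_means(resid, delta):
    """E[e | e > resid_i] under the Kaplan-Meier estimate of the residual distribution."""
    n = len(resid)
    order = np.argsort(resid, kind="stable")
    e = resid[order]
    d = delta[order].astype(float)
    d[-1] = 1.0                           # assumption: largest residual counts as uncensored
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - d / at_risk)
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    mass = surv_prev * d / at_risk        # Kaplan-Meier jump at each sorted residual
    # Mass and first moment strictly above each sorted position.
    tail_mass = np.concatenate((np.cumsum(mass[::-1])[::-1][1:], [0.0]))
    tail_mom = np.concatenate((np.cumsum((mass * e)[::-1])[::-1][1:], [0.0]))
    safe = np.where(tail_mass > 0, tail_mass, 1.0)
    cond = np.where(tail_mass > 0, tail_mom / safe, e)
    out = np.empty(n)
    out[order] = cond
    return out

def bj_pls(Z, y, delta, n_components, n_iter=50, tol=1e-8):
    """Buckley-James iterations with the least squares step replaced by PLS."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta).astype(bool)   # True = observed, False = censored
    Zc = Z - Z.mean(axis=0)
    # Starting value: PLS fit that treats censored responses as if observed.
    beta = pls1_fit(Zc, y - y.mean(), n_components)
    y_star = y.copy()
    for _ in range(n_iter):
        lin = Zc @ beta
        cond = km_conditional_means(y - lin, delta)
        # Impute censored responses by their estimated conditional expectation.
        y_star = np.where(delta, y, lin + cond)
        beta_new = pls1_fit(Zc, y_star - y_star.mean(), n_components)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, y_star
```

Calling `bj_pls(Z, y, delta, n_components=k)` returns the coefficient vector together with the responses after imputation. With no censoring and as many components as covariates, the sketch reduces to ordinary least squares, which is a convenient sanity check. Each pass costs one PLS fit plus a sort of the residuals, which is consistent with the remark above that the PLS step dominates the computation.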

Fig. 7. Density estimate of the "beneficial scores" β'Z_i estimated from BJ-PLS (horizontal axis: estimated beneficial scores)

ACKNOWLEDGMENTS

This work was supported by grants AI24643 and AI58217 awarded by the National Institute of Allergy and Infectious Diseases, NIH. The data set from protocol 333 was graciously provided by the Statistics and Data Analysis Center (SDAC) of the AIDS Clinical Trials Group (ACTG).
