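$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| W_i^* - \hat{W}_i^* \right|,$$

where $W_i^* = \log(T_i^*)$ and $\hat{W}_i^* = \hat{\beta}' Z_i^* + \hat{S}_{\varepsilon}^{-1}(0.5)$, $i = 1, \ldots, m$, were the true and predicted responses, respectively. Note that $\hat{S}_{\varepsilon}^{-1}(0.5)$ was the median of the Kaplan-Meier estimate of the survival function for the error term and $\{(T_i^*, Z_i^*)\}$ gave a set of true responses and covariate vectors for $m$ future subjects.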
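The two ingredients of this criterion, the Kaplan-Meier median of the error term and the average absolute deviation on the log scale, can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names `km_median` and `mean_absolute_prediction_error` are hypothetical, and the residuals passed to `km_median` are assumed to be $\log(T_i) - \hat{\beta}' Z_i$ from the training sample together with their event indicators (1 = observed, 0 = censored).

```python
import math


def km_median(residuals, events):
    """Return the smallest residual at which the Kaplan-Meier survival
    estimate drops to 0.5 or below, or None if it never does."""
    data = sorted(zip(residuals, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        tied = 0
        # Collect all observations tied at the same value.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            if surv <= 0.5:
                return t
        n_at_risk -= tied
    return None  # median not reached (too much censoring)


def mean_absolute_prediction_error(true_times, covariates, beta_hat, median_resid):
    """Average |log(T*) - (beta_hat' Z* + median_resid)| over future subjects."""
    total = 0.0
    for t, z in zip(true_times, covariates):
        w_true = math.log(t)
        w_pred = sum(b * zj for b, zj in zip(beta_hat, z)) + median_resid
        total += abs(w_true - w_pred)
    return total / len(true_times)
```

When censoring is heavy the Kaplan-Meier curve may never reach 0.5; the sketch returns `None` in that case rather than extrapolating.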

Mean squared prediction error of covariate effects

| Number of covariates | | BJ^a (ρ = 0) | Optimal^b (ρ = 0) | Dominant^c (ρ = 0) | CV^d (ρ = 0) | BJ (ρ = 0.3) | Optimal (ρ = 0.3) | Dominant (ρ = 0.3) | CV (ρ = 0.3) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | P | 10 | 1.1 | 1 | 1.1 | 10 | 1.7 | 2 | 1.6 |
| | MSE | 0.110 | 0.062 | 0.062 | 0.065 | 0.132 | 0.067 | 0.074 | 0.086 |
| | (SE) | (0.008) | (0.005) | (0.005) | (0.005) | (0.009) | (0.005) | (0.005) | (0.006) |
| 25 | P | 25 | 1.9 | 1 | 1.2 | 25 | 2.3 | 2 | 1.7 |
| | MSE | 0.431 | 0.190 | 0.207 | 0.209 | 0.493 | 0.202 | 0.219 | 0.264 |
| | (SE) | (0.029) | (0.007) | (0.006) | (0.008) | (0.026) | (0.009) | (0.010) | (0.015) |
| 40 | P | 40 | 1.8 | 2 | 1.4 | 40 | 2 | 2 | 1.3 |
| | MSE | 3.088 | 0.344 | 0.360 | 0.378 | 2.897 | 0.306 | 0.324 | 0.363 |
| | (SE) | (0.355) | (0.013) | (0.015) | (0.013) | (0.273) | (0.008) | (0.011) | (0.006) |
| 50 | P | | 1.7 | 2 | 1.4 | | 2.08 | 2 | 1.9 |
| | MSE | N/A | 0.344 | 0.368 | 0.417 | N/A | 0.240 | 0.243 | 0.336 |
| | (SE) | | (0.009) | (0.020) | (0.033) | | (0.008) | (0.016) | (0.020) |
| 100 | P | | 2.9 | 1 | 3.4 | | 2.2 | 1 | 2.4 |
| | MSE | N/A | 0.914 | 0.987 | 1.053 | N/A | 0.599 | 0.628 | 0.706 |
| | (SE) | | (0.017) | (0.027) | (0.026) | | (0.007) | (0.015) | (0.029) |

a The Buckley-James algorithm.

b The optimal number of latent variables used at each run.

c The same number of latent variables used for all runs.

d The cross-validated number of latent variables used at each run.

Table 2. Comparison of mean squared prediction error of covariate effects from the Buckley-James algorithm and partial least squares given n = 50 and approximately 20% censoring, assuming a normal error distribution.

We constructed the mean absolute prediction error of responses over a sample of size $m = 100$ with different covariate numbers ($p$ = 25, 40, 50, 100) and different correlations in the covariate space ($\rho$ = 0, 0.3) for an extreme value (Table 3) or normal (Table 4) error distribution of variance $\sigma^2$ = 0.2, 0.4, 0.6. The responses were predicted in two ways. The first method assumed that the true covariate effects $\beta_0' Z_i^*$, $i = 1, \ldots, m$, were known, and the predicted responses were computed by $\hat{W}_i^* = \beta_0' Z_i^* + \hat{S}_{\varepsilon}^{-1}(0.5)$, $i = 1, \ldots, m$. The estimated prediction error was thus due to the estimation of the median of the error distribution and the variation of the future subjects, and not to errors in estimating the regression coefficients. The other method estimated covariate effects using partial least squares (PLS) with the cross-validated number of latent variables, and estimated the median of the error term from the empirical error distribution. The predicted responses from the PLS prediction were computed by $\hat{W}_i^* = \hat{\beta}' Z_i^* + \hat{S}_{\varepsilon}^{-1}(0.5)$, $i = 1, \ldots, m$. The first method would of course not be available to a data analyst in practice, and was used simply for comparison purposes.
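To make the simulation design concrete, the following Python sketch draws a set of future subjects with correlated covariates and an error term of a chosen variance. Several details are assumptions made only for illustration and are not stated in the text: the covariates are taken to be standard normal with a common pairwise correlation, the extreme value error is taken to be an uncentered, rescaled log of a unit exponential variable (whose variance is π²/6 before rescaling), and the function names and the coefficient vector `beta0` are hypothetical.

```python
import math
import random


def equicorrelated_covariates(m, p, rho, rng):
    """Draw m covariate vectors of length p with unit variance and common
    pairwise correlation rho (assumed structure; needs 0 <= rho < 1)."""
    rows = []
    for _ in range(m):
        shared = rng.gauss(0.0, 1.0)  # component shared by all p covariates
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ])
    return rows


def draw_error(variance, dist, rng):
    """Draw one error term with the requested variance."""
    sigma = math.sqrt(variance)
    if dist == "normal":
        return rng.gauss(0.0, sigma)
    if dist == "extreme_value":
        e = rng.expovariate(1.0)
        while e == 0.0:  # guard against log(0)
            e = rng.expovariate(1.0)
        # log of a unit exponential has variance pi^2 / 6; rescale to sigma^2.
        scale = sigma * math.sqrt(6.0) / math.pi
        return scale * math.log(e)
    raise ValueError("dist must be 'normal' or 'extreme_value'")


def simulate_future_subjects(m, p, rho, variance, dist, beta0, seed=0):
    """Return covariates Z* and log-scale responses W* = beta0' Z* + error."""
    rng = random.Random(seed)
    Z = equicorrelated_covariates(m, p, rho, rng)
    W = [
        sum(b * z for b, z in zip(beta0, row)) + draw_error(variance, dist, rng)
        for row in Z
    ]
    return Z, W
```

For example, `simulate_future_subjects(100, 25, 0.3, 0.4, "normal", [0.1] * 25)` would produce one test set of the size used here, with an arbitrary placeholder coefficient vector.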

For both error distributions, when the percentage of censoring was small (20%), partial least squares appeared to give an accurate prediction of the response for a future subject with a given set of covariates. The prediction was better when the covariates were moderately correlated and when the variance of the error distribution was moderately small. Increasing the number of parameters reduced the accuracy of the response prediction. The optimal mean absolute prediction error stayed around 0.36.

Mean absolute prediction error of responses

| Number of covariates | Error variance | Method I (ρ = 0) | Method II (ρ = 0) | Method I (ρ = 0.3) | Method II (ρ = 0.3) |
|---|---|---|---|---|---|
| 10 | 0.2 | 0.32 | 0.39 | 0.33 | 0.39 |
| | 0.4 | 0.45 | 0.51 | 0.44 | 0.52 |
| | 0.6 | 0.56 | 0.60 | 0.55 | 0.62 |
| 25 | 0.2 | 0.31 | 0.47 | 0.31 | 0.52 |
| | 0.4 | 0.46 | 0.64 | 0.44 | 0.52 |
| | 0.6 | 0.53 | 0.75 | 0.53 | 0.75 |
| 40 | 0.2 | 0.32 | 0.56 | 0.31 | 0.66 |
| | 0.4 | 0.40 | 0.64 | 0.44 | 0.67 |
| | 0.6 | 0.55 | 0.81 | 0.53 | 0.76 |
| 50 | 0.2 | 0.32 | 0.69 | 0.30 | 0.71 |
| | 0.4 | 0.47 | 0.68 | 0.44 | 0.68 |
| | 0.6 | 0.51 | 0.86 | 0.51 | 0.74 |
| 100 | 0.2 | 0.32 | 0.87 | 0.31 | 0.83 |
| | 0.4 | 0.43 | 0.95 | 0.46 | 0.86 |
| | 0.6 | 0.53 | 1.19 | 0.55 | 0.96 |

Table 3. Comparison of the mean absolute prediction error of responses from methods I and II, assuming an extreme value error distribution.

The next section describes the data set from ACTG 333 in more detail and presents an analysis of that data.