## Appendix

The following result on consistency is easily proved using Lemma 2.8 and Lemma 2.13 of Pakes and Pollard [PP89].

Let $(Z_1, Y_1), \dots, (Z_n, Y_n)$ be a sample of $n$ independent random variables with the same distribution as the random variable $(Z, Y)$ taking values in $\mathbb{R}^q \times \mathbb{R}^p$.

Theorem 1. Assume $\theta_n \to \theta_0 \in \Theta$ as $n$ tends to infinity, the convergence being almost sure or in probability. Let $\{h_\theta : \mathbb{R}^q \times \mathbb{R}^p \to \mathbb{R} \mid \theta \in \Theta\}$ be a family of functions with $\mathrm{E}|h_\theta(Z,Y)| < \infty$ for all $\theta$ belonging to a bounded neighborhood of $\theta_0$ ($\mathrm{E}$ denoting expectation with respect to the true distribution of $(Z,Y)$).

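Assume further that there exist an $\alpha > 0$ and a nonnegative function $\varphi : \mathbb{R}^q \times \mathbb{R}^p \to \mathbb{R}$ with $\mathrm{E}\,\varphi(Z, Y) < \infty$ for which

$$|h_\theta(z,y) - h_{\theta'}(z,y)| \le \varphi(z,y)\,\|\theta - \theta'\|^{\alpha}$$

for some norm $\|\cdot\|$ on $\Theta$, all $(z, y)$ and all $\theta, \theta'$ belonging to the bounded neighborhood of $\theta_0$. Then

$$\frac{1}{n}\sum_{i=1}^{n} h_{\theta_n}(Z_i, Y_i) \to \mathrm{E}\,h_{\theta_0}(Z, Y) \tag{4}$$

for $n \to \infty$, $\to$ being the same convergence as above.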
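As an informal illustration of Theorem 1 (not part of the original argument), the Python sketch below makes several assumptions. It uses a hypothetical family $h_\theta(z,y) = (y - \theta z)^2$ with scalar $Z$ and $Y$. It simulates from a model $Y = \theta_0 Z + \varepsilon$ with $Z$ and $\varepsilon$ independent standard normal. It takes a deterministic sequence $\theta_n = \theta_0 + n^{-1/2}$. Under these choices $\mathrm{E}\,h_{\theta_0}(Z,Y) = \mathrm{E}\,\varepsilon^2 = 1$, so the printed averages should approach 1 as $n$ grows. The helper names are made up for the example.

```python
import math
import random


def empirical_average(h, theta, sample):
    """Return (1/n) * sum_i h(theta, z_i, y_i) for a list of (z, y) pairs."""
    return sum(h(theta, z, y) for z, y in sample) / len(sample)


def squared_error(theta, z, y):
    """Illustrative family h_theta(z, y) = (y - theta * z)**2 (an assumption, not from the text)."""
    return (y - theta * z) ** 2


def simulate(n, theta0=2.0, seed=0):
    """Average of h_{theta_n} over n simulated pairs.

    Assumed model: Z ~ N(0, 1) and Y = theta0 * Z + eps with eps ~ N(0, 1)
    independent of Z, so E h_{theta0}(Z, Y) = 1. The sequence
    theta_n = theta0 + 1/sqrt(n) is deterministic and converges to theta0.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        sample.append((z, theta0 * z + rng.gauss(0.0, 1.0)))
    theta_n = theta0 + 1.0 / math.sqrt(n)
    return empirical_average(squared_error, theta_n, sample)


for n in (100, 10_000, 100_000):
    print(n, round(simulate(n), 4))
```

This family satisfies the Lipschitz condition of the theorem with $\alpha = 1$ whenever $Z$ and $Y$ have finite second moments.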

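Note that the conditions are fulfilled if the functions $\theta \mapsto h_\theta(z,y)$ are continuously differentiable for every $(z,y)$, $h_\theta$ is integrable with respect to the true distribution of $(Z,Y)$, and the derivatives of $h_\theta$ with respect to $\theta$ are bounded, uniformly over a bounded convex neighborhood of $\theta_0$, by an integrable function of $(z,y)$. The mean value theorem then gives the condition with $\alpha = 1$. This will usually be the case for quadratic and entropy loss.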
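The following worked case shows how the mean value theorem produces the required bound. It assumes, purely for illustration, scalar $z$, $y$ and $\theta$, the quadratic loss $h_\theta(z,y) = (y - \theta z)^2$, and the neighborhood $\{\theta : |\theta - \theta_0| \le r\}$:

$$
\begin{aligned}
\frac{\partial}{\partial\theta} h_\theta(z,y) &= -2z\,(y - \theta z), \\
|h_\theta(z,y) - h_{\theta'}(z,y)| &\le \sup_{|\tilde\theta - \theta_0| \le r} 2|z|\,|y - \tilde\theta z|\;|\theta - \theta'| \\
&\le 2|z|\bigl(|y| + (|\theta_0| + r)\,|z|\bigr)\,|\theta - \theta'| =: \varphi(z,y)\,|\theta - \theta'|.
\end{aligned}
$$

The condition therefore holds with $\alpha = 1$. Moreover, $\mathrm{E}\,\varphi(Z,Y) < \infty$ as soon as $Z$ and $Y$ have finite second moments, since $\mathrm{E}|ZY| \le (\mathrm{E}Z^2\,\mathrm{E}Y^2)^{1/2}$ by the Cauchy–Schwarz inequality.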

According to the theorem, the numerator of the estimated explained variation (2) is a consistent estimator of $\pi_{\theta_0}$ if two conditions hold: $\theta_n$ converges to $\theta_0$ in probability or almost surely, and the functions $\theta \mapsto \mathrm{E}_\theta(L(V, v_\theta(Z)) \mid Z = z)$ fulfill the prescribed conditions for all $z$ and all $\theta$ in a bounded neighborhood of $\theta_0$. Similarly, the numerator of the explained residual variation (3) is a consistent estimator of $\pi_{\theta_0}$ if $\theta_n$ converges to $\theta_0$ in probability or almost surely and the functions $\theta \mapsto L(v, v_\theta(z)) = L(f(y), v_\theta(z))$ fulfill the conditions for all $(z,y)$ and all $\theta$ in a bounded neighborhood of $\theta_0$.
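The sketch below computes both kinds of averages under assumptions not taken from the text. The outcome $V$ is binary and entropy loss $L(v,p) = -v\log p - (1-v)\log(1-p)$ is used. The model is a correctly specified no-intercept logistic model $v_\theta(z) = 1/(1 + e^{-\theta z})$, with $\theta_n$ the maximum likelihood estimate computed by Newton's method. The model-based average uses $\mathrm{E}_\theta(L(V, v_\theta(Z)) \mid Z = z)$, which for entropy loss is the Bernoulli entropy of $v_\theta(z)$. The residual average uses $L(V_i, v_{\theta_n}(Z_i))$. Both should be close to the model-based average evaluated at the true $\theta_0$. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def entropy_loss(v, p):
    """Entropy (log) loss L(v, p) for an outcome v in [0, 1] and a prediction p in (0, 1)."""
    return -(v * math.log(p) + (1 - v) * math.log(1 - p))


def fit_slope(zs, vs, iterations=15):
    """Maximum likelihood slope of the assumed model P(V = 1 | Z = z) = sigmoid(theta * z),
    computed with Newton's method started at theta = 0."""
    theta = 0.0
    for _ in range(iterations):
        ps = [sigmoid(theta * z) for z in zs]
        score = sum(z * (v - p) for z, v, p in zip(zs, vs, ps))
        information = sum(z * z * p * (1 - p) for z, p in zip(zs, ps))
        theta += score / information
    return theta


def numerators(n, theta0=1.0, seed=1):
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vs = [1 if rng.random() < sigmoid(theta0 * z) else 0 for z in zs]
    theta_n = fit_slope(zs, vs)
    ps = [sigmoid(theta_n * z) for z in zs]
    # Model-based average: E_theta(L(V, v_theta(Z)) | Z = z_i) equals the
    # Bernoulli entropy L(p_i, p_i) of the fitted probability p_i.
    model_based = sum(entropy_loss(p, p) for p in ps) / n
    # Residual average: the loss evaluated at the observed outcomes V_i.
    residual = sum(entropy_loss(v, p) for v, p in zip(vs, ps)) / n
    # The model-based average at the true theta0 on the same covariates, for comparison.
    at_theta0 = sum(entropy_loss(p, p) for p in (sigmoid(theta0 * z) for z in zs)) / n
    return theta_n, model_based, residual, at_theta0


print(numerators(20_000))
```

In this particular logistic sketch the two averages coincide at the maximum likelihood estimate. With $p = 1/(1+e^{-\theta z})$ one has $L(v,p) - L(p,p) = -(v-p)\,\theta z$, and the score equation forces $\sum_i Z_i (V_i - p_i) = 0$. Other loss and model combinations need not share this property.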

The theorem cannot be applied directly to the denominators of the two estimators to ensure that these are consistent estimators of the marginal prediction error $\pi^0$. The marginal prediction rule $(z \mapsto \hat v^0)$ is usually a simple function of the observed values $(Z_i, Y_i)$, $i = 1, \dots, n$, rather than a function of the estimated parameter $\theta_n$. That is, $\hat v^0 = g((Z_1, Y_1), \dots, (Z_n, Y_n))$ for some function $g : (\mathbb{R}^q \times \mathbb{R}^p)^n \to \mathbb{R}$. If, for example, quadratic or entropy loss is used, $g$ is the average of $V_i = f(Y_i)$, $i = 1, \dots, n$.
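The claim that $g$ reduces to an average can be checked numerically. The constant that minimizes the empirical quadratic loss is the sample mean, and so is the constant that minimizes the empirical entropy loss for 0/1 outcomes. A minimal Python sketch follows; the data values and helper names are made up for illustration.

```python
import math


def marginal_prediction(vs):
    """Hypothetical g: the sample average of V_i = f(Y_i), ignoring the covariates Z_i."""
    return sum(vs) / len(vs)


def quadratic_loss(v, c):
    return (v - c) ** 2


def entropy_loss(v, c):
    # Requires 0 < c < 1; v is a 0/1 outcome.
    return -(v * math.log(c) + (1 - v) * math.log(1 - c))


def mean_loss(vs, c, loss):
    return sum(loss(v, c) for v in vs) / len(vs)


def mean_is_best_constant(vs, loss, step=1e-3):
    """Check that moving the constant prediction away from the mean increases the average loss."""
    m = marginal_prediction(vs)
    base = mean_loss(vs, m, loss)
    return all(mean_loss(vs, m + d, loss) > base for d in (-step, step))


continuous_v = [0.3, 1.7, 2.2, 4.0, 0.8]
binary_v = [1, 0, 0, 1, 1, 1, 0, 1]
print(marginal_prediction(binary_v))  # 0.625
print(mean_is_best_constant(continuous_v, quadratic_loss))
print(mean_is_best_constant(binary_v, entropy_loss))
```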

The denominator of the estimated explained variation (2) typically cannot be written as an average, but it is often a simple continuous function of the marginal prediction $\hat v^0$ (see e.g. Korn and Simon [KS91] for some examples). It is often possible to use Theorem 1 or even simpler methods (for example the law of large numbers) to show that $\hat v^0$ is a consistent estimator of $v^0$. The consistency of the denominator then follows from the continuous mapping theorem.
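The continuity argument can be illustrated with an assumed binary $V$ under entropy loss. In this sketch the denominator is estimated by plugging $\hat v^0 = \bar V$ into the Bernoulli entropy $H(p) = -p\log p - (1-p)\log(1-p)$, which is the expected entropy loss of the constant prediction $p$ when $V$ is Bernoulli($p$). The law of large numbers gives $\hat v^0 \to v^0$, and continuity of $H$ on $(0,1)$ carries this over to $H(\hat v^0) \to H(v^0)$. The value $p_0 = 0.3$ and the function names are hypothetical.

```python
import math
import random


def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), defined for 0 < p < 1."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def plug_in_denominator(n, p0=0.3, seed=2):
    """Simulate n Bernoulli(p0) outcomes and return (v_hat, H(v_hat))."""
    rng = random.Random(seed)
    vs = [1 if rng.random() < p0 else 0 for _ in range(n)]
    v_hat = sum(vs) / n  # marginal prediction: a sample mean, consistent by the LLN
    return v_hat, binary_entropy(v_hat)  # a continuous function of v_hat


for n in (100, 10_000, 100_000):
    print(n, plug_in_denominator(n), binary_entropy(0.3))
```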

The denominator of the explained residual variation (3) has the form $\sum_{i=1}^{n} L(V_i, \hat v^0) = \sum_{i=1}^{n} L(V_i, g((Z_1, Y_1), \dots, (Z_n, Y_n)))$. Each term depends on the whole sample, so the denominator is not of the form of the average in (4). It is, however, often possible to rewrite the denominator in a form to which Theorem 1 or simpler methods can be applied to guarantee consistency.
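For quadratic loss the rewriting is elementary. Assume, for concreteness, $V_i = f(Y_i)$, $\hat v^0 = \bar V$ and normalization by $n$. Then $\frac{1}{n}\sum_{i=1}^n (V_i - \bar V)^2 = \frac{1}{n}\sum_{i=1}^n V_i^2 - \bar V^2$, which is a continuous function of two ordinary averages. The law of large numbers therefore yields convergence to $\operatorname{Var}(V)$ when $\mathrm{E}V^2 < \infty$. The Python sketch checks the identity and the limit on simulated data; the normal distribution for $V$ is an arbitrary choice.

```python
import random


def denominator_direct(vs):
    """(1/n) * sum_i (V_i - mean(V))**2: every term depends on the whole sample via the mean."""
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)


def denominator_as_averages(vs):
    """The same quantity rewritten as mean(V**2) - mean(V)**2, a function of two plain averages."""
    n = len(vs)
    m1 = sum(vs) / n
    m2 = sum(v * v for v in vs) / n
    return m2 - m1 ** 2


rng = random.Random(3)
vs = [rng.gauss(1.0, 2.0) for _ in range(100_000)]  # assumed V ~ N(1, 2**2), so Var(V) = 4
print(denominator_direct(vs), denominator_as_averages(vs))
```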