The function $a$ is a known weight function, introduced to control the region of integration.
To formulate a test based on this statistic we have to derive the distribution of $Q_n$, or at least its limiting distribution under the hypothesis. The theory of the asymptotic distributional behavior of quadratic forms yields the following limit statement. Under
- regularity conditions on the kernel $K$ and the bandwidth $b_n$,
- smoothness of the functions $H$ and $H^U$,
- conditions on the function $a$ such that the integrals given below exist, and
- conditions ensuring that the estimator $\hat\vartheta_n$ is $\sqrt{n}$-consistent,

the distribution of the standardized $Q_n$ converges to the standard normal distribution, that is,

$$ n b_n^{1/2}\,\frac{Q_n - \mu_n}{\sigma} \xrightarrow{\mathcal{D}} N(0,1), \qquad (8) $$

where the standardizing terms $\mu_n$ and $\sigma$ depend on $H$, on $a$, and on the kernel constants $\kappa_1 = \int K^2(x)\,dx$ and $\kappa_2 = \int (K * K)^2(x)\,dx$; here "$*$" denotes convolution.
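The two kernel constants can be approximated numerically for any compactly supported kernel. The following is a minimal sketch, assuming an Epanechnikov kernel on $[-1, 1]$ (the text does not fix a particular kernel); the function names `epanechnikov` and `kernel_constants` are illustrative and not taken from the paper.

```python
def epanechnikov(x):
    """Epanechnikov kernel on [-1, 1] (an assumed choice of K)."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def kernel_constants(kernel, half_width=1.0, m=200):
    """Approximate kappa_1 = int K^2 and kappa_2 = int (K*K)^2 on a grid.

    The kernel is assumed to vanish outside [-half_width, half_width];
    m is the number of grid steps per half-width.
    """
    h = half_width / m
    k = [kernel(i * h) for i in range(-m, m + 1)]

    # kappa_1: Riemann sum of K(x)^2
    kappa1 = h * sum(v * v for v in k)

    # Discrete convolution (K*K)(u) on the grid covering [-2*half_width, 2*half_width]
    conv = [0.0] * (2 * len(k) - 1)
    for i, ki in enumerate(k):
        for j, kj in enumerate(k):
            conv[i + j] += ki * kj * h

    # kappa_2: Riemann sum of (K*K)(u)^2
    kappa2 = h * sum(c * c for c in conv)
    return kappa1, kappa2


kappa1, kappa2 = kernel_constants(epanechnikov)
```

For the Epanechnikov kernel the exact values are $\kappa_1 = 3/5$ and $\kappa_2 = 167/385$, which gives a simple check of the grid approximation.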
The only unknown term in this limit statement is the distribution $H$ of the observations. Replacing it by the empirical distribution $H_n$, we obtain the following asymptotic $\alpha$-test: Reject the hypothesis iff

$$ n b_n^{1/2}\,\frac{Q_n - \hat\mu_n}{\hat\sigma_n} \ge z_\alpha. $$

Here $z_\alpha$ is the $(1-\alpha)$-quantile of the standard normal distribution, and $\hat\mu_n$ and $\hat\sigma_n$ are defined as in (8), with $H$ replaced by $H_n$. Note that the function $a$ has to be chosen such that regions where the kernel estimator of the hazard rate has a large variance are excluded.
Now, let us apply the proposed test to the example considered in Section 1. The nonparametric estimator of the hazard rate in the Weibull mixture model and the smoothed hypothetical hazard function, that is, a hazard rate in a Weibull model with parameter $\hat\vartheta = (1.057, 1.422)$, are given in Figure 4. We compute the integrated quadratic distance over the interval [0, 4] and get the following values for the test statistic and the standardizing terms
With these values the test procedure yields for $\alpha = 0.05$: Reject the hypothesis. The p-value is 0.0025.
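The decision rule of such an asymptotic test can be sketched in a few lines. The sketch below assumes the standardized statistic has the form $n b_n^{1/2}(Q_n - \hat\mu_n)/\hat\sigma_n$ and that large values speak against the hypothesis; the function name `asymptotic_test` and the numbers in the usage example are hypothetical and are not the values from the example in the text.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_test(q_n, mu_hat, sigma_hat, n, b_n, alpha=0.05):
    """One-sided asymptotic test based on a standardized quadratic distance.

    Returns the standardized statistic, the asymptotic p-value and the
    decision (True = reject the hypothesis at level alpha).
    """
    std_normal = NormalDist()
    stat = n * sqrt(b_n) * (q_n - mu_hat) / sigma_hat
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # (1 - alpha)-quantile
    p_value = 1.0 - std_normal.cdf(stat)        # upper-tail probability
    return stat, p_value, stat >= z_alpha


# Hypothetical inputs, for illustration only
stat, p_value, reject = asymptotic_test(
    q_n=0.10, mu_hat=0.05, sigma_hat=1.0, n=100, b_n=0.25
)

# Standardized value that corresponds to an upper-tail p-value of 0.0025
z_for_reported_p = NormalDist().inv_cdf(1.0 - 0.0025)
```

Under this one-sided rule, a p-value of 0.0025 corresponds to a standardized statistic of about 2.81, well above the critical value of about 1.645 for $\alpha = 0.05$.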
Fig. 4. (a) Densities, (b) Hazard rates. Hypothetical single Weibull model (dashed line), nonparametric estimate (bold solid line), in (b) true underlying mixture model (thin solid line)

There are two possible points of view.
1. The first is to consider the minor part of the mixture as a disturbance. That is, one is interested in the main part, for which the parametric model is justified. Then the nonparametric estimate of the hazard rate shows that the population is not homogeneous, or in other words, that our data are not appropriate for the estimation of both parameters. Further, we see that the hazard rate reflects this deviation much better than the survival function. Hence, in this case the application of a nonparametric estimator for the hazard rate is helpful for detecting outliers.
2. A second point of view is that one is interested in the distribution of the population; that is, the data are correct in the sense that they represent the population we are interested in. Then our nonparametric approach shows that the chosen parametric model is not appropriate. Thus, the nonparametric estimator can be helpful for formulating a better parametric model. Of course, a parametric mixture model with unknown parameter p is a complicated matter.
3. In both cases we see that the hazard rate is more sensitive than the survival function: a deviation of the hazard rate from a hypothetical one, which can be seen very clearly, is smoothed away when we consider the corresponding survival functions.
1. The proposed test is consistent, that is, if the distribution of the data does not belong to the hypothetical class, then the probability that the test rejects the hypothesis tends to one. This is not a very strong property, so it is useful to consider the power of the test under so-called local alternatives. For testing a density function nonparametrically, such considerations were carried out in [LLK98]. The results for the hazard rate are similar. Roughly speaking, one obtains that the test is sensitive against alternatives tending to the hypothetical hazard function at the rate $(n b_n^{1/2})^{-1/2}$.
2. A problem in applying the nonparametric estimator and the test is the choice of the bandwidth $b_n$. If the bandwidth is chosen large, the systematic error becomes large. At first glance this is not crucial, because we compare the smooth nonparametric estimator $\hat\lambda_n$ with the smoothed hypothetical function $\tilde\lambda_n$. But the approximation of the distribution of the standardized test statistic $Q_n$ by the normal distribution is worse for large $b_n$. Simulation results show that in this case the test tends to accept the hypothesis. On the other hand, if $b_n$ is chosen too small, then the resulting estimator is wiggly, and the power of the test deteriorates.
The approach described above can be generalized to the model with covariates. In applications we often observe, in addition to the lifetimes, some covariates, for example the dose of a drug, the temperature, or other factors of influence. That is, we have observations $(T_i, X_i, \delta_i)$, where $X_i$ is the covariate taking values in $\mathbb{R}$ or, more generally, in $\mathbb{R}^k$. We can consider these covariates as fixed design points or as random values. In both cases we are interested in statistical inference about the survival function $S(t|x)$, the density $f(t|x) = -\,dS(t|x)/dt$ and the hazard function $\lambda(t|x) = f(t|x)/S(t|x)$. Here $S(t|x)$ is the probability that an individual or item survives the time point $t$ given that the covariate takes the value $x$. We do not go into further details; the basic idea is to estimate the distribution functions $H(\cdot|x)$ and $H^U(\cdot|x)$ not by the empirical distribution functions given in (4), but by weighted empirical distribution functions

$$ H_n^U(t|x) = \sum_{i=1}^n w_{ni}(X, x; h_n)\,1(T_i \le t,\ \delta_i = 1), \qquad H_n(t|x) = \sum_{i=1}^n w_{ni}(X, x; h_n)\,1(T_i \le t). $$
Here, the weights $w_{ni}(X, x; h_n)$ depend on the observed covariates $X = (X_1, \dots, X_n)$, on $x$ and on a smoothing parameter $h_n$. We assume $\sum_{i=1}^n w_{ni}(X, x; h_n) = 1$. The weights are chosen such that $T_j$ gets a large weight in counting all the $T_i$'s that are smaller than or equal to $t$ if the corresponding covariate $X_j$ is near $x$. Appropriate weights are kernel weights of Gasser-Müller type for fixed covariates or Nadaraya-Watson kernel weights for random $X_i$'s. The resulting estimator of the hazard rate then has the following form
Properties of nonparametric estimators for the hazard rate, the cumulative hazard function and the survival function for models with covariates are derived, for example, in [GMCS96], [VKVN97], [VKVN01] and [VKVN02].
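The weighted empirical distribution functions can be illustrated with Nadaraya-Watson weights. This is a minimal sketch for a one-dimensional covariate, assuming an Epanechnikov kernel for the weights; the function names, the censoring indicator name `delta` and the toy data are hypothetical and not taken from the paper.

```python
def nadaraya_watson_weights(X, x, h):
    """Nadaraya-Watson weights w_ni(X, x; h), normalized to sum to one.

    Uses an Epanechnikov kernel (an assumed choice): observations whose
    covariate lies close to x receive a large weight.
    """
    raw = []
    for xi in X:
        u = (xi - x) / h
        raw.append(0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no covariate lies within bandwidth h of x")
    return [r / total for r in raw]


def weighted_edfs(t, T, delta, w):
    """Weighted empirical distribution functions at time t.

    Returns (H_n(t|x), H_n^U(t|x)): all observations up to t, and the
    uncensored ones (delta == 1) up to t, each counted with weight w_i.
    """
    h_all = sum(wi for ti, wi in zip(T, w) if ti <= t)
    h_unc = sum(wi for ti, di, wi in zip(T, delta, w) if ti <= t and di == 1)
    return h_all, h_unc


# Toy data, for illustration only
X = [0.0, 0.5, 1.0, 3.0]        # covariates
T = [1.0, 2.0, 3.0, 0.5]        # observed times
delta = [1, 0, 1, 1]            # 1 = uncensored, 0 = censored

w = nadaraya_watson_weights(X, x=0.5, h=1.0)
H_n, H_n_U = weighted_edfs(2.0, T, delta, w)
```

In the toy data the fourth observation has a covariate far from $x = 0.5$, so it receives weight zero and does not contribute to either function, although its time is the smallest.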
For testing the hypothesis that $\lambda(\cdot|x)$ is equal to a given hazard function $\lambda^*(\cdot|x)$ we propose (for fixed covariates) the following test statistic
Here $\tilde\lambda_n(\cdot|x_k)$ is the smoothed hypothetical hazard function at the fixed covariate $x_k$. In [L03a] it is shown that, under certain conditions on $K$, $b_n$, the weights $w_{ni}$ and $h_n$, and on the smoothness of the underlying distribution functions, the (appropriately standardized) $S_n$ is asymptotically normally distributed. Based on this limit statement a test procedure can be derived. Moreover, for testing the hypothesis that $\lambda(\cdot|x)$ lies in a prespecified parametric class, a test statistic with estimated parameters can be applied.
Appendix: Formulation of the Limit Theorem
This theorem is formulated not only for the behavior under the null hypothesis, but for a general hazard rate $\lambda$. We define
Further, let $T_H$ be the right end point of the distribution $H$.
Theorem 1. Suppose that
(i) $K$ is a continuous density function vanishing outside the interval $[-L, L]$ for some $L > 0$.
(ii) $\lambda$ and $H$ are Lipschitz continuous.
(iii) The function $a$ is continuous, $a(t) = 0$ for all $t > T_H$, and the integrals defined below are finite.
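$$ \mu_n = (n b_n)^{-1} \int \frac{\lambda(t)}{1 - H(t)}\, a(t)\, dt\; \kappa_1, \qquad \sigma^2 = 2 \int \left( \frac{\lambda(t)}{1 - H(t)} \right)^2 a^2(t)\, dt\; \kappa_2 $$

The proof of this theorem is given in [L03b].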
Beran, R.: Nonparametric regression with randomly censored survival times. Technical Report, Univ. California, Berkeley (1981).
Bagdonavicius, V. and Nikulin, M.: Accelerated Life Models: Modeling and Statistical Analysis. Chapman and Hall/CRC, Boca Raton (2002).
Diehl, S. and Stute, W.: Kernel density and hazard function estimation in the presence of censoring. J. Multivariate Anal., 25, 299-310 (1988).
Gonzalez-Manteiga, W. and Cadarso-Suarez, C.: Asymptotic properties of a generalized Kaplan-Meier estimator with some applications. J. Nonparametric Statistics, 4, 65-78 (1996).
Liero, H. and Läuter, H. and Konakov, V. D.: Nonparametric versus parametric goodness of fit. Statistics, 31, 115-149 (1998).
Liero, H.: Goodness of fit tests of L2-type. In: Nikulin, M. (ed.), Statistical Inference for Semiparametric Models and Applications. Birkhäuser (2003a).
Liero, H.: Testing the hazard rate. Preprint, Institut für Mathematik, Universität Potsdam. (2003b).
Lo, S.-H. and Singh, K.: The product-limit estimator and the bootstrap: Some asymptotic representations. Probab. Theory Related Fields, 71, 455-465 (1986).
Major, P. and Rejtő, L.: Strong embedding of the estimator of the distribution function under random censorship. Ann. Statist., 16, 1113-1132 (1988).
Singpurwalla, N. D. and Wong, M. Y.: Estimation of the failure rate - a survey of nonparametric methods, part I: Non-Bayesian methods. Commun. Statist. - Theory and Meth., 12, 559-588 (1983).
Tanner, M. A. and Wong, W. H.: The estimation of the hazard function from randomly censored data by the kernel method. Ann. Statist., 11, 989-993 (1983).
Van Keilegom, I. and Veraverbeke, N.: Estimation and bootstrap with censored data in fixed design nonparametric regression. Ann. Inst. Statist. Math., 49, 467-491 (1997).
Van Keilegom, I. and Veraverbeke, N.: Hazard rate estimation in nonparametric regression with censored data. Ann. Inst. Statist. Math., 53, 730-745 (2001).
Van Keilegom, I. and Veraverbeke, N.: Density and hazard estimation in censored regression models. Bernoulli, 8, 607-625 (2002).