## Nonparametric Estimation and Testing in Survival Models

Henning Lauter1 and Hannelore Liero2

1 Institute of Mathematics, University of Potsdam [email protected]

2 Institute of Mathematics, University of Potsdam [email protected]

The aim of this paper is to demonstrate that nonparametric smoothing methods for estimating functions can be a useful tool in the analysis of lifetime data. After introducing some basic notation we present a data example. Applying standard parametric methods to these data, we will see that this approach fails: basic features of the underlying functions are not reflected by their estimates. Our proposal is to use nonparametric estimation methods, which are explained in Section 2. Nonparametric approaches are better in the sense that they are more flexible and avoid misspecification of the model. Parametric models, however, have the advantage that their parameters can be interpreted. Finally, we therefore formulate a test procedure to check whether a parametric or a nonparametric model is appropriate.

### 1 Stating the Problem

We consider life or failure times of individuals or objects belonging to a certain group, the so-called population of interest. Examples are survival times of patients in a clinical trial, lifetimes of machine components in industrial reliability, or times taken by subjects to complete specified tasks in psychological tests. We assume that these lifetimes can be modelled by a random variable $Y$ with distribution function $F$, that is, we assume that the probability that an individual of the underlying population dies (fails) before time point $t$ can be expressed in the form

$$F(t) = P(Y \le t).$$

The probability that the individual survives the time point $t$ is given by the survival function

$$S(t) = P(Y > t) = 1 - F(t).$$

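Other functions of interest are the density $f(t) = F'(t)$ and the hazard or failure rate

$$\lambda(t) = \lim_{s \downarrow 0} \frac{1}{s}\, P(t \le Y < t + s \mid Y \ge t) = \frac{f(t)}{S(t)},$$

describing the immediate risk attaching to an individual known to be alive at time point $t$.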
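
The relation between the survival function, the density and the hazard rate can be checked numerically. The following minimal sketch assumes, purely for illustration, an exponential lifetime with mean `beta = 2.0`; the function names are hypothetical and not taken from the paper. It compares the ratio $f(t)/S(t)$ with the conditional probability $P(t \le Y < t + s \mid Y \ge t)/s$ for a small $s$.

```python
import math

def survival(t, beta=2.0):
    """S(t) = P(Y > t) for an exponential lifetime with mean beta (illustrative assumption)."""
    return math.exp(-t / beta)

def density(t, beta=2.0):
    """f(t) = F'(t) = -S'(t) for the same distribution."""
    return math.exp(-t / beta) / beta

def hazard(t, beta=2.0):
    """Hazard rate written as the ratio f(t) / S(t)."""
    return density(t, beta) / survival(t, beta)

def hazard_from_limit(t, s=1e-6, beta=2.0):
    """Approximate P(t <= Y < t + s | Y >= t) / s for a small step s."""
    return (survival(t, beta) - survival(t + s, beta)) / (s * survival(t, beta))

# Both columns should agree closely at every time point.
for t in (0.5, 1.0, 3.0):
    print(t, hazard(t), hazard_from_limit(t))
```

For this particular distribution the hazard rate does not depend on $t$; other lifetime distributions generally give a time-varying hazard.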

Now, suppose that we have obtained data from the underlying population. How can we use these data to estimate the survival function or the hazard rate?

Assuming a parametric model for the distribution of the survival times, we have to estimate parameters. It is well known that the maximum likelihood method provides good estimates.

For example, if we assume that our data are realizations of exponentially distributed random variables $Y_1, \dots, Y_n$, that is, the survival function is given by

$$S(t) = \exp(-t/\beta),$$

with parameter $\beta > 0$, then the problem of estimating the function $S$ reduces to the problem of estimating the parameter $\beta$. The maximum likelihood estimator (m.l.e.) is given by

$$\hat{\beta} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
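
As a small illustration of the parametric approach, the sketch below assumes the parametrization $S(t) = \exp(-t/\beta)$, under which the m.l.e. of $\beta$ is the sample mean. The data are simulated rather than real, and the function names, sample size and seed are hypothetical choices made only for this example.

```python
import math
import random

def exponential_mle(sample):
    """M.l.e. of beta in S(t) = exp(-t / beta): the sample mean."""
    return sum(sample) / len(sample)

def fitted_survival(t, beta_hat):
    """Plug-in estimate of the survival function at time t."""
    return math.exp(-t / beta_hat)

random.seed(0)
true_beta = 2.0  # assumed value used only to simulate data
# random.expovariate takes the rate, i.e. 1 / mean
sample = [random.expovariate(1.0 / true_beta) for _ in range(5000)]

beta_hat = exponential_mle(sample)
print("estimated beta:", beta_hat)
print("estimated S(1):", fitted_survival(1.0, beta_hat))
print("true S(1):     ", math.exp(-1.0 / true_beta))
```

Estimating the whole function $S$ thus amounts to estimating a single number, which is convenient but only adequate if the exponential model actually fits the data.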

Assuming a Weibull distribution with parameters $\beta$ and $\nu$, i.e.

$$S(t) = P(Y > t) = \exp\bigl(-(t/\beta)^{\nu}\bigr),$$

we obtain that the m.l.e. of the two-dimensional parameter is a solution of

$\sum_{i=1}^{n} Y_i^{\nu} \log Y_i$