# Forward and Backward Recurrence Times and Length Biased Sampling: Age Specific Models

Marvin Zelen

Harvard School of Public Health and the Dana-Farber Cancer Institute, Boston, MA 02115, U.S.A.

Summary. Consider a chronic disease process that begins to be observed at a point in chronological time. For prevalent cases, the backward and forward recurrence times are defined as the time already spent with disease and the time remaining until leaving the disease state, respectively, where the reference point is the time at which observation of the disease process begins. In this setting the incidence of disease affects the recurrence time distributions. In addition, the survival of prevalent cases will tend to be greater than that of the general population with disease because of length biased sampling. A similar problem arises in models for the early detection of disease. In this case the backward recurrence time is how long an individual has had disease before detection, and the forward recurrence time is the time gained by early diagnosis, i.e. the time until the disease would have become clinical by exhibiting signs or symptoms. In these examples the incidence of disease may be age related, resulting in a non-stationary process. The resulting recurrence time distributions are derived, as well as some generalizations of length biased sampling.

### 1 Introduction

Consider a sequence of events occurring over time in which the probability distribution of the time between events is stationary. Consider a randomly chosen interval whose endpoints are events, and select at random a time point in the interval. The forward recurrence time is defined as the time from the random time point to the next event; the backward recurrence time is the time from the previous event to the time point; cf. Cox and Miller [CM65].

An example illustrating these recurrence times is the so-called "waiting time paradox"; cf. Feller [FEL71]. Suppose the events are defined as bus arrivals at a particular location. A person arriving at the bus stop has a waiting time until the next bus arrives. This waiting time is the forward recurrence time. The backward recurrence time is the amount of time by which the person missed the previous bus.

Backward and forward recurrence times play an important role in several biomedical applications. However, in many instances the distribution of the time between events changes with time. Furthermore, time may be chronological time or age. In some applications it may be necessary to consider two time scales incorporating both chronological time and age.

In addition, a closely related topic is length biased sampling. Referring to the bus waiting problem, when the individual arrives at the bus stop, she intersects a time interval whose endpoints are the previous bus arrival and the next arrival. Implicitly these intervals are chosen so that the longer the interval, the greater the probability of selecting it. This selection phenomenon is called length biased sampling.

We will consider two motivating examples for generalizing the recurrence time distributions and length biased sampling. One example deals with a model of the natural history of a chronic disease. The other example refers to modeling the early detection of disease. The mathematics of the two examples is the same. However, both are important in applications and we use both to motivate our investigation. This paper is organized as follows. Section 2 describes the two motivating examples and summarizes results for stationary processes. Section 3 develops the model for the chronic disease example; Section 4 indicates the necessary changes for the early detection example. The paper concludes with a discussion in Section 5.

### 2 Motivating Problems and Preliminary Results

### 2.1 Chronic Disease Modeling

Consider a population and a chronic disease such that at any point in time a person may be disease free ($S_0$), alive with disease ($S_a$), or may have died of the specific disease ($S_d$). The natural history of the disease is $S_0 \to S_a \to S_d$. The transition $S_0 \to S_a$ corresponds to the (point) incidence of the disease and $S_a \to S_d$ describes the (point) mortality.

Of course an individual may die of other causes or may be cured by treatment. Our interest is in disease specific mortality. Hence an individual who dies of other causes while in $S_a$ is regarded as being censored for the particular disease. An individual who is cured of the disease will still be regarded as being in $S_a$, and eventual death due to other causes will be viewed as a censored observation. This is a progressive disease model and is particularly applicable to many chronic diseases, notably some cancers, cardiovascular disease and diabetes.

Consider a study in which this population is first observed at some point in time, say $t_0$. At this point in time some individuals will be disease free ($S_0$) while others will be alive with disease ($S_a$). Those in $S_a$ are prevalent cases. The backward recurrence time is how long a prevalent case has had disease up to the time $t_0$. The forward recurrence time is the time from $t_0$ until the eventual death of the prevalent case. The sum of the backward and forward recurrence times is the total survival of prevalent cases.

### 2.2 Early Detection Modeling

Consider a population in which at any point in time a person may be in one of three states: disease free ($S_0$), pre-clinical ($S_p$), or clinical ($S_c$). The pre-clinical state refers to individuals who have disease but show no signs or symptoms; the individual is unaware of having disease. The clinical state refers to the clinical diagnosis of the disease, which occurs when the disease interferes with the functioning of an organ system or causes pain, leading the individual to seek medical help. The natural history of the disease is assumed to be $S_0 \to S_p \to S_c$. Note that the transition $S_0 \to S_p$ is never observed. The transition $S_p \to S_c$ describes the disease incidence. The aim of an early detection program is to diagnose individuals in the pre-clinical state using a special examination. If the special examination does diagnose disease in the pre-clinical state, the disease will be treated and the natural history of the disease will be interrupted. As a result, the transition $S_p \to S_c$ will never be observed. The time gained by earlier diagnosis is the forward recurrence time, and the time a person has been in the pre-clinical state before early diagnosis is the backward recurrence time. If $t_0$ is the time (either age or chronological time) at which the disease is detected, we then have a model almost identical to the chronic disease model, obtained simply by renaming the states.

### 2.3 Preliminary Results

Consider a non-negative random variable $T$ having the probability density function $q(t)$. A length biased sampling process chooses units with a probability proportional to $t$ ($t < T < t + dt$). Suppose samples of $T$ are drawn from a length biased process, and that each sampled value is randomly split into two parts $(U, V)$ so that $T = U + V$. The random variables $U$ and $V$ are the backward and forward recurrence times. The model assumes that for fixed $T = t$ a point $u$ is chosen according to a uniform distribution over the interval $(0, t)$. If $q_f(v)$ and $q_b(u)$ are the probability density functions of the forward and backward recurrence times, it is well known that with length biased sampling for selecting $T$ (cf. Cox and Miller [CM65])

$$q_f(t) = q_b(t) = \frac{Q(t)}{m}, \quad t > 0,$$

where

$$Q(t) = \int_t^\infty q(x)\,dx, \qquad m = \int_0^\infty Q(x)\,dx.$$

Note that the first moments of these distributions are

$$E(U) = E(V) = \frac{m(1 + C^2)}{2},$$

where $C = \sigma/m$ is the coefficient of variation associated with $q(t)$ and $\sigma$ is its standard deviation. If $q(t)$ is the exponential distribution with mean $m$, then $C = 1$ and the forward and backward recurrence times have the same exponential distribution as $q(t)$.
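As an informal check on the stationary results above, the following Python sketch simulates the bus-stop version of length biased sampling: intervals are laid end to end, a probe time is dropped uniformly on the time axis, and the backward and forward recurrence times are recorded. The gamma interval distribution (shape 2, scale 1, so that $m = 2$ and $C^2 = 1/2$), the function name `simulate_recurrence_times`, and the sample sizes are illustrative assumptions, not part of the paper. Under these assumptions both sample means should be close to $m(1 + C^2)/2 = 1.5$, and the mean length of the intercepted interval close to 3, compared with $m = 2$ for an interval drawn without length bias.

```python
import bisect
import random


def simulate_recurrence_times(draw_interval, n_intervals=100_000,
                              n_probes=40_000, seed=0):
    """Return (backward, forward) recurrence times seen by random probes.

    draw_interval(rng) must return one positive interval length.
    """
    rng = random.Random(seed)
    # Event times of a renewal process: cumulative sums of interval lengths.
    events = [0.0]
    for _ in range(n_intervals):
        events.append(events[-1] + draw_interval(rng))

    backward, forward = [], []
    for _ in range(n_probes):
        x = rng.uniform(0.0, events[-1])
        # Index of the first event after x, so events[i - 1] <= x <= events[i].
        # Longer intervals cover more of the axis and are hit more often.
        i = min(bisect.bisect_right(events, x), len(events) - 1)
        backward.append(x - events[i - 1])
        forward.append(events[i] - x)
    return backward, forward


# Illustrative assumption: gamma(shape=2, scale=1) intervals, m = 2, C^2 = 0.5.
backward, forward = simulate_recurrence_times(
    lambda rng: rng.gammavariate(2.0, 1.0))
mean_backward = sum(backward) / len(backward)
mean_forward = sum(forward) / len(forward)
mean_interval = mean_backward + mean_forward  # mean of the intercepted interval
print(mean_backward, mean_forward, mean_interval)
```

Probing a long simulated path rather than weighting a sample by its length keeps the selection mechanism implicit, which mirrors how length bias arises in the bus example.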

A reviewer suggested that a simpler way to discuss these results is to assume at the outset that the joint distribution of $(U, V)$ is $f(u, v) = q(u + v)\, I(u > 0, v > 0)/m$, where $I(\cdot)$ here denotes the indicator function. Then all the results above are readily derived. Implicit in this assumption are $f(u \mid T = t) = 1/t$ and length biased sampling.

### 3 Development of the Chronic Disease Model

In this section we will investigate generalizations of the distribution of the backward and forward recurrence times using the chronic disease model as a motivating example. We remark that for the chronic disease model, the process may have been going on for a long time before being observed at time $t_0$.

Suppose that at chronological time $t_0$ the disease process is being observed. The prevalent cases at time $t_0$ will have an age distribution denoted by $b(z \mid t_0)$. We will initially consider the prevalent cases who have age $z$. Later, by weighting by the age distribution for the whole population, we will derive properties of the prevalent cases for the population. The prevalent cases could be regarded as conditional on the time $t_0$ when observations began. Another model is that the prevalent cases arose by sampling the population at a random point in time, which is $t_0$. We shall consider both situations.

Define the indicator $a(z \mid t_0)$ to be 1 if an individual having age $z$ at time $t_0$ is in $S_a$ and 0 otherwise, and $a(t_0)$ to be 1 if an individual is in $S_a$ at time $t_0$ and 0 otherwise. Then

$$P(z \mid t_0) = P\{a(z \mid t_0) = 1\}, \qquad P_0 = P\{a(t_0) = 1\} = \int_0^\infty P(z \mid t_0)\, b(z \mid t_0)\,dz. \tag{4}$$

Note that someone with disease at time $t_0$ having age $z$ was born in the year $v = t_0 - z$. Hence the probability distribution of ages at time $t_0$ is equivalent to the distribution of birth cohorts at time $t_0$.

### 3.1 Forward Recurrence Time Distribution

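Define

$T_f$ = forward recurrence time random variable,

$q_f(t \mid z)\,dt = P\{t < T_f < t + dt \mid a(z \mid t_0) = 1\}$,

$Q_f(t \mid z) = \int_t^\infty q_f(y \mid z)\,dy$,

$I(\tau)\,d\tau$ = probability of making the transition $S_0 \to S_a$ in the age interval $(\tau, \tau + d\tau)$,

where $\tau$ refers to the age of incidence. As in Section 2.3, $T$ denotes the time spent in $S_a$, with p.d.f. $q(t)$ and $Q(t) = P\{T > t\}$. Consider the probability of being in $S_a$ at time $t_0$ and having age $z$. If an individual becomes incident at age $\tau$, then $P\{a(z \mid t_0) = 1 \mid \tau\} = P\{T > z - \tau\} = Q(z - \tau)$. Multiplying by $I(\tau)\,d\tau$ and integrating over the possible values of $\tau$ ($0 < \tau < z$) results in

$$P(z \mid t_0) = \int_0^z I(\tau)\, Q(z - \tau)\,d\tau. \tag{5}$$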

This probability applies to the birth cohort year $v = t_0 - z$; i.e. an individual born in year $v$ who is prevalent at time $t_0$ having age $z$.

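Consider the joint probability that an individual having age $z$ at time $t_0$ is in $S_a$ and stays in $S_a$ for at least an additional $t$ time units. If $\tau$ is the age of entering $S_a$, then

$$P(z \mid t_0, \tau)\, Q_f(t \mid z, \tau) = P\{T > z - \tau + t\} = Q(z - \tau + t),$$

and multiplying by $I(\tau)\,d\tau$ and integrating over $(0, z)$ gives

$$P(z \mid t_0)\, Q_f(t \mid z) = \int_0^z I(\tau)\, Q(z - \tau + t)\,d\tau. \tag{6}$$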

In the above it is assumed that the age at entering $S_a$ ($\tau$) is not known, requiring integration over the possible values of $\tau$. Consequently the p.d.f. of the forward recurrence time is

$$q_f(t \mid z) = -\frac{d}{dt} Q_f(t \mid z) = \frac{\int_0^z I(\tau)\, q(z - \tau + t)\,d\tau}{P(z \mid t_0)}. \tag{7}$$

Suppose the incidence is constant, $I(\tau) = I$. Then

$$q_f(t \mid z) = \frac{Q(t) - Q(t + z)}{\int_0^z Q(y)\,dy}. \tag{8}$$

If $Q(z)$ is negligible, then $q_f(t \mid z) \approx Q(t)/m$, which is the usual forward recurrence time distribution for a stationary process.
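The forward recurrence density can be evaluated numerically for any age-specific incidence. The Python sketch below computes (7) by Simpson's rule and compares it, for constant incidence, with the closed form (8) and with the stationary limit $Q(t)/m$. The gamma sojourn distribution (shape 2, scale 1, mean $m = 2$), the incidence functions, and all function names are illustrative assumptions rather than choices made in the paper.

```python
import math


def integrate(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def forward_density(t, z, incidence, q, Q):
    """Equation (7): q_f(t|z) for an age-specific incidence I(tau)."""
    numerator = integrate(lambda tau: incidence(tau) * q(z - tau + t), 0.0, z)
    prevalence = integrate(lambda tau: incidence(tau) * Q(z - tau), 0.0, z)
    return numerator / prevalence


def forward_density_constant(t, z, Q):
    """Equation (8): closed form when the incidence is constant."""
    return (Q(t) - Q(t + z)) / integrate(Q, 0.0, z)


# Illustrative assumption: time in S_a is gamma(shape=2, scale=1), mean m = 2.
def q(t):
    return t * math.exp(-t)


def Q(t):
    return (1.0 + t) * math.exp(-t)


m = 2.0
t, z = 0.5, 10.0
general = forward_density(t, z, lambda tau: 0.01, q, Q)       # (7), constant I
closed = forward_density_constant(t, z, Q)                    # (8)
stationary = Q(t) / m                                         # limit for large z
rising = forward_density(t, z, lambda tau: 0.001 * tau, q, Q)  # age-related I
print(general, closed, stationary, rising)
```

With constant incidence the first two values agree up to quadrature error, and both are close to the stationary value because $Q(10)$ is small for this sojourn distribution. The last value shows that the same routine accepts an incidence that increases with age.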

Define $q_f(t \mid t_0)$ as the forward recurrence time distribution averaged over the population. By definition we can write

$$P\{a(t_0) = 1\}\, q_f(t \mid t_0) = \int_0^\infty P(z \mid t_0)\, q_f(t \mid z)\, b(z \mid t_0)\,dz. \tag{9}$$

When the age distribution is uniform, so that $b(z \mid t_0) = b$, it can be shown (cf. Zelen and Feinleib [ZF69]) that

$$\frac{\int q_f(t \mid t_0)\, P\{a(t_0) = 1\}\,dt_0}{\int P\{a(t_0) = 1\}\,dt_0} = \frac{Q(t)}{m}.$$

Thus if the sampling point is regarded as a random point in time, the forward recurrence time distribution as $t_0 \to \infty$ is the same as the stationary forward recurrence time distribution.

### 3.2 Backward Recurrence Time Distribution

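The backward recurrence time refers to the time in $S_a$ up to time $t_0$ (or age $z$). Let $T_b$ be the backward recurrence time random variable and $q_b(t \mid z)$ be the conditional p.d.f., with $Q_b(t \mid z) = \int_t^z q_b(y \mid z)\,dy$. Note that $0 < t < z$. Then using the same reasoning as in deriving the forward recurrence time distribution, and noting that an individual incident at age $\tau$ has backward recurrence time $z - \tau$, we have

$$P\{T_b > t,\ a(z \mid t_0) = 1\} = P(z \mid t_0)\, Q_b(t \mid z) = \int_0^{z - t} I(\tau)\, Q(z - \tau)\,d\tau, \tag{10}$$

which allows the calculation of $q_b(t \mid z)$; i.e.,

$$q_b(t \mid z) = \frac{I(z - t)\, Q(t)}{P(z \mid t_0)}, \quad 0 < t < z. \tag{11}$$

Finally, the average backward recurrence time distribution is

$$q_b(t \mid t_0) = \frac{Q(t) \int_t^\infty I(z - t)\, b(z \mid t_0)\,dz}{P_0}. \tag{12}$$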

Note the distinction between $q_b(t \mid z)$ and $q_b(t \mid t_0)$. The former refers to individuals having age $z$ at time $t_0$, whereas the latter refers to the weighted average over age for prevalent cases at time $t_0$. When $b(z \mid t_0) = b$, we can integrate over $t_0$ and show that the backward recurrence time distribution averaged over $t_0$ is $Q(t)/m$.
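To illustrate how age-related incidence changes the backward recurrence time, the following Python sketch evaluates (11) numerically and compares the mean backward recurrence time at age $z$ under a constant incidence with that under an incidence that rises linearly with age. The gamma sojourn distribution (shape 2, scale 1), the two incidence functions, and the function names are assumptions made only for this illustration.

```python
import math


def integrate(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def prevalence(z, incidence, Q):
    """P(z|t0): integral of I(tau) Q(z - tau) over ages of incidence (0, z)."""
    return integrate(lambda tau: incidence(tau) * Q(z - tau), 0.0, z)


def backward_density(t, z, incidence, Q):
    """Equation (11): q_b(t|z) = I(z - t) Q(t) / P(z|t0) on [0, z], else 0."""
    if not 0.0 <= t <= z:
        return 0.0
    return incidence(z - t) * Q(t) / prevalence(z, incidence, Q)


def mean_backward_time(z, incidence, Q):
    """Mean of q_b(.|z), computed directly from the unnormalized density."""
    weighted = integrate(lambda t: t * incidence(z - t) * Q(t), 0.0, z)
    return weighted / prevalence(z, incidence, Q)


# Illustrative assumption: survival function of a gamma(2, 1) time in S_a.
def Q(t):
    return (1.0 + t) * math.exp(-t)


z = 10.0
constant = lambda tau: 0.01        # hypothetical constant incidence
rising = lambda tau: 0.001 * tau   # hypothetical incidence increasing with age

total_mass = integrate(lambda t: backward_density(t, z, rising, Q), 0.0, z, n=200)
mean_constant = mean_backward_time(z, constant, Q)
mean_rising = mean_backward_time(z, rising, Q)
print(total_mass, mean_constant, mean_rising)
```

Under these assumptions the density integrates to one over $(0, z)$, and the rising incidence gives a shorter mean backward recurrence time than the constant incidence, since a larger share of the prevalent cases became incident recently.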

### 3.3 Length Biased Sampling and the Survival of Prevalent Cases

As pointed out earlier, the prevalent cases are not a random sample of cases, but represent a length biased sample. In this section, we investigate the consequences of length biased sampling when disease incidence is age-related. We also derive the survival of prevalent cases.

Define $T = T_b + T_f$, which is the total time that prevalent cases spend in $S_a$. This is the survival of prevalent cases measured from the time when they become incident with disease. We will derive $f(t \mid z)$, the p.d.f. of the time in $S_a$ for prevalent cases who have age $z$ at chronological time $t_0$. Since the age $z$ is fixed at time $t_0$, it is necessary to consider $t > z$ and $t \le z$ separately. If $t$ is fixed and $t > z$, then $P\{a(z \mid t_0) = 1 \mid t > z\} = \int_0^z I(\tau)\,d\tau$. Similarly, if $t$ is fixed and $t \le z$, in order to be prevalent at time $t_0$ and be of age $z$, it is necessary that the age of incidence satisfy $z - t < \tau < z$. Thus, we have for fixed $t$ ($t < T < t + dt$)

$$P\{a(z \mid t_0) = 1 \mid t < T < t + dt\} = \begin{cases} \int_0^z I(\tau)\,d\tau, & \text{if } t > z, \\ \int_{z - t}^z I(\tau)\,d\tau, & \text{if } t \le z. \end{cases} \tag{13}$$

Note that $\int_{z - t}^z I(\tau)\,d\tau$ is an increasing function of $t$. Consequently, individuals with long sojourn times in $S_a$ have a greater probability of being in $S_a$ at time $t_0$. Our development is a generalization of the usual considerations of length biased sampling, as we have shown how length biased sampling is affected by the transition into $S_a$. The usual specification of length biased sampling is to assume $P\{a(z) = 1 \mid t < T < t + dt\} \propto t$, which in our case would be true if $I(\tau) = I$ and $t < z$. We also remark that, for $t < z$, $P\{a(z \mid t_0) = 0 \mid t < T < t + dt\} = \int_0^{z - t} I(\tau)\,d\tau$ refers to individuals, conditional on having survival $t < T < t + dt$, who entered $S_a$ and died before time $t_0$, but would have been age $z$ at time $t_0$ if they had lived. Another interpretation of this probability is that a member of the birth cohort born in $v = t_0 - z$ was incident with disease but died before reaching age $z$. Using (13), the joint distribution of $a(z \mid t_0)$ and $T$ is

$$P\{a(z \mid t_0) = 1,\ t < T < t + dt\} = \begin{cases} q(t)\,dt \int_0^z I(\tau)\,d\tau, & \text{if } t > z, \\ q(t)\,dt \int_{z - t}^z I(\tau)\,d\tau, & \text{if } t \le z. \end{cases} \tag{14}$$

Therefore, the distribution of the time in $S_a$ for cases prevalent at $t_0$ and having age $z$ is

$$f(t \mid z)\,dt = \frac{P\{a(z \mid t_0) = 1,\ t < T < t + dt\}}{P\{a(z \mid t_0) = 1\}}. \tag{15}$$

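Some simplifications occur if $I(\tau) = I$. Then

$$f(t \mid z) = \begin{cases} z\, q(t) \big/ \int_0^z Q(y)\,dy, & \text{if } t > z, \\ t\, q(t) \big/ \int_0^z Q(y)\,dy, & \text{if } t \le z. \end{cases}$$

If $q(t)$ is negligible in the neighborhood of $z$, and $t < z$, then $f(t \mid z) \approx t q(t)/m$, which is the usual distribution for the sum of the forward and backward recurrence time random variables.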
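The survival distribution of prevalent cases can be sketched numerically in the same way. Reading (14) and (15) as $f(t \mid z) = q(t) \int_{\max(0, z - t)}^{z} I(\tau)\,d\tau / P(z \mid t_0)$, the Python code below evaluates this density, checks that under constant incidence it reduces to $\min(t, z)\, q(t) / \int_0^z Q(y)\,dy$, and computes the mean survival of prevalent cases. The gamma sojourn distribution (shape 2, scale 1, mean $m = 2$), the incidence value, the truncation of the integration range at 50, and the function names are assumptions for illustration only.

```python
import math


def integrate(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def prevalent_survival_density(t, z, incidence, q, Q):
    """f(t|z): density of total time in S_a for cases prevalent at age z."""
    # Ages of incidence compatible with prevalence at age z: (max(0, z - t), z).
    weight = integrate(incidence, max(0.0, z - t), z)
    prevalence = integrate(lambda tau: incidence(tau) * Q(z - tau), 0.0, z)
    return q(t) * weight / prevalence


# Illustrative assumption: time in S_a is gamma(shape=2, scale=1), mean m = 2.
def q(t):
    return t * math.exp(-t)


def Q(t):
    return (1.0 + t) * math.exp(-t)


m, z, upper = 2.0, 10.0, 50.0
constant = lambda tau: 0.01  # hypothetical constant incidence


def f(t):
    return prevalent_survival_density(t, z, constant, q, Q)


# Closed form for constant incidence, evaluated below and above the age z.
closed_below = 3.0 * q(3.0) / integrate(Q, 0.0, z)
closed_above = z * q(15.0) / integrate(Q, 0.0, z)

# The density has a kink at t = z, so integrate each side separately.
total_mass = integrate(f, 0.0, z, n=100) + integrate(f, z, upper, n=100)
mean_prevalent = (integrate(lambda t: t * f(t), 0.0, z, n=100)
                  + integrate(lambda t: t * f(t), z, upper, n=100))
print(f(3.0), closed_below, f(15.0), closed_above, total_mass, mean_prevalent)
```

Under these assumptions the mean survival of prevalent cases is close to 3, well above the mean $m = 2$ of all incident cases, which is the length bias described in this section.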

Using the same development, we can calculate $f(t \mid a(z \mid t_0) = 0)$, which refers to the survival of individuals who died before $t_0$ but would have been age $z$ at time $t_0$. Since