Three approaches for estimating prevalences

## 3.1 Transition Rate Method

The Transition Rate Method allows us to estimate not only the age-specific prevalence but also the age-specific non-recovery prevalence.

### Method

Let us assume that the life history of an individual can be modelled by a stochastic process with four states: alive and disease free, alive with the disease and not recovered, alive and considered cured, and dead. Consider the compartment model of Figure 1, which has three alive states and one absorbing death state. Denote the healthy state by H, the disease state by I, the cure state by C and death by D. Assume that the disease is reversible, i.e. that each person who has cancer can recover from it.

A subject in the healthy state H at calendar time $s$ may move to the disease state I with intensity $\alpha(x)$, which depends on age $x$. Alternatively, the individual may die directly from state H with intensity $\mu(x)$. A subject in the disease state I may move to state C with intensity $\lambda(x,d)$, which depends on age $x$ and on the duration of the disease $d$. A person in state I is at risk of death with intensity $\nu(x,d)$, which depends on the duration of the disease $d$ as well as on attained age $x$ (Figure 1). These intensities allow us to establish the age-specific non-recovery prevalence of a chronic disease. To obtain the expression of the age-specific prevalence, the probabilities of being in the various states of the process are required; they are derived following [B86] and [K91].

Fig. 1. Four-state stochastic model: state 0, alive and disease free (H); state 1, alive with the disease and not recovered (I); state 2, alive and considered cured (C); state 3, dead (D).

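The probability of being alive with the disease and not cured (i.e. in state I) at age $z$, having been diagnosed within the last $L$ years, is given by

$$
\begin{aligned}
P_I(z,L) &= P\left(\text{a subject at age } z \text{ is diseased and not cured, diagnosed in } [z-L, z]\right)\\
&= \int_{z-L}^{z} \underbrace{\exp\left\{-\int_0^y (\mu+\alpha)(u)\,du\right\}}_{\text{(i)}}\;\underbrace{\alpha(y)}_{\text{(ii)}}\;\underbrace{\exp\left\{-\int_y^z (\nu+\lambda)(u,u-y)\,du\right\}}_{\text{(iii)}}\,dy.
\end{aligned}
\tag{6}
$$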

The three factors in equation 6 are interpreted as follows:

(i) the probability of surviving disease free up to age $y$;

(ii) the conditional "probability" (intensity) of disease onset at age $y$;

(iii) the conditional probability of surviving and not being cured up to age $z$, given that the individual was diagnosed with the disease at age $y$.
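The factor-by-factor reading of equation 6 can be checked by evaluating it numerically. The sketch below is illustrative only: it uses a plain midpoint rule for every integral, and the hazard functions passed as `alpha`, `mu`, `nu` and `lam` (standing for α, μ, ν and λ) are hypothetical inputs, not estimates from the paper.

```python
import math
from typing import Callable

AgeHazard = Callable[[float], float]            # hazard depending on age only
AgeDurHazard = Callable[[float, float], float]  # hazard depending on (age, duration)


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 400) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def p_ill(z: float, L: float, alpha: AgeHazard, mu: AgeHazard,
          nu: AgeDurHazard, lam: AgeDurHazard, n: int = 400) -> float:
    """P_I(z, L) of equation 6: alive, diseased and not cured at age z,
    diagnosed at some age y in [z - L, z]."""
    def integrand(y: float) -> float:
        # (i) survive disease free from birth to age y
        disease_free = math.exp(-integrate(lambda u: mu(u) + alpha(u), 0.0, y, n))
        # (iii) stay alive and uncured from y to z; the duration at age u is u - y
        not_cured = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, z, n))
        # (ii) the onset intensity alpha(y) links the two
        return disease_free * alpha(y) * not_cured
    return integrate(integrand, z - L, z, n)


# Example with constant, purely illustrative hazards.
print(p_ill(60.0, 10.0, alpha=lambda x: 0.002, mu=lambda x: 0.01,
            nu=lambda x, d: 0.05, lam=lambda x, d: 0.1))
```

With constant hazards the inner integrals are exact and the result can be compared with the closed form of the outer exponential integral, which is a convenient sanity check before plugging in estimated rates.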

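The probability of being alive without the disease (i.e. in state H) at age $z$ is given by

$$
P_H(z) = \exp\left\{-\int_0^z \mu(u)\,du\right\}\exp\left\{-\int_0^z \alpha(u)\,du\right\}, \tag{7}
$$

and the probability of being alive in the cure state at age $z$ is given by

$$
\begin{aligned}
P_C(z) &= P\left(\text{a subject at age } z \text{ is cured, i.e. in state C}\right)\\
&= \int_0^z \int_y^z \exp\left\{-\int_0^y (\mu+\alpha)(u)\,du\right\}\alpha(y)\\
&\qquad\times \exp\left\{-\int_y^w (\nu+\lambda)(u,u-y)\,du\right\}\lambda(w,w-y)\exp\left\{-\int_w^z \mu(u)\,du\right\}dw\,dy,
\end{aligned}
\tag{8}
$$

where $y$ is the age at onset and $w$ the age at cure; after cure, the subject is exposed to the mortality intensity $\mu$.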

Then, at time $t$, the probability that an individual is alive at age $z$ is given by

$$
P(\text{subject alive at age } z) = P_I(z) + P_C(z) + P_H(z), \tag{9}
$$

where $P_I(z) = P_I(z,z)$ includes all diagnoses since birth.
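The three probabilities summed in equation 9 can be sketched in the same way. This is a minimal sketch under stated assumptions: the hazards are hypothetical Python callables, a cured subject is assumed to die at the healthy-state rate μ, and a deliberately coarse midpoint rule keeps the triple integral behind the cure-state probability cheap, at some cost in accuracy.

```python
import math


def integrate(f, a, b, n=60):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def p_healthy(z, alpha, mu, n=60):
    """P_H(z): alive and disease free at age z."""
    return math.exp(-integrate(mu, 0.0, z, n)) * math.exp(-integrate(alpha, 0.0, z, n))


def p_ill(z, L, alpha, mu, nu, lam, n=60):
    """P_I(z, L) of equation 6; P_I(z) in equation 9 is the case L = z."""
    def integrand(y):
        free = math.exp(-integrate(lambda u: mu(u) + alpha(u), 0.0, y, n))
        ill = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, z, n))
        return free * alpha(y) * ill
    return integrate(integrand, z - L, z, n)


def p_cured(z, alpha, mu, nu, lam, n=60):
    """P_C(z): onset at age y, cure at age w, then survival from w to z."""
    def outer(y):
        free = math.exp(-integrate(lambda u: mu(u) + alpha(u), 0.0, y, n))

        def inner(w):
            ill = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, w, n))
            after_cure = math.exp(-integrate(mu, w, z, n))  # cured subjects die at rate mu
            return ill * lam(w, w - y) * after_cure

        return free * alpha(y) * integrate(inner, y, z, n)
    return integrate(outer, 0.0, z, n)


def p_alive(z, alpha, mu, nu, lam, n=60):
    """Equation 9: P_I(z) + P_C(z) + P_H(z)."""
    return (p_ill(z, z, alpha, mu, nu, lam, n)
            + p_cured(z, alpha, mu, nu, lam, n)
            + p_healthy(z, alpha, mu, n))
```

Two sanity checks follow from the model itself: with no deaths ($\mu = \nu = 0$) the three alive states must sum to one, and with a common death rate in every state the probability of being alive reduces to $\exp(-\mu z)$.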

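Thus, using the definition of Section 2, the L-year partial non-recovery prevalence of the disease, $\pi_{NR}(z,L)$, can be formulated as

$$
\pi_{NR}(z,L) = \frac{\displaystyle\int_{z-L}^{z} \exp\left\{-\int_0^y (\mu+\alpha)(u)\,du\right\}\alpha(y)\exp\left\{-\int_y^z (\nu+\lambda)(u,u-y)\,du\right\}dy}{P(\text{subject alive at age } z)}. \tag{10}
$$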

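It should be noted that assuming

$$
\lambda(x,d) = 0 \quad \text{for all } x, d, \tag{11}
$$

i.e. that there is no transition from state I to state C, we recover the illness-death model [K91]. This model does not admit the possibility of recovery, but it allows us to estimate the age-specific L-year partial prevalence as follows:

$$
\pi(z,L) = \frac{\displaystyle\int_{z-L}^{z} \exp\left\{-\int_0^y (\mu+\alpha)(u)\,du\right\}\alpha(y)\exp\left\{-\int_y^z \nu(u,u-y)\,du\right\}dy}{P(\text{subject alive at age } z)}. \tag{12}
$$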
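Equations 10 and 12 share the same numerator apart from the cure intensity, so a single function can cover both. In this sketch (all names hypothetical), the denominator P(subject alive at age z) is passed in by the caller, for instance from equation 9 or from the population survival used to approximate it below; setting `lam` to zero gives the illness-death prevalence.

```python
import math


def integrate(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def partial_prevalence(z, L, alpha, mu, nu, lam, p_alive, n=200):
    """Equation 10: numerator over P(subject alive at age z), supplied as p_alive.

    Passing lam = lambda x, d: 0.0 gives the illness-death prevalence of equation 12.
    """
    def integrand(y):
        free = math.exp(-integrate(lambda u: mu(u) + alpha(u), 0.0, y, n))
        ill = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, z, n))
        return free * alpha(y) * ill
    return integrate(integrand, z - L, z, n) / p_alive


# Illustrative constant hazards and an assumed denominator of 0.8.
args = (60.0, 10.0, lambda x: 0.002, lambda x: 0.01, lambda x, d: 0.05)
print(partial_prevalence(*args, lambda x, d: 0.1, 0.8),   # non-recovery, equation 10
      partial_prevalence(*args, lambda x, d: 0.0, 0.8))   # no cure, equation 12
```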

Equations 10 and 12 give the most general expressions of these prevalences.

In the following, the probability that an individual is alive at age $z$ is approximated by $S^*(z)$, the overall survival of the population at age $z$ provided by vital statistics:

$$
P(\text{subject alive at age } z) = P_I(z) + P_C(z) + P_H(z) \approx S^*(z). \tag{13}
$$

### Model specifications

To use the equations of Section 3.1.1, a number of parameters and quantities must be specified. Following [GKMS99], a semi-parametric model is used.

### Mortality rates

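Age-specific mortality rates for all causes of death, written $\mu^*(x)$, are used to estimate $S^*(x)$. They are provided by vital statistics and assumed to be known without error. With $\mu^*$ constant, equal to $\mu^*_k$, on each age interval $(g_{k-1}, g_k]$ (with $g_0 = 0$), $S^*(x)$ is computed as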

$$
S^*(x) = \exp\left\{-\mu^*_j\,(x - g_{j-1}) - \sum_{k=1}^{j-1} \mu^*_k\,(g_k - g_{k-1})\right\}, \qquad g_{j-1} < x \le g_j. \tag{14}
$$

The probability of not dying from causes other than cancer up to age $x$ is computed exactly as $S^*(x)$; however, instead of the overall mortality rates $\mu^*(x)$, we set $\mu(x)$ equal to the mortality rate from all causes of death except the cause of interest.
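A direct transcription of equation 14 for a hazard that is constant on each age band might look as follows. The band edges and rates in the example are hypothetical placeholders, not vital statistics.

```python
import math
from typing import Sequence


def survival_piecewise(x: float, edges: Sequence[float], rates: Sequence[float]) -> float:
    """S*(x) of equation 14 for a hazard constant on each age interval.

    edges = [g_0 = 0, g_1, ..., g_J]; rates[j - 1] is the hazard on (g_{j-1}, g_j].
    """
    if len(rates) != len(edges) - 1:
        raise ValueError("need one rate per age interval")
    if x <= edges[0]:
        return 1.0
    if x > edges[-1]:
        raise ValueError("age beyond the last edge g_J")
    cumulative = 0.0
    for j in range(1, len(edges)):
        lo, hi = edges[j - 1], edges[j]
        if x <= hi:
            cumulative += rates[j - 1] * (x - lo)  # partial last interval
            break
        cumulative += rates[j - 1] * (hi - lo)     # full earlier intervals
    return math.exp(-cumulative)


# Hypothetical 10-year bands and all-cause rates.
print(survival_piecewise(25.0, [0, 10, 20, 30], [0.001, 0.0005, 0.001]))
```

Passing cause-deleted rates instead of all-cause rates gives, in the same way, the probability $\exp\{-\int_0^x \mu(u)\,du\}$ of not dying from other causes.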

### Incidence rates

A finite partition of the age axis is constructed, $0 < g_1 < \dots < g_J$ with $g_J > y_i$ for all $i = 1, 2, \dots, n$. This gives the $J$ intervals $(0, g_1], (g_1, g_2], \dots, (g_{J-1}, g_J]$. We assume that the incidence hazard is equal to $\alpha_j$ on the $j$th interval, $j = 1, 2, \dots, J$, leading to

$$
\hat{\alpha}_j = \frac{\sum_s I_{js}}{\sum_s N(j,s)}, \tag{15}
$$

where $I_{js}$ is the number of cases diagnosed in the disease state in age interval $j$ in year $s$ and $N(j,s)$ is the corresponding number at risk of an incident cancer in the population.
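The displayed incidence estimator is unreadable in the source. Assuming it is the usual occurrence/exposure ratio implied by the definitions of $I$ and $N$, that is, cases divided by the number at risk with both pooled over calendar years $s$, a sketch is given below. If $N$ counts persons at risk during a year, each year is treated as one unit of exposure; the data layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def incidence_rates(cases: Dict[Tuple[int, int], float],
                    at_risk: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Piecewise-constant incidence per age interval: cases over number at risk,
    both pooled over calendar years. Keys are (age_interval, year)."""
    num: Dict[int, float] = defaultdict(float)
    den: Dict[int, float] = defaultdict(float)
    for (j, _year), count in cases.items():
        num[j] += count
    for (j, _year), n in at_risk.items():
        den[j] += n
    return {j: num[j] / den[j] for j in den if den[j] > 0}


# Hypothetical counts for two age intervals and two years.
print(incidence_rates({(1, 2000): 2, (1, 2001): 3, (2, 2000): 10},
                      {(1, 2000): 1000, (1, 2001): 1500, (2, 2000): 2000}))
```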

### Transition rates from the disease

For the survival function of the "illness" population, we construct a finite partition of the age-at-incidence axis, $0 < g_1 < \dots < g_J$ with $g_J > y_i$ for all $i = 1, 2, \dots, n$, and a finite partition of the disease-duration axis, $0 < r_1 < \dots < r_K$ with $r_K > d_i$ for all $i = 1, 2, \dots, n$. This yields the $J \times K$ rectangles $(0, g_1] \times (0, r_1],\ (g_1, g_2] \times (0, r_1],\ \dots,\ (g_{J-1}, g_J] \times (r_{K-1}, r_K]$. We assume that the cure hazard is constant on each rectangle, equal to $\lambda_{ij}$, and estimate it from $C_{ij}$, $R_{ij}$ and $L_{ij}$: $C_{ij}$ is the number of cases that move from the disease to cure in the $j$th year following cancer diagnosis among those who were in age interval $i$ at diagnosis, and $R_{ij}$ and $L_{ij}$ are, respectively, the corresponding number at risk at the beginning of interval $j$ and the number of those lost to follow-up during this interval.
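The estimator of $\lambda_{ij}$ is not shown in the source. One common actuarial convention, assumed here and not necessarily the authors' exact formula, counts losses to follow-up as half exposed and converts the resulting conditional probability into the constant hazard of the cell. Competing deaths within the interval are ignored in this simplified sketch.

```python
import math


def actuarial_cure_hazard(cured: float, at_risk: float, lost: float,
                          width: float = 1.0) -> float:
    """Constant cure hazard for one (age-at-diagnosis, duration) cell.

    Assumed convention: effective number at risk R - L/2, conditional
    probability q = C / (R - L/2), constant hazard -log(1 - q) / width.
    """
    effective = at_risk - lost / 2.0
    if effective <= 0:
        raise ValueError("no effective exposure in this cell")
    q = cured / effective
    if not 0.0 <= q < 1.0:
        raise ValueError("conditional probability must lie in [0, 1)")
    return -math.log(1.0 - q) / width


# Hypothetical cell: 10 cures among 200 at risk, 20 lost to follow-up.
print(actuarial_cure_hazard(10, 200, 20))
```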

The number of cases that move from the disease to cure is determined using the definition of recovery given in Section 2. The cure proportion is estimated according to age at diagnosis. Then, following [MZ96], each individual in the cancer registry database is assigned a probability of having recovered from cancer given the currently elapsed survival time. A binary recovery indicator is drawn from a Bernoulli distribution with this probability, and a recovery time is then generated for each recovered individual. Because recovery cannot be diagnosed by clinical examination or from the information available in registries, this technique lets us simulate the cure event.
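The Bernoulli step can be sketched as below. It assumes a mixture cure model with exponential non-cure survival (the form used in Section 3.2), under which the probability of being cured given survival to $t$ is $p / (p + (1-p)\exp(-\text{rate}\cdot t))$. The age-group labels, rates and seed are hypothetical, and the generation of a recovery time for each cured case is not sketched because the text does not specify how it is drawn.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def prob_cured_given_survival(t: float, p: float, rate: float) -> float:
    """P(cured | survived t years) under S(t) = p + (1 - p) * exp(-rate * t)."""
    s_uncured = math.exp(-rate * t)
    return p / (p + (1.0 - p) * s_uncured)


def simulate_cure_flags(records: Sequence[Tuple[str, float]],
                        cure_prop: Dict[str, float], rate: float,
                        seed: int = 0) -> List[bool]:
    """One Bernoulli draw per registry record (age_group, elapsed_survival_years)."""
    rng = random.Random(seed)
    flags = []
    for age_group, elapsed in records:
        pr = prob_cured_given_survival(elapsed, cure_prop[age_group], rate)
        flags.append(rng.random() < pr)
    return flags


# Hypothetical age-specific cure proportions and records.
print(simulate_cure_flags([("60-69", 2.0), ("70-79", 8.0)],
                          {"60-69": 0.5, "70-79": 0.3}, rate=0.4))
```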

Likewise, $\nu(\cdot,\cdot)$ is assumed to be piecewise constant in age at diagnosis and in duration of the disease: $\nu_{ij}$ is the hazard of death in year $j$ following cancer diagnosis for individuals diagnosed at any age in age interval $i$.

### Age-specific non recovery prevalence estimates

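The estimated age-specific partial non-recovery prevalence $\hat{\pi}_{NR}(z,L)$ is obtained from equation 10 using $\hat{\alpha}(x)$, $\hat{\lambda}(x,d)$ and $\hat{\nu}(x,d)$ described in Section 4.1, and using $\mu(x)$ and $\mu^*(x)$, which are assumed to be known without error:

$$
\hat{\pi}_{NR}(z,L) = \frac{\sum_{i=Q_1}^{Q_2} A_i}{S^*(z)}, \tag{17}
$$

in which $z = g_{Q_2+1}$, $z - L = g_{Q_1}$, and $A_i$ is the contribution of people who were diagnosed (entered state I) in the age interval $[g_i, g_{i+1})$ and have not been cured:

$$
A_i = \int_{[g_i, g_{i+1})} \exp\left\{-\int_0^y (\mu+\hat{\alpha})(u)\,du\right\}\hat{\alpha}(y)\exp\left\{-\int_y^z (\hat{\nu}+\hat{\lambda})(u,u-y)\,du\right\}dy. \tag{18}
$$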

Analytical expressions of the integral over $[g_i, g_{i+1})$ are provided in Appendix A.
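With piecewise-constant estimates, the prevalence is a sum of cell contributions $A_i$ divided by $S^*(z)$. The sketch below (all names hypothetical) uses numerical quadrature in place of the analytical expressions of Appendix A, and the helper `step` shows one way of turning interval estimates such as the incidence rates $\alpha_j$ into a function of age; estimates of $\nu$ and $\lambda$ on the (age at diagnosis, duration) grid would be wrapped analogously.

```python
import bisect
import math


def integrate(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def step(edges, values):
    """Piecewise-constant function equal to values[j] on (edges[j], edges[j + 1]]."""
    def f(x):
        j = bisect.bisect_left(edges, x) - 1
        return values[min(max(j, 0), len(values) - 1)]  # clamp outside the grid
    return f


def cell_contribution(g_lo, g_hi, z, alpha, mu, nu, lam, n=200):
    """A_i: people diagnosed at ages in [g_lo, g_hi) who are alive and not cured at z."""
    def integrand(y):
        free = math.exp(-integrate(lambda u: mu(u) + alpha(u), 0.0, y, n))
        ill = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, z, n))
        return free * alpha(y) * ill
    return integrate(integrand, g_lo, g_hi, n)


def nonrecovery_prevalence(z, L, grid, alpha, mu, nu, lam, s_star_z, n=200):
    """Sum of A_i over the cells from z - L = g_{Q1} up to z = g_{Q2+1}, over S*(z)."""
    q1, q2_plus_1 = grid.index(z - L), grid.index(z)  # both must be grid points
    total = sum(cell_contribution(grid[i], grid[i + 1], z, alpha, mu, nu, lam, n)
                for i in range(q1, q2_plus_1))
    return total / s_star_z


# Hypothetical 10-year grid, interval incidence estimates and S*(z) = 0.8.
grid = [0, 10, 20, 30, 40, 50, 60, 70]
alpha_hat = step(grid, [0.0, 0.0005, 0.001, 0.002, 0.004, 0.006, 0.008])
print(nonrecovery_prevalence(60.0, 20.0, grid, alpha_hat, lambda x: 0.01,
                             lambda x, d: 0.05, lambda x, d: 0.1, s_star_z=0.8))
```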

## 3.2 A parametric model [CD97]

The model developed in [CD97] allows us to estimate the age-specific prevalence and the age-specific non-recovery prevalence using a parametric model.

### Method

Let $\mu^*(x)$ be the general mortality rate at age $x$, and let $\alpha(x)$ be the incidence rate at age $x$. Let $\nu(x, x-y)$ be the death rate at age $x$ for people who had a cancer diagnosed at age $y$, and let $S_R(x, x-y)$ be the relative survival

$$
S_R(x, x-y) = \exp\left\{-\int_y^x \nu(u,u-y)\,du + \int_y^x \mu^*(u)\,du\right\}. \tag{19}
$$

Let $1 - k(d)$ be the probability of being cured given that an individual has survived for a time $d$ with the disease. The age-specific prevalence provided by [CD97] is then expressed as

$$
\pi(z,L) = \int_{z-L}^{z} \alpha(x)\,k(z-x)\,S_R(z, z-x)\,dx, \tag{20}
$$

where $x$ is the age at diagnosis and $d = z - x$ the time elapsed since diagnosis, and in which $k(d)$ encodes the hypotheses made on disease reversibility. If $k(d) = 1$, $\pi(z,L)$ is the partial prevalence of the disease; if $k(d) < 1$, $\pi(z,L)$ is the partial non-recovery prevalence of the disease.

Equation 20 can be compared with the expressions obtained by the Transition Rate Method (equations 10 and 12). Indeed, assuming that

• the disease is rare, i.e. the incidence rate is low: $\alpha \ll 1 \Rightarrow \exp\{-\int_0^y \alpha(u)\,du\} \approx 1$, (21)

• the mortality rate of disease-free people $\mu(x)$ is approximated by the mortality rate of the general population $\mu^*(x)$, i.e. $\mu(x) \approx \mu^*(x)$,

• the probability that an individual is alive at age $z$ is approximated by $S^*(z)$, i.e. $P(\text{subject alive at age } z) = P_I(z) + P_C(z) + P_H(z) \approx S^*(z)$, (22)

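the age-specific L-year partial non-recovery prevalence (equation 10) can be reformulated as follows:

$$
\pi_{NR}(z,L) = \frac{\displaystyle\int_{z-L}^{z} \exp\left\{-\int_0^y \mu^*(u)\,du\right\}\alpha(y)\exp\left\{-\int_y^z (\nu+\lambda)(u,u-y)\,du\right\}dy}{S^*(z)}. \tag{23}
$$

Since $S^*(z) = \exp\{-\int_0^z \mu^*(u)\,du\}$, equation 19 then leads to

$$
\pi_{NR}(z,L) = \int_{z-L}^{z} \alpha(y)\exp\left\{-\int_y^z \lambda(u,u-y)\,du\right\}S_R(z, z-y)\,dy, \tag{24}
$$

in which $\exp\{-\int_y^z \lambda(u,u-y)\,du\}$ is the probability of "surviving" the recovery event, i.e. of not being cured; it therefore corresponds to $k(d)$, the probability of not being cured given that an individual has survived for a time $d = z - y$ with the disease.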
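The effect of the rare-disease assumption (21) can be checked numerically. This sketch assumes $\mu = \mu^*$, takes relative survival as observed over expected survival, $S_R(z, z-y) = \exp\{-\int_y^z (\nu - \mu^*)(u)\,du\}$, and uses hypothetical constant hazards. The two integrands differ only by the factor $\exp\{\int_0^y \alpha(u)\,du\}$, so the form of equation 24 overstates the transition-rate value by at most a factor $\exp\{\int_0^z \alpha(u)\,du\}$.

```python
import math


def integrate(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def compare_rare_disease(z, L, alpha, mu_star, nu, lam, n=200):
    """Return (transition-rate value with mu = mu*, equation-24 value)."""
    s_star_z = math.exp(-integrate(mu_star, 0.0, z, n))

    def transition_rate(y):
        # keeps exp{-int_0^y alpha} and divides by S*(z)
        free = math.exp(-integrate(lambda u: mu_star(u) + alpha(u), 0.0, y, n))
        ill = math.exp(-integrate(lambda u: nu(u, u - y) + lam(u, u - y), y, z, n))
        return free * alpha(y) * ill / s_star_z

    def cd97_form(y):
        # drops exp{-int_0^y alpha}; k(d) times relative survival
        not_cured = math.exp(-integrate(lambda u: lam(u, u - y), y, z, n))
        rel_surv = math.exp(-integrate(lambda u: nu(u, u - y) - mu_star(u), y, z, n))
        return alpha(y) * not_cured * rel_surv

    return integrate(transition_rate, z - L, z, n), integrate(cd97_form, z - L, z, n)


exact, approx = compare_rare_disease(70.0, 10.0, lambda x: 0.001, lambda x: 0.01,
                                     lambda x, d: 0.05, lambda x, d: 0.1)
print(exact, approx, approx / exact)
```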

### Model specifications

To use the equations of the previous section, the incidence rate and the relative survival must be specified. The incidence rate is modelled by

$$
\alpha(x) = a x^b, \tag{25}
$$

a power-law shape that has been validated for a fairly general class of cancers.

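The relative survival function is parametrized by a mixture model [DCHSV99] as follows:

$$
S_R(d) = p + (1-p)\,S_u(d), \qquad S_u(d) = \exp(-\lambda d), \tag{26}
$$

where $p$ represents the cure proportion and an exponential model with rate $\lambda$ is used for the relative survival $S_u$ of non-cured cases.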

If $k(d) = 1$, both cured and non-cured cases contribute to the prevalence estimate:

$$
\pi(z,L) = \int_{z-L}^{z} a x^b \left\{p + (1-p)\exp(-\lambda d)\right\}dx, \qquad d = z - x. \tag{27}
$$

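For the non-recovery prevalence, $k(d)$ is defined as

$$
k(d) = \begin{cases} 1 & \text{for } d \le T_c,\\[4pt] \dfrac{(1-p)\exp(-\lambda d)}{p + (1-p)\exp(-\lambda d)} & \text{for } d > T_c: \end{cases}
$$

prior to the survival time $T_c$, the disease is present with certainty, so the probability of being a prevalent case is one; after $T_c$, the probability of being a prevalent case depends on the cure proportion and on the non-cure survival. The non-recovery prevalence can then be expressed as follows (for $L \ge T_c$):

$$
\pi_{NR}(z,L) = p\int_{z-T_c}^{z} a x^b\,dx + (1-p)\int_{z-L}^{z} a x^b \exp(-\lambda d)\,dx, \qquad d = z - x. \tag{28}
$$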
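Equations 27 and 28 are one-dimensional integrals, so a midpoint rule is enough for a sketch. The parameter values are hypothetical; `not_cured_prob` encodes the piecewise definition of $k(d)$ used for the non-recovery case, so integrating $a x^b\,k(d)\,S_R(d)$ directly should reproduce equation 28 whenever $L \ge T_c$.

```python
import math


def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def relative_survival(d, p, lam):
    """Mixture cure model: cure fraction p, exponential non-cure survival."""
    return p + (1 - p) * math.exp(-lam * d)


def not_cured_prob(d, p, lam, t_c):
    """k(d): probability of not being cured after surviving d years with the disease."""
    if d <= t_c:
        return 1.0
    s_u = math.exp(-lam * d)
    return (1 - p) * s_u / (p + (1 - p) * s_u)


def prevalence_cd97(z, L, a, b, p, lam, n=2000):
    """Equation 27 (k(d) = 1): cured and non-cured cases both counted."""
    return integrate(lambda x: a * x ** b * relative_survival(z - x, p, lam), z - L, z, n)


def nonrecovery_prevalence_cd97(z, L, a, b, p, lam, t_c, n=2000):
    """Equation 28, written for L >= t_c."""
    if L < t_c:
        raise ValueError("equation 28 assumes L >= T_c")
    cured_part = p * integrate(lambda x: a * x ** b, z - t_c, z, n)
    uncured_part = (1 - p) * integrate(
        lambda x: a * x ** b * math.exp(-lam * (z - x)), z - L, z, n)
    return cured_part + uncured_part


# Hypothetical parameters: incidence a * x**b, cure fraction p, rate lam, cure time t_c.
z, L, a, b, p, lam, t_c = 70.0, 20.0, 1e-9, 4.0, 0.4, 0.2, 5.0
print(prevalence_cd97(z, L, a, b, p, lam),
      nonrecovery_prevalence_cd97(z, L, a, b, p, lam, t_c))
```

The difference between the two results is the contribution of cases diagnosed more than $T_c$ years ago who are counted as cured, $p\int_{z-L}^{z-T_c} a x^b\,dx$.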

These prevalences can be computed numerically and variance estimates are obtained by the Delta method [DFSW88].
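The delta method propagates the covariance of the fitted parameters to a prevalence. A generic sketch with a central finite-difference gradient is shown below; for equation 28, `f` would wrap the prevalence as a function of $(a, b, p, \lambda)$ and `cov` would be the estimated covariance of those parameters, both of which are assumptions of this illustration.

```python
from typing import Callable, Sequence


def delta_method_variance(f: Callable[[Sequence[float]], float],
                          theta: Sequence[float],
                          cov: Sequence[Sequence[float]],
                          rel_step: float = 1e-5) -> float:
    """Var f(theta_hat) ~= g' Sigma g, with g the gradient of f at theta
    approximated by central finite differences."""
    k = len(theta)
    grad = []
    for i in range(k):
        step = rel_step * (abs(theta[i]) or 1.0)  # scale the step to the parameter
        up, down = list(theta), list(theta)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))


# For a linear f the delta method is exact: gradient (1, 2) gives 1 + 2 + 8 = 11.
print(delta_method_variance(lambda t: t[0] + 2.0 * t[1], [3.0, -1.0],
                            [[1.0, 0.5], [0.5, 2.0]]))
```

Scaling the step to each parameter matters here because the incidence scale $a$ can be many orders of magnitude smaller than $b$, $p$ or $\lambda$.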

## 3.3 Counting Method estimates

The Counting Method was developed in [GKMS99]. The notation used is the following:

• let $X_i$ be the exact age at cancer incidence for the $i$th case in a cancer registry,

• let $T_i$ be the exact calendar time of cancer incidence for that case,

• let $Y_i$ be the exact calendar time of death for that case,

• let $U_i$ be the exact calendar time of loss to follow-up,

• let $S(d, x, t)$ be the probability that a person who develops cancer at age $x$ and date $t$ will survive beyond duration $d$ after cancer incidence; $\hat{S}(d, x, t)$ is an estimate of $S(d, x, t)$ obtained by actuarial methods.

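The probability that an individual who is alive at calendar time $t$, is in the age group $[z, z+1)$, and had disease incidence in the age interval $[c_1, c_2)$ with $c_2 < z$ is estimated by

$$
\begin{aligned}
\hat{\pi}(z,L) = \frac{1}{N_z}\Bigg\{&\sum_i I\left(z-L < X_i < z,\ Y_i > t,\ U_i > t,\ z \le X_i + t - T_i < z+1\right)\\
&+ \sum_i I\left(z-L < X_i < z,\ Y_i > U_i,\ U_i < t,\ z \le X_i + t - T_i < z+1\right)\frac{\hat{S}(t - T_i, X_i, T_i)}{\hat{S}(U_i - T_i, X_i, T_i)}\Bigg\},
\end{aligned}
\tag{29}
$$

where $N_z$ denotes the number of people in age group $[z, z+1)$ at time $t$.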

The summations range over all disease cases in the registry, and $I(\cdot)$ is an indicator function equal to one when its argument is true and zero otherwise.

The justification for equation 29 is as follows.

(i) the first summation represents cancer cases known to have survived up to age z,

(ii) the second summation represents cancer cases who were lost to follow-up before age z.
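A direct transcription of equation 29 as we read it might look as follows. The tuple layout, `surv` (standing for $\hat{S}$) and `n_z` (the population of the age group at time $t$) are hypothetical inputs, and weighting each case lost alive by $\hat{S}(t - T_i, X_i, T_i)/\hat{S}(U_i - T_i, X_i, T_i)$ is our reading of how $\hat{S}$ enters the estimator.

```python
import math
from typing import Callable, Iterable, Tuple

Case = Tuple[float, float, float, float]  # (X_i, T_i, Y_i, U_i); math.inf if not observed


def counting_prevalence(cases: Iterable[Case], t: float, z: float, L: float,
                        surv: Callable[[float, float, float], float],
                        n_z: float) -> float:
    """Counting-method estimate for age group [z, z + 1) at calendar time t."""
    known_alive = 0.0
    lost_weighted = 0.0
    for x, t_inc, y, u in cases:
        age_at_t = x + t - t_inc
        if not (z - L < x < z and z <= age_at_t < z + 1):
            continue
        if y > t and u > t:        # followed up and alive at t
            known_alive += 1.0
        elif y > u and u < t:      # lost to follow-up while alive, before t
            # estimated conditional probability of surviving from loss to t
            lost_weighted += surv(t - t_inc, x, t_inc) / surv(u - t_inc, x, t_inc)
    return (known_alive + lost_weighted) / n_z


# Hypothetical registry extract and exponential survival estimate.
surv = lambda d, x, t0: math.exp(-0.1 * d)
cases = [
    (62.0, 2000.0, math.inf, math.inf),  # alive and followed up at t
    (61.0, 1999.0, math.inf, 2004.0),    # lost alive in 2004
    (62.5, 2000.0, 2005.0, math.inf),    # died before t
    (50.0, 1988.0, math.inf, math.inf),  # diagnosed more than L years before age z
]
print(counting_prevalence(cases, t=2010.0, z=72.0, L=15.0, surv=surv, n_z=100.0))
```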

This method was implemented by the SEER program [SEERD03], and partial prevalence estimates are obtained with the SEER*Stat software.

For variance estimation, they used a method based on a Poisson approximation proposed by [CGF02].