## T Epq epTZhk YhkIp

Estimation in a Markov chain regression model with missing covariates 103 Here for any pair of adjacent states, h = (i,j) G Eo, we have

With n = (9, ft) parameters fixed, the equation VaQn('n\'q) = 0 can be solved for ahp. The solution is given by ahp,q ^ eli -) i(s ^ ^ >o)

Thus setting e(n)

at step (q + 1) of the EM algorithm, we obtain a pseudo- estimate of the hazard rate ah. Set

Âhk,q+l(T,n) = Yh,k(u)âh,q+1(u,n)du = V] âhp,q (n)Yh,k (Ip) ■ ■j0 p=1

The profile likelihood score equation for the regression coefficients is

The score equation for the unknown 9 parameter is n

Assuming that the density go is twice differentiable with respect to 9, the parameters n = (9, ft) can be updated using e.g. Newton-Raphson algorithm, by setting nq+i = nq + Hn(nq ,aq+i(nq ))-[V Qn](nq ,aq+i(nq )\'q ) , where Hn(')- is a generalized inverse of an estimate of the information matrix for parameter n. We give its form in Appendix 1.

In practice the partitioning points must be taken to depend on the sample size n. The rate of convergence of the histogram estimate depends in this case on the recurrence properties of the chain. Using results of Freedman (1982)

for the piecewise constant proportional hazard model, we can show that if the total number of jumps of the process N is bounded by a fixed constant, and the dimension of the covariate space is fixed, then the asymptotically optimal choice of the binwidth corresponds to the choice yfnb{n )■2 ^ 0 ,nb(n) ^ to. With this choice the regression coefficients can be estimated at yjn rate.

In Section 3 we assume that the covariate vector Zo is discrete, so that the conditional expected can be evaluated as sums taken over possible missing covariate values zo(r). The analysis of continuous or mixed discrete-continuous covariates is more difficult since the integrals with respect to the conditional distribution of the missing covariates given the observable variables must evaluated numerically using e.g. MCEM (Wei and Tanner, 1990, Sinha, Tanner and Hall, 1994).

### 2.4 Random censoring

So far we have assumed that data are subject to fixed censoring occurring at the termination of the study at time t. Here we consider a censoring model in which the main finite state Markov chain model of interest has state space E = {1,..., k}, whereas the observed marked point process has state space enlarged by one extra absorbing state "c" representing a withdrawal due to causes unrelated to the study. In particular, suppose that the censored data analogue of the four state illness model of section 2.2 can be represented by means of the following transition diagram. Figure 2.2. The censored four state illness model

Thus a healthy person may be observed either (i) to die without developing either of the two disease types, or (ii) to develop one of the two disease types and subsequently die, or (iii) to develop one of the two disease types and subsequently be censored or (iv) to be censored without developing either of the two disease types.

More generally, we assume that observed marked point process registers events Ji,..., Jm,... at times Ti < T2 < ... < Tm, according to the following assumptions.

Condition 2.3

(i) Given that at time Tm-i = tm-i the process enters a transient state Jm—i = jm—i, the waiting time in state jm-i has survival function

Moreover, the subdensities of the progression to an absorbing or transient state of the model are given by

Pr(Tm < t,Jm = j\Z = z, Wm-i) = / Gm (u , jm — i \Wm-i,z)f (u,jm\tm—i,jm—i ,z)dt .

The probability of moving to the censoring state is given by

>Tm-i where Gm = 1 — Gm. (ii) The parameters of the censoring survival functions Gm are noninformative on ^ = (9, a, ß).

Denote by EC = E0 U {(i,c) : i e T}. As in section 2.1, let Nh(t) and Yh(t), h e EC be given by

Nh(t) = £ Nhm(t) , Yh(t) = E Yhm(t) , m>i m>i where

Nhm(t) = 1 (Tm— i < Tm < t, Jm—i = i,Jm = j) ,

Then N(t) = {Nh(t) : t G [0,t],H = (i,j) G EC} is a multivariate counting process whose compensator relative to the self-exciting filtration is given by

In the case of the MAR model, we assume that the functions Gm do not depend on the covariate Zo. Let

N(T ) = E Nh(r) N(T ) = E Nh(r ) = N(T ) + E Nic (T )

heE0 heEj ieT

Thus N.(t) counts the the total number of events observed in [0 , t], excluding withdrawals due to censoring, while N. (t) counts the total number of withdrawals, including withdrawals due to censoring. Condition 2.4.

(i) The conditional survival function Gm do not depend on the covariate Zo.

(ii) The conditional distribution of the missing data indicator satisfies

Pr(R = r\V,Zo) = v(V,Zo(r)) , where V = [N.(t) ,(Je, Te)2

= o ^ Zl\ and v is a ProPer conditional probability measure not dependent on missing covariates.

(iii) The parameters of the conditional distribution v of the missing data indicators are non-informative on the parameter ^ = (0, a, f3) and on the family {Gm : m > 1}.

With this choice, the conditional density of the sequence (R,V,Zo(R)) given (Jo, Zi) is proportional to (5) or (6), with functions and H4 replaced by

[F(Tn {t)+i A t\Tn {tJn {tZi, Zo(R), zo(r))]1(J*MeT) H4(V'Zo(R)'Zo(r)'V; =

Estimation of the parameters can be carried out much in the same way as in section 2.2 since the likelihood function is similar to the case of fixed censoring.

The assumption that the censoring distributions Gm do not depend on the missing covariates is in general quite restrictive. One method to alleviate this problem is to follow the approach of Fix and Neyman (1951) and Hoem (1969) analysis of Markov chain models, that is include among the possible states also states corresponding to possibly different forms of censoring. The hypothetical model corresponding to removal of "censoring" from the model, amounts then to evaluation of taboo probabilities of passage among states of interest without entering into censoring states.

### 3 A data example

For illustrative purposes we consider now data on 4144 patients treated for malignant melanoma cancer at the John Wayne Cancer Institute (JWCI) in Santa Monica. The data were collected during the time period 1980-1996.

The malignant melanoma neoplasm arises in the skin (state 0) from which it may spread to the regional lymphnodes (state 1) or to distant sites (stage 3) directly or indirectly progress first to nodal metastasis and then to distant metastatic sites (state 2). Distant metastasis is followed by death during a relatively short period of time. The survival characteristics of each of these stages are well known from natural history data in prospective databases (Barth et al. 1995, Morton et al. 1997). Figure 3.1. Censored melanoma progression process.

For purposes of analysis we use 5 covariates representing age, gender, site of primary tumor, its thickness and level of invasion. These covariates have been shown to form important prognostic factors for survival and metastasis in many clinical trials and database analyses. In the present study measurements of depth were missing for approximately 30% of patients whereas Clark's level was missing for 16% of patients

Since the remaining covariates were observed for all patients, their marginal distribution was taken as unspecified, whereas the conditional distribution of tumor depth and level of invasion was adjusted using logistic regression assuming that the cell probabilities satisfy

The vector Zi was chosen to consist of the covariates age, gender and site of the primary tumor. Here g(i, mIZi, 0) are the conditional joint probabilities of Breslow's depth (i — 0/1 if depth is larger/smaller than 1.5 mm) and Clark's level (m — 0/1 for level >/< III).

In the regression analysis we assumed that patients were randomly right censored and that the missing data mechanism satisfies MAR conditions 2.4. The corresponding baseline intensity matrix is given by f-J2j=i,2 aQj(t) aoi(t) aQ2(t) 0 0 \

The intensities of the underlying Markov chain model of interest are of the form

Xoj (t) — Yoj(t)aoj(t)eP°jZ°j for j — 1, 2 Xi2(t) — Yi2 (t)ai2 (t)e^2Zl2 Mi(t) — Y24 (t)a.24 (t)e^Z24 Xs4(t) — Y34 (t)a24 (t)e^4Z34+P

The risk processes are defined by Y0j(t) — I(Ti > t, Jq — 0), Yi3(t) — I(T2 > t > Ti, Ji — 1), Y24(t) — I(T2 > t >Ti,Ji — 2) and YM(t) — I(T3 > t > T2J1 — 1J2 — 3). In the regression analysis, the baseline hazard function for transitions into the death state (4) originating from state 2 and 3 are were taken to be the same, whereas the exponential part of the regression model depends on whether or not distant metastasis was preceded by lymphnode metastasis. The transition rates differ also in the risk processes Y24 and Y34.

The diagram implies that each subject may contribute to the sample N (t) — 0,..., 3 events. For a completely observable covariate vector

Z = (Zi, Zo) we have p(V, Zo; V) = l(N. (t) = 0)F(Ti A t\To, Jo, Z) 2

+ 1(N.(t) = l)£l(Ji = j)F(T2 At\Ti,Ji,Z)f (Ti,Ji\To,Jo,Z)

+ 1(N.(t) = 2)1(Ji = 1,J2 = 3)F(T3 A t\T2, 3,Z) J] f (Tq,Jq\Tq-i,Jq-i,Z)

+ 1(N. (t )=2)l(Ji =2,J2 =4) n f (Tq ,Jq\Tq-1,Jq-1,Z)

+ l(N. (t) = 3)l( Ji = l,J2 = 3, J3 = 4) H f (Tq, Jq\Tq-1, Jq-i,Zi, Z) , q=i where