## Zor Z0j rj

We also denote by zo(r) = [zoj : rj =0] a potential realization of the missing covariate.

We shall treat the variable R as an extra covariate taking values in the set R = {0,1}9. Denoting the sample space of covariates Zo and by Zo and Zi, respectively, the unobserved model is defined on the probability space (Q' x Q, {G' ® Ft}t<T, Pr), where Q' = RxZo xZ1, G' is the Borel a-field of Q' and ''Pr'' is defined in the condition (2.1) below. The observable model corresponds to the transformation of this space according to the assignment X(r, zo, zi,w) = (r, zo(r), zi,w), where r is the realization of the missing data indicator, (zo(r),zi) is the realization of the observed covariate and w is the sequence of states visited and times of entrances into these states. Since the number of events N. (t) = heE0 Nh(t) observed during the time period [0, t] is random, we specify the MAR assumption by conditioning on the number of events N. (t) and type and time of their occurrence. Condition 2.2.

(i) The conditional distribution of the missing data indicator satisfies

Pr(R = r\V,Zo) = v(V,Zo(r)) , where V = [N.(t), (Ji,Ti)N .o ),Z1] and v is a proper conditional probability measure not dependent on missing covariates.

(ii) The parameters of the conditional distribution v of the missing data indicators are noninformative on the parameter ^ = (0,a,f3).

The MAR condition (i) is a type of conditional independence assumption. It is satisfied for example, if the vectors R and Zo are conditionally independent given V. In the latter case, the probability distribution v depends only on the sequence V. The stronger MCAR condition assumes that the function v depends only on the pair (Zi, Jo), but not on the vector V1 = [N(t), (Jl,Tl)f=oT^ or the observed covariates Zo(R). The MCAR model is satisfied for example, if R and the sequence [Zo, (Tg, Ji)i>o] are conditionally independent given (Zi,Jo).

The difference between these two models can be better understood in the context of prediction. If parameters of the missing data mechanism are noninformative on ^ and not estimated from the data, then the MAR and the MCAR model lead to the same likelihoods and the same estimates of the parameter -0. In the case of the MCAR model, the resulting parameters can be also used to estimate some parameters related to the prediction of the survival status of a new patient based on his/her observed covariates and follow-up history. For example, we can define an analogue of the transition probability matrix by setting

= J Pij(t, s\Zo(R),zo(r),Zi)n(dzo(r)\i,e, s, Zi,Zo(R)) , where n(-\i, Í, s, Zi, Zo(R)) is the conditional density of the missing covariate Zo(R) given the observed covariate (Zi,Zo(R)) and given that the states occupied at times 0 and s are J(0) = I and J(s) = i, respectively. This posterior density is given by n(dz0(r)\i,t, s, Zi,Z0(R)) = = Vu(0, s\Zo(R), zo(r), Z1)ge(Z0(r), z0(r)\Z1, ¿)¡-(dz0(r)) f Vu(0, s\Zo(R),zo(r), Zi)ge(Zo(r),zo(r)\Zi,t)^¥(dzo(r))

where ¡i- = ®i-ri=o¡¿i. Under the MCAR model, the matrix (1) can be estimated using plug-in method.

However, under the weaker assumption of the MAR model, the transition probabilities (1) depend in general on the conditional distribution of the missing data indicator, and hence its parameters must from the data. A "partial" MAR model, assuming

instead of the condition 2.2 (i), is sufficient to ensure ignorability of the missing data mechanism for the modified transition probability matrix (1). In sum, although the MAR model forms an ignorable missing data mechanism for estimation of the parameter this is not the case for estimation of transition probabilities or other parameters related to prediction. At the same time, the transition probabilities derived under the MCAR condition, or the partial MAR model (2), are in general biased.

### 2.2 Example

Here we consider a four state illness model assuming that a healthy person (state 0) can develop two forms of a disease: D1 or D2 (state 1 or state 2) and subsequently die (state 3), or else he/she dies without developing the disease. The matrix of baseline intensities is of the form a(t)

The corresponding diagram of transitions is presented in Figure 2.1.

Figure 2.1. A four state illness model

The matrix of conditional transition probabilities P(s,t; z) = [Pij (s,t; z)]ij=o.,,3 has entries

Poo(s,t; z) = Pr(Ti > t\Ti >s,Z = z) = F(t\s, 0, z) ,

Pii(s,t; z) = Pr(T2 > t\T2 >s>Ti,Ji = i,Z = z) = F (t\s,i,z) , i = 1, 2

P0i(s,t; z) = Pr(Ti < t<T2,Ji = 1\Ti >s,Z = z) =

Pi3(s,t; z) = Pr(T2 < t<,J2 = 3\Ti <s<T2,Ji = i,Z = z) =

Pos(s, t; Z) = Pr(Ti < t,Ji = 3\Ti > s, Z = z) + Pr(T2 < t, J2 = 2\Ti >s,Z = z)

In addition, P33(s,t; z) = 1 and the remaining entries are 0.

In the case of complete covariates, the assumption that observations are censored at a fixed time t entails that the transition probability matrix can be estimated only within the range 0 < s < t < t. Next suppose that the covariate vector is partitioned into two blocks Z = (Zo,Zi) and components of the vector Z0 may be missing. To see that under the MAR condition, the matrix of transition probabilities cannot be in general recovered, let us first consider the term

Pr( J (t) = 0\J (s) = 0, Zo (R), Zi, Jo = 0, R) (3)

We have

Pr(Ti > t\Ti > s, Zo(r) = zo(r), Zi = zuR = r) = Ytt,zo(),zi,r) ,

where

= Pr(R = r,Ti > t\Zo(r) = zo(r),Zi = zi) Y(t z0(r) zh r) = Pr(R = r\Zo(r) = zo(r),Zi = zi) '

Moreover, y(t, zo(r),zi,r) = j'(t, zo(r),zi,r)/j'(0,zo(r),zi,r), where

= E Yj(t,zo(r),zo(r),zi,r)ge(zo(r),zo(r)\zl, 0)^r(dzo(r))

j=0J

Yo(t, z, r) = Pr(R = r\N(r) = 0, Zo(r) = zo(r), Zi = zi)

^ [ (Pr(R = r\N (t) = l,Ti = u, Ji = i,Zo(r) = zo(r),Zi = zi)

+ j Pr(R = r\N. (t ) = l,Ti = u, Ji = 3,Zo(r) = zo(r),Zi = zi)

Y2(t,zo,zi,r) = l(t<T) f (f Pr(R = r\N(T) = 2,Ti = u, Ji = i,

T2 = v, J2 = 3, Zo(r) = zo(r), Zi = zi) x f (v, 3\u,i,z)dv\f (u,i\0, 0,z)du.

It is easy to see now that (3) depends in general on the distribution of the missing data indicator. However, under the partial MAR model (2), this distribution does not depend on the number of events observed in the interval [0,t], or their types and times of the occurrence. In this case, the sum (4) reduces to the product

^Yj(t, zo,zi,r) = Pr(R = r\Zo(r) = zo(r),Zi = zi) x

Li=i

+ ¿/ f(u,i\0,0,z)[l - F(T\u,i,z)]+ i f(u, 3 \ 0,0,z) i=1 t t = Pr(R = r\Zo(r) = zo(r),Zi = zi) x

F(t V t\0,0,z) + l(t < t)[F(t\0,0,z) - F(t\0, 0,z)]

Hence (3) does not depend on the conditional distribution of the missing data indicator. A similar algebra and Bayes theorem can be applied also to other entries of the matrix (1).

2.3 Estimation

The MAR assumption 2.2 implies that the conditional density of the sequence (R, V, Zo(R)) given (Jo, Zi) is proportional to p(R,V,Zo(R); ^)= (5)

IIX(V,Zo(R),zo(r); ^)ge(zo(r),Zo(R)\Zi,Jo)p-(dzo(r)) , J q=i where = ®i:ri=opi,

Hi(V,Zo(R),zo(r); = | fl f (Tt,Jt\Tt-i,Jt-i,Zi,Zo(R),zo(r))

H2(V,Zo(R),zo(r); 4>) = [F(t\Tn,(t), Jn.(t),Zi,Zo(R),zo(r))]iJn eT)

The right-hand side of (5) can also be represented as

p(V, Zo(R), zo(r); ^)ge(z0(r),Z0(R)\Zi,J0)^-(dz0(r)) , where p(V,Zo(R),zo(r); = exp[1(N(r) > 0)H3(V,Zo(R),zo(r); ^)] x exp[-1(JNXr) eT)H4(V,Zo(R),zo(r),V; ^)] , and

J2 PTZjt_ujt - e/TZh [AhT) - MT-i)] , i=i h=(Je-1/)GE0

In Appendix 2, we show that in analogy to the case of completely observable covariates, the integral p(R, V,, Zo(R); ^) can be written in the form of a product, evaluated over consecutive times of entrances into adjacent states of the model. However, as opposed to the case of completely observable covariates in Andersen et al. (1993), the factors share parameters in common. From the point of view of parameter estimation, it is easier to work with integrals (5)-(6).

By Bayes formula, for any measurable function @ of the vector (V, Zo), its conditional expectation given the data is

E^[\$(V,Zo)\V,Zo(R),R]= \$(V,Zo) if Rj = 1 for all j = 1,...,d and

f <P(V, Zo(R), zo(r))p(V, Zo(R), zo(r); ^)ge(zo(r), Zo(r)\Zu Jo)^¥(dzo(r))

Jp(V, Zo(R), zo(r); ^)ge(zo(r), Zo(r)\Zi, Jo)^r(dzo(r))

In the following, we denote this conditional expectation by E^@(V, Zo) for short.

We assume now that (Rk, Vk, Zotk(Rk)), k = l,...,n is an iid sample of the missing data indicators and vectors Vk = [N,k(t), (Tj,k, Jj,k)N=i (t\ Zikk], then the log-likelihood function is given by n

L(^) = J2logp(Rk,Vk,Zo,k(Rk); 4) , k=i plus a term depends only on the conditional distribution of the missing data indicators, but not on the parameter 4. To estimate the unknown parameters, we shall approximate the hazard rates ah using histogram estimates. For this purpose let 0 = ti < T2 < ... < T^(n) = t be a partition of the interval [0,t], and define Ip = [tp-i,tp) for p = 2,.. .,£(n) and I^n) = [Te(n-i),Te(n)]. We assume that the number of partitioning points is either finite and independent of n (l(n) = l), or else it grows to infinity with n, that is £(n) ^to as n ^ to. Set

Ah,k(t)= Yh,k(u)a.h(u)du = V] Yh,k(Ip)ahp , Jo p=i where for any pair of adjacent states, h = (i,j) e Eo, we have

= E l(Jm-i,k = i)max{0,min(Tm,k,Tp) - max(Tm-i,k,Tp-i)} .

Substitution of Ahk into the likelihood function gives then an approximate likelihood Ln(4n),4n = (0,P,a = [ah : h e Eo]). The estimate is obtained by maximizing this function with respect to 4n.

In practice the function Ln(4n) may be too difficult to handle directly, so for purposes of estimation we can use EM algorithm. Define

Q2n(4\4) = J2EvHVk,Zo,k; 0), k=i where h(Vk,Zo,k; a, ¡3) = Y; I ah(t)Nh,k(dt)+Y, 3TZh,kNh,k(t)

heEo o heEo

-Y ep Zh'k Yh,k(u)ah(u)du , heE0 Jo i2(Vk, Zo,k; 0) = logge(Zo;k\Zik, Jo,k) .

Here Yhk is the risk process corresponding to subject k, and Nh,k is the corresponding process counting transitions of type h,h G Eo. The functions ii and i2 represent the complete data log-likelihood functions for the parameters (a, 3) and 0 respectively. If ipq is the estimate of the parameter ^ obtained at the q-th step of the algorithm, then the (q +1) step consists of the E-step in which we calculate the conditional expected

Qn('^\'<Pq ) = Qin(^\'<Pq ) + Q2n(^\4>q) ,

In the M-step we maximize Qn(^\">pq) with respect to ^ = = (0,3,a = [ah : h G Eo]).

Let pi be the dimension of the vector 0, and let po be the dimension of the vector of regression coefficients. Denote by Sn an (pi + po + \Eo\l(n)) x (pi + po + \Eo\l(n)) the diagonal matrix with entries

n where \Eo\ denotes the number of adjacent states in the model, and k(n) = 1, if i(n) = i does not depend on n, and k(n) = i(n), otherwise. The normalized score equation for estimation of the parameter ^ = (0,3, a) is given by SnVQn(^\tpq) = 0. The vector VQn(^\tpm) has components

k=iheEo