Qian H. Li1 and Stephen W. Lagakos2

1 Food and Drug Administration

Center for Drug Evaluation and Research, HFD-705, 7500 Standish Place, Metro Park North (MPN) II, Rockville, MD 20855 [email protected]

2 Department of Biostatistics, Harvard School of Public Health 655 Huntington Avenue, Boston MA 02115 [email protected]

Summary. We investigate the properties of several statistical tests for comparing treatment groups with respect to multivariate survival data, based on the marginal analysis approach introduced by Wei, Lin and Weissfeld [WLW89]. We consider two types of directional tests, based on a constrained maximization and on linear combinations of the unconstrained maximizer of the working likelihood function, and the omnibus test arising from the same working likelihood. The directional tests are members of a larger class of tests, from which an asymptotically optimal test can be found. We compare the asymptotic powers of the tests under general contiguous alternatives for a variety of settings, and also consider the choice of the number of survival times to include in the multivariate outcome. We illustrate the results with two simulations and with the results from a clinical trial examining recurring opportunistic infections in persons with HIV.

Key words: Directional tests; Marginal model; Multivariate survival data; Omnibus test; Recurring events

In some comparative clinical trials, each subject is followed for K failure-time events, each of which can be right censored. One example is recurring event data, where the K outcomes represent the times from the start of the trial until the occurrence of K clinical/biological events, such as recurring seizures or recurring opportunistic infections [HRL98]. Another is the repeated assessment, under different experimental conditions, of an infectious disease, as measured, for example, by the inhibitory concentration of drug needed to achieve a particular effect on the amount of virus [RGL90]. In the former example the K survival times for an individual are necessarily ordered in magnitude, but in the latter example they do not need to be.

Given the multivariate nature of these data, it is tempting to employ multivariate methods when comparing two treatment groups in the hope that this could provide a more meaningful assessment of their relative efficacy, or a more powerful statistical test than would be available from a univariate analysis, such as when examining the first survival time. Several semi-parametric approaches have been proposed for multivariate failure time data [PWP81], [AG81], [WLW89], [LW92], [LSC93], [CLN96], [CP95]. These methods each make certain assumptions, and their relative power characteristics are not well understood. In practice, however, these and other multivariate failure time methods do not appear to be used very often, and in most cases more familiar methods, such as the logrank test and Cox's proportional hazards model [COX72], are employed. One reason for this might be concerns about the additional assumptions that need to be made when employing most multivariate failure time methods, and lack of knowledge about the consequences of their violation. Another may be the lack of easily accessible software.

The goal of this paper is to assess the properties of statistical tests for comparing treatment groups based on the most popular of these approaches: the marginal analysis proposed by Wei, Lin, and Weissfeld (WLW). The WLW method derives its appeal from its avoidance of assumptions about the dependencies among an individual's K failure times and its simple computational aspects. However, use of the WLW method requires a choice from among several directional or omnibus tests whose relative performance is not fully understood. Additionally, in settings such as the first example of recurring events, one must also choose the number of outcomes, K, on which to base a test, and very little has been done to provide insight into the tradeoffs that arise. By investigating these issues, we aim to provide the analyst with guidelines on how best to utilize multivariate failure time data with this approach when comparing treatment groups.

The properties of the WLW method have also been examined by Hughes [HUG97], who approximated the power of the directional test and omnibus test proposed by WLW under a proportional hazards alternative to the null hypothesis of no treatment effect. Hughes uses the approximate power formulae to assess when a test based on K = 1 event is more or less powerful than an omnibus K df test based on K events, with special attention given to the comparison of using K = 1 versus K = 2 events. We build upon these initial results in several ways. In Section 3 we derive the asymptotic power of the two directional tests and one omnibus test that have been proposed by WLW and Lin [LIN94] under general alternatives to the null hypothesis. In Section 4 we show that one of the directional tests proposed for the case of equal treatment effects across the K failure times is, in general, inefficient relative to the other, and we derive the optimal directional test for an arbitrary alternative to the null hypothesis. We also provide a simple expression for the loss in power of the omnibus K df test relative to the optimal 1 df directional test. In Section 5 we consider the choice of K. When the treatment effect is homogeneous across the K failure times, we show that the power of the directional test proposed by WLW is increasing with K, and describe the efficiency of the omnibus test relative to this directional test. We also conduct simulations to examine the relative performance of the omnibus and directional tests for non-recurrent events with proportional hazards alternatives and for recurrent events with non-proportional hazards alternatives to the null hypothesis. We illustrate the methods in Section 6 using data from an HIV trial. Technical details are deferred to the Appendices. We note that a shorter version of this paper with fewer simulation results appears in [LL04].

2 The WLW Method and Definitions of Test Statistics

In this section, we describe the WLW approach, including the three statistics that have been proposed for comparing treatment groups.

Assume that each subject is followed for the occurrence of $K$ survival times, denoted $T_1, T_2, \ldots, T_K$. The marginal hazard function associated with $T_k$ is denoted by $\lambda_k(t \mid Z)$, where $Z$ is a covariate which for simplicity we take to be binary, denoting treatment group. The null hypothesis that treatment group is not associated with any of the $K$ failure times is given by

$$H_0: \lambda_k(t \mid Z) = \lambda_k(t),$$

where $t > 0$, $k = 1, 2, \ldots, K$. The WLW method is derived from the assumption that the marginal distributions for the two treatment groups have proportional hazard functions; that is, $\lambda_k(t \mid Z) = \lambda_k(t)\exp(\beta_k Z)$, for $k = 1, \ldots, K$, where $\beta_1, \beta_2, \ldots, \beta_K$ are unknown parameters. The null hypothesis thus reduces to

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0.$$

When the K survival times are not necessarily ordered, it is straightforward to show that there are proper 2K dimensional joint distributions which admit the K proportional hazards relationships represented above. Yang and Ying [YY01] show the existence of such joint distributions when $T_1, T_2, \ldots, T_K$ are ordered, as in the example of recurring events.
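For the unordered case, one way to see that such joint distributions exist is to construct one. The following is a minimal sketch, not taken from the paper: it assumes exponential marginal baselines and ties the K times together through a one-factor Gaussian copula. The function names, the `base_rates` and `rho` parameters, and the exponential choice are all illustrative assumptions. Because the copula leaves each margin untouched, every $T_k$ has hazard `base_rates[k] * exp(betas[k] * z)`, so the K marginal proportional hazards relationships hold while the times remain dependent within a subject.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_failure_times(z, base_rates, betas, rho, rng):
    """Draw K dependent, unordered failure times for one subject in group z (0 or 1).

    Assumption (illustrative only): each margin is exponential with hazard
    base_rates[k] * exp(betas[k] * z), so proportional hazards holds marginally.
    rho in [0, 1) is the correlation of the latent normal variables.
    """
    shared = rng.gauss(0.0, 1.0)  # latent factor common to all K times
    times = []
    for rate, beta in zip(base_rates, betas):
        # Latent variable has unit variance: rho + (1 - rho) = 1.
        latent = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        u = normal_cdf(latent)                # uniform on (0, 1)
        u = min(max(u, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        hazard = rate * math.exp(beta * z)
        times.append(-math.log(1.0 - u) / hazard)  # inverse exponential CDF
    return times
```

Calling `simulate_failure_times(1, [1.0, 2.0], [0.5, -0.3], 0.5, random.Random(1))` returns two positive times for a treated subject. This construction does not produce ordered times, so it does not cover the recurring-events setting addressed by Yang and Ying.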

WLW allow noninformative right censoring of each $T_k$ by introducing i.i.d. potential censoring times $C_1, C_2, \ldots, C_K$, which are assumed to be independent of the $T_k$. That is, the observation for a subject consists of $(X_k, \Delta_k)$, $k = 1, 2, \ldots, K$, where $X_k = \min\{T_k, C_k\}$ is the observed portion of $T_k$ and $\Delta_k$ is an indicator of whether $T_k$ is uncensored ($\Delta_k = 1$) or right censored ($\Delta_k = 0$). Suppose that the data consist of $n$ independent copies of $(Z, X_1, \Delta_1, \ldots, X_K, \Delta_K)$, the $i$th of which we denote by $(Z_i, X_{1i}, \Delta_{1i}, \ldots, X_{Ki}, \Delta_{Ki})$. Then if $L_k(\beta)$ denotes Cox's partial likelihood function based on the data $(Z_i, X_{ki}, \Delta_{ki})$ for $i = 1, \ldots, n$, WLW propose that the vector $\beta = (\beta_1, \beta_2, \ldots, \beta_K)'$ be estimated by maximizing the working likelihood function

$$L(\beta) = \prod_{k=1}^{K} L_k(\beta).$$
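The estimation step can be sketched in code. The sketch below assumes that the working likelihood is the product of the K marginal Cox partial likelihoods, each depending only on its own coefficient, so that maximizing it amounts to K separate one-parameter fits. The function names are hypothetical, the covariate is assumed binary (0 or 1), and ties are handled by simply using the full risk set at each event time. It returns only point estimates and does not compute the robust covariance between the K estimates that the WLW tests require.

```python
import math


def censor(times, censor_times):
    """Return (X_k, Delta_k) pairs: X_k = min(T_k, C_k), Delta_k = 1 if T_k is observed."""
    return [(min(t, c), 1 if t <= c else 0) for t, c in zip(times, censor_times)]


def cox_score_info(beta, z, x, delta):
    """Score and information of Cox's partial log-likelihood, one binary covariate."""
    score, info = 0.0, 0.0
    for i in range(len(x)):
        if delta[i] == 0:
            continue  # censored observations contribute only through risk sets
        s0 = s1 = 0.0
        for j in range(len(x)):
            if x[j] >= x[i]:  # subject j is still at risk at time x[i]
                w = math.exp(beta * z[j])
                s0 += w
                s1 += w * z[j]
        mean_z = s1 / s0  # weighted mean of z over the risk set
        score += z[i] - mean_z
        info += mean_z * (1.0 - mean_z)  # weighted variance, valid for z in {0, 1}
    return score, info


def fit_marginal_cox(z, x, delta, n_iter=25, tol=1e-10):
    """Newton-Raphson maximizer of one marginal partial likelihood, started at 0."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = cox_score_info(beta, z, x, delta)
        if info <= 0.0:
            break  # no information left (e.g. no events or a one-group risk set)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def fit_working_likelihood(z, x_by_event, delta_by_event):
    """Estimate (beta_1, ..., beta_K) by fitting each of the K margins separately."""
    return [fit_marginal_cox(z, x_k, d_k)
            for x_k, d_k in zip(x_by_event, delta_by_event)]
```

Here `x_by_event[k]` and `delta_by_event[k]` hold the n observed times and censoring indicators for the (k+1)th failure time. In practice an established survival analysis package would normally be used for the fits; the loop form is kept here only to make the risk-set calculation explicit.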
