## Stratification

An efficient study design is one that maximizes the 'signal-to-noise ratio'. Thus, controlling the 'noise', or variability, is an important aspect of a good design. Consider the following example.

A graduate student in public health is conducting a research project on the health-related habits of the students at her university. As part of the project, she measures the resting heartbeat of 20 student-subjects. The results are listed in Table 21.3.

The mean heartbeat of the 20 subjects is 56.8 and the SD is 3.57. The student then divides the subjects into two groups: group A consists of subjects who do aerobic exercises regularly, and group B of those who do not. The results are presented in Table 21.4.

Table 21.3: Resting heartbeat of the 20 student-subjects

| Student | Heartbeat |
|---|---|
| 1 | 60 |
| 2 | 53 |
| 3 | 56 |
| 4 | 56 |
| 5 | 56 |
| 6 | 57 |
| 7 | 56 |
| 8 | 52 |
| 9 | 63 |
| 10 | 51 |
| 11 | 59 |
| 12 | 63 |
| 13 | 55 |
| 14 | 58 |
| 15 | 56 |
| 16 | 53 |
| 17 | 64 |
| 18 | 56 |
| 19 | 58 |
| 20 | 55 |

Table 21.4: Heartbeat by exercise group (group A: regular aerobic exercise; group B: no regular aerobic exercise)

| Group A subject | Group A heartbeat | Group B subject | Group B heartbeat |
|---|---|---|---|
| 2 | 53 | 12 | 63 |
| 8 | 52 | 15 | 56 |
| 7 | 56 | 14 | 58 |
| 13 | 55 | 1 | 60 |
| 20 | 55 | 11 | 59 |
| 5 | 56 | 6 | 57 |
| 4 | 55 | 9 | 63 |
| 18 | 56 | 3 | 56 |
| 10 | 51 | 17 | 64 |
| 16 | 53 | 19 | 58 |
| Mean | 54.2 | Mean | 59.4 |
| SD | 1.81 | SD | 2.99 |

The two groups of subjects have different means and different SDs. Both SDs are smaller than the SD obtained before separating the subjects into subgroups; that is, each group is more homogeneous than the original group. When one combines the two SDs into a so-called 'pooled standard deviation', the result is SDpooled = 2.47, which is substantially lower than the SD of the original combined group (3.57). The reason is that, when we calculated the mean of the combined group, we ignored the fact that the group consisted of two subgroups with different means. The calculated mean was, in fact, a mean of the two subgroups' means: because the subgroups are of equal size, the overall mean, 56.8, equals the average of the two subgroup means, (54.2 + 59.4)/2. The SD of the combined group therefore reflected two sources of variation: the intragroup variability, represented by the two subgroups' standard deviations, and the intergroup variability, represented by the difference between the two subgroup means.
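The pooled figure can be checked with a short calculation. The sketch below assumes the usual definition of the pooled standard deviation (the square root of the within-group sums of squares divided by the pooled degrees of freedom), which the text does not spell out, and uses the heartbeats exactly as listed in Table 21.4. The function name `pooled_sd` is illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Heartbeats as listed in Table 21.4
group_a = [53, 52, 56, 55, 55, 56, 55, 56, 51, 53]  # regular aerobic exercise
group_b = [63, 56, 58, 60, 59, 57, 63, 56, 64, 58]  # no regular aerobic exercise


def pooled_sd(*groups):
    """Pooled SD, assuming the usual definition:
    sqrt(sum of within-group sums of squares / pooled degrees of freedom)."""
    within_ss = sum((len(g) - 1) * stdev(g) ** 2 for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return sqrt(within_ss / dof)


mean_a, sd_a = mean(group_a), stdev(group_a)  # stdev is the sample SD (n - 1)
mean_b, sd_b = mean(group_b), stdev(group_b)
sd_pooled = pooled_sd(group_a, group_b)

print(f"Group A: mean={mean_a:.1f}, SD={sd_a:.2f}")
print(f"Group B: mean={mean_b:.1f}, SD={sd_b:.2f}")
print(f"Pooled SD={sd_pooled:.2f}")
```

Under this definition the script reproduces the tabulated means and SDs (54.2 and 1.81 for group A, 59.4 and 2.99 for group B) and the pooled value of 2.47.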

The above example illustrates the idea behind stratification. The study population is usually quite heterogeneous. The overall mean effect in the population is an estimate of the treatment effect, but it might be associated with a large measurement error that makes it difficult to distinguish the signal from the background noise. In other words, the overall mean may be an estimate of the treatment effect, but an inefficient one. If one can identify a priori certain subgroups, or strata, in the study population that are more homogeneous with respect to the efficacy variable of interest in the trial, then estimating the effect within each of these strata and combining the estimates may substantially increase the power of the analysis, because the noise masking the effect of interest is reduced.

It is well known, for example, that in multicenter trials the measured effect often differs between investigators. This could be a result of the physician's procedures, instruments, or method of evaluating the subject's response, or of a myriad of other reasons, especially when the measurement involves a great degree of subjectivity. Sometimes the difference is due to the characteristics of the subject populations from which the different investigators draw their subjects. Whatever the reason, it is common practice to stratify the subjects by investigator. It is also wise to identify important prognostic variables and design the trial so as to stratify according to them. Examples of common stratification variables are sex, race, age, disease severity, Karnofsky status score (in cancer studies), and disease staging.

When strata are identified, it is recommended that randomization be done within the strata. This helps to equalize the number of subjects in the various treatment groups within each stratum and to balance the groups with respect to the stratification variables. The drawback is that as the number of important prognostic variables increases, the number of strata increases multiplicatively, complicating the trial's logistics. For example, if one wants to stratify by sex and race, where sex has two categories (male and female) and race four (White, Black, Hispanic and other), the number of strata is eight. Adding another variable with three categories, such as disease severity (mild, moderate, severe), brings the number of strata to 24. If, in addition, 'investigator' is a stratification variable, each of the data centers performing the randomization would have to manage 24 randomization tables for each investigator, one for each stratum, which is utterly impractical. For a study of moderate size, 100 to 500 subjects, a large number of strata may mean that some strata contain a very small number of subjects, which complicates the statistical analysis and its interpretation.
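The multiplicative growth in the number of strata can be made concrete by enumerating the combinations. The sketch below uses the categories from the example above; the trial size of 100 subjects and the variable names are assumptions chosen only for illustration.

```python
from itertools import product

# Stratification variables and categories from the example in the text
factors = {
    "sex": ["male", "female"],
    "race": ["White", "Black", "Hispanic", "other"],
    "severity": ["mild", "moderate", "severe"],
}

# Every stratum is one combination of categories, one per variable
strata_sex_race = list(product(factors["sex"], factors["race"]))
strata_all = list(product(*factors.values()))

# Hypothetical trial size, used only to show how thin the strata become
n_subjects = 100
avg_per_stratum = n_subjects / len(strata_all)

print(f"sex x race: {len(strata_sex_race)} strata")
print(f"sex x race x severity: {len(strata_all)} strata")
print(f"average subjects per stratum with {n_subjects} subjects: {avg_per_stratum:.1f}")
```

With 24 strata and 100 subjects, the average stratum holds only about four subjects before they are divided among treatment groups, and stratifying by investigator as well would thin them out further.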

In summary, stratification is a very useful tool for noise reduction, but it has its limitations. Usually, the one stratification variable used in a multicenter trial is the investigator. More than one additional variable can introduce serious logistical and methodological difficulties. If one is not concerned about the investigator's effect, then central randomization procedures can be very useful in situations of complex stratification requirements. Computerized central randomization procedures are now available that make complex stratification schemes possible.
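As a rough illustration of what randomizing within strata involves, the sketch below builds one randomization list per stratum. It assumes permuted blocks of four with two treatments labeled "A" and "B"; the text does not specify a randomization scheme, so the block design, the treatment labels, the stratum names, and the function names are all hypothetical.

```python
import random


def block_schedule(n_blocks, rng, treatments=("A", "B"), per_block=2):
    """Randomization list built from permuted blocks (an assumed scheme).

    Each block holds every treatment `per_block` times in random order,
    so the treatment groups are balanced after every complete block.
    """
    schedule = []
    for _ in range(n_blocks):
        block = list(treatments) * per_block
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


def stratified_schedules(strata, n_blocks, seed=None):
    """Build a separate randomization list for each stratum."""
    rng = random.Random(seed)
    return {stratum: block_schedule(n_blocks, rng) for stratum in strata}


# Hypothetical strata: three investigators crossed with two severity levels
strata = [(inv, sev) for inv in ("inv1", "inv2", "inv3") for sev in ("mild", "severe")]
schedules = stratified_schedules(strata, n_blocks=3, seed=2024)

for stratum, schedule in schedules.items():
    print(stratum, "".join(schedule))
```

Each stratum needs its own list, which is why the number of lists to manage grows with the number of strata; a central computerized system can generate and track these lists automatically.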
