Sensitivity is one of the most important attributes of an instrument. The usefulness of a measure is dependent upon its ability to detect clinically relevant differences. In clinical trials, therefore, sensitivity should be sufficient to detect differences of the order of magnitude that might occur between the treatment groups. The level of sensitivity that is adequate depends upon the intended application of the instrument. An instrument should be capable of distinguishing the differences of interest, using realistically-sized study groups. The more sensitive an instrument, the smaller the sample size that is necessary to detect relevant differences.
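The link between sensitivity and sample size can be illustrated with the standard normal-approximation formula for comparing two group means. The sketch below is only an illustration under stated assumptions: a two-sided test, equal group sizes, and a difference expressed as a standardised effect size (difference in means divided by the standard deviation). The function name `n_per_group` and its default values are hypothetical choices, not taken from the text. A more sensitive instrument has less background noise, so the same underlying difference corresponds to a larger standardised effect size and hence a smaller required sample.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a standardised
    difference `effect_size` with a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    The defaults (5% significance, 80% power) are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Compare the requirement for a small and a moderate standardised difference.
for d in (0.2, 0.5):
    print(f"effect size {d}: about {n_per_group(d)} patients per group")
```

Under these assumptions, halving the noise of an instrument doubles the standardised effect size and cuts the required sample size to roughly a quarter.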
Usually, but by no means always, sensitive measurements will be reliable. This follows because reliability is usually a prerequisite for sensitivity. An unreliable measurement is one that has large background noise, and this will obscure the detection of any group differences that may be present. The converse need not apply; reliable measurements may lack sensitivity. For example, responses to the four-point single item "Do you have pain? (none, a little, quite a bit, very much)" may be highly reliable in the sense that repeated responses by stable patients are very consistent. However, such a question may be unable to detect small yet clinically important differences in pain levels unless there are large numbers of patients in each treatment group. To take an extreme situation, all patients in both groups could respond "quite a bit", with 100% reliability, and yet the patients in one group might have more pain than the other group. The pain scale would have zero sensitivity but perfect reliability. This example also serves to illustrate that "floor" and "ceiling" effects may be crucial. If most patients have very poor QoL and respond with the maximum, "ceiling" value, or with the minimum, "floor" value, the scale will not be sensitive and will not be capable of discriminating between different treatment groups.
Sensitivity is usually assessed by cross-sectional comparison of groups of patients in which there are expected to be QoL differences. Thus it is in practice closely related to known-groups validity. The main distinction is that with known-groups validity we are concerned with confirming that anticipated differences are present between groups of patients. Sensitivity analyses, on the other hand, aim to show that a reasonable-sized sample will suffice for the detection of differences of the magnitude that may exist between treatments (or other subdivisions of interest) and which are clinically relevant.
If the anticipated effects can be detected by a statistical significance test on the resulting data, this is often taken to be an indication of adequate sensitivity. However, it should be noted that statistical significance of group differences is also influenced by the selection of the patient sample. For example, a validation study might select a group of very ill patients to compare against patients who are disease-free. Then we know that there are undoubtedly group differences in QoL, and a significance test is of little practical interest. If the differences are large enough, a p-value of less than 0.0001 merely indicates that the sample size is also large enough to reject the possibility that the difference is zero. A sensitive instrument should be able to detect small differences, in modest-sized studies.
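The dependence of the p-value on both the size of the group difference and the sample size can be sketched numerically. The example below assumes a two-sided two-sample z-test with equal group sizes and a known standardised difference; the function name `two_sample_p_value` and the effect sizes used are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p_value(effect_size, n_per_group):
    """Two-sided p-value for a two-sample z-test, assuming the observed
    standardised difference equals `effect_size` and both groups have
    `n_per_group` patients (illustrative normal approximation)."""
    z = effect_size * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A small difference in a modest study, the same difference in a very
# large study, and a very large difference (e.g. very ill versus
# disease-free patients) in a modest study.
for d, n in [(0.2, 20), (0.2, 2000), (2.0, 20)]:
    print(f"effect size {d}, n = {n} per group: p = {two_sample_p_value(d, n):.2g}")
```

In this sketch the extreme-groups comparison and the very large study both give tiny p-values, whereas the small difference in a modest study does not reach significance. The p-value alone therefore says little about whether the instrument can detect small differences in modest-sized studies.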
When the HIV Overview of Problems Evaluation System (HOPES) was evaluated by Schag et al. (1992), the mean scores in various subgroups of patients were presented as "further evidence of the validity of the HOPES". Scores from the HOPES scales were compared against scales from other instruments including PACIS, MOS-HIV and POMS. Twelve out of the fifteen p-values were significant with p < 0.0001.
Rather than p-values, it would have been more informative to give the RE of the various scales.
On the one hand we want to be confident that the groups in the sensitivity study really do differ, but on the other hand we do not want to select groups that are known to have unusually large differences. For this reason, sensitivity studies that evaluate a new instrument should report a variety of comparisons, covering a range of situations that are typical of the areas of intended future application of the instrument.
It is perhaps easier to interpret the measures of sensitivity (and responsiveness) in terms of relative rather than absolute values. Different scales or instruments can then be compared, to determine which is the most sensitive. The RE provides a suitable comparative measure. Another advantage of the comparative approach is that it largely overcomes the criticism that measures of sensitivity are affected by the choice of patient sample and the actual group differences that are present. Thus the magnitude of the specific difference no longer matters; the most sensitive of the concurrently applied instruments is the one with the largest RE.
Vickrey et al. (1997) compared the SF-36, a generic instrument, against the disease-specific QOLQ for multiple sclerosis (MS). They hypothesised that patients reporting the least severe MS symptoms during the past year would report the best QoL.
Table 3.15 shows the mean scores, F-ratios and RE for a subset of the scales. For all scales, the ordering of the mean scores tended to follow the hypothesised order. The SF-36 role limitation scale was defined as the reference because it had the smallest F-ratio, and the RE values are calculated using its F-ratio as the denominator. The physical function test of the SF-36 had the highest RE, but the SF-36 role limitations scales, pain scale, emotional well-being and energy scales all showed low RE.
The authors concluded: "Disease-targeted measures provide additional information about health-related quality of life beyond what is assessed by the SF-36." They also acknowledged that they did not evaluate responsiveness to change, and that "our findings may not generalise to all studies of MS, particularly longitudinal studies".
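The RE calculation described for Table 3.15, in which each scale's F-ratio is divided by the F-ratio of the reference scale, can be sketched as follows. The scale names and F-ratios below are hypothetical placeholders, not the values reported by Vickrey et al. (1997), and the function name `relative_efficiency` is an illustrative choice.

```python
def relative_efficiency(f_ratios, reference):
    """Return the RE of each scale: its F-ratio divided by the F-ratio
    of the chosen reference scale."""
    reference_f = f_ratios[reference]
    return {scale: f / reference_f for scale, f in f_ratios.items()}


# Hypothetical F-ratios for three scales (NOT the published values).
f_ratios = {"scale_A": 2.0, "scale_B": 5.0, "scale_C": 9.0}

# Use the scale with the smallest F-ratio as the reference,
# so that every RE is at least 1.
reference = min(f_ratios, key=f_ratios.get)
re_values = relative_efficiency(f_ratios, reference)

for scale, value in re_values.items():
    print(f"{scale}: RE = {value:.2f}")
```

Choosing the scale with the smallest F-ratio as the reference gives that scale an RE of 1, and larger values then mark the scales that separate the groups more clearly.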
Table 3.15 Sensitivity of the SF-36 and the QOLQ in adults with multiple sclerosis: mean scores, ANOVA F-ratios, and RE (Based on Vickrey et al., 1997)
Degree of MS symptom severity: None (n = 4), Mild (n = 69), Moderate (n = 73), Extreme (n = 25)
Scale | None | Mild | Moderate | Extreme | F-ratio | RE