## Prospective Definitions: The Only Way to Interpret What You Measure

It does not require training in advanced statistics to take a commonsense and accurate approach to creating clinical hypotheses, translating them into the precise quantities of a measured end-point, and then interpreting the results. Whilst the finer points of statistics are presented in Chapter 19, it is common sense that the only way to interpret what you measure is to define this whole process before the experiment starts.

Thinking carefully about what might actually constitute an observed response before you measure it removes at least one important source of bias. That bias is the clinical trialist himself or herself. There has been too little emphasis in recent years on the fundamentals of end-points, their variability and how they are measured. Furthermore, the relationship between what is measured and its clinical relevance is always debatable: the tendency is to measure something that can be measured, rather than something that needs validation as clinically relevant. Good examples include rheumatological studies: counts of inflamed joints before and after therapy may be reported, but do not reveal whether the experimental treatment or the corresponding placebo caused some of the patients to recover the ability to write or others the ability to walk (Chaput de Saintonge and Vere 1982).

Most clinical trialists experience the urge, especially in early studies, to collect every piece of data that they possibly can, before and after every drug exposure. This urge comes from natural scientific curiosity, as well as a proper ethical concern, because the hazard associated with clinical trials is never zero. It behooves us to maximize the amount of information gained in return for the risk that the patient takes for us, and for medicine in general.

Consequently, large numbers of variables are typically measured before and after drug (or placebo) administration. These variables all exhibit biological variation. Many of these variations have familiar, unimodal and often symmetrical distributions, which are supposed to resemble Gaussian (Normal), chi-squared, F, binomial, etc., probability density functions. An intrinsic property of a biological variable is that when it is measured 100 times, then, on average and if Normally distributed, about 5% of those measurements will lie more than 2 standard deviations from the mean (there are corollaries for the other probability density functions). This corresponds to the typical, prospective 'p < 0.05, and therefore it's significant' mantra. It is also true that if you measure 100 different variables on two occasions only, before and after administration of the test material, then, on average, 5% of those variables are going to be significantly different after treatment, even when the treatment has no effect at all (this sometimes masquerades as findings among 'selected secondary end-points'). A sound interpretation, of course, is based only upon those end-points that were selected before the experiment began, and upon comparing these with the end-points for which no such statistical differences were found.
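The arithmetic behind these statements can be checked in a few lines. The following is a minimal sketch, not a prescribed method: the function names are illustrative, and the last calculation assumes that the 100 end-points are statistically independent and that the treatment truly has no effect, which will rarely hold exactly for real clinical variables.

```python
import math


def normal_two_sided_tail(z: float) -> float:
    """Fraction of a Normal distribution lying more than z SDs from the mean.

    Uses the identity 2 * (1 - Phi(z)) = erfc(z / sqrt(2)).
    """
    return math.erfc(z / math.sqrt(2.0))


def expected_false_positives(n_endpoints: int, alpha: float) -> float:
    """Expected number of 'significant' end-points when no true effect exists."""
    return n_endpoints * alpha


def prob_at_least_one_false_positive(n_endpoints: int, alpha: float) -> float:
    """Chance of one or more spurious findings.

    Assumption (illustrative only): the end-points are independent.
    """
    return 1.0 - (1.0 - alpha) ** n_endpoints


if __name__ == "__main__":
    # Beyond 2 SDs the tail is a little under 5%; 1.96 SDs gives 5% exactly.
    print(f"beyond 2.00 SD: {normal_two_sided_tail(2.0):.4f}")   # 0.0455
    print(f"beyond 1.96 SD: {normal_two_sided_tail(1.96):.4f}")  # 0.0500

    # 100 end-points, each tested at p < 0.05, with an inert treatment.
    print(f"expected spurious findings: {expected_false_positives(100, 0.05):.1f}")
    print(f"P(at least one): {prob_at_least_one_false_positive(100, 0.05):.4f}")  # 0.9941
```

Under these assumptions, a study that measures 100 end-points is almost certain to yield at least one 'significant' difference by chance alone, which is why only the end-points named before the experiment began can carry the interpretation.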