## Data reporting and statistics

The experiment is complete and the data have been analyzed. What should be done now?

Real-time PCR assays involve a large number of variables. Variation is introduced by harvesting procedures, nucleic acid extraction techniques, reverse transcription, PCR conditions, reagents, and so on. For this reason, data reporting and statistics are important steps in preparing your results for peer review.

Data presentation depends on the type of experiment. For example, experimental protocols might examine relative gene expression before and after treatment, in normal versus tumor samples, over time courses, or in response to inflammation or disease. Other experiments might measure the quantity or strain of organisms in food, water, or the environment. Validation of microarray and siRNA results is another common use of real-time PCR technology. Regardless of the type of experiment, data should be presented in a manner that allows the reader to observe the amount of variation inherent to the experiment, for example as the mean, standard deviation, and confidence intervals.

A statistical analysis should be performed to tell the reader how probable it is that the observed differences are significant. Most real-time PCR experiments are based on hypothesis testing: what is the probability that randomly selected samples would show a difference larger than the one observed? In some situations the differences in the data are obvious and statistics are a formality. But because biological systems are subject to variation and experimental imprecision, statistics can sometimes reveal differences that are not otherwise discernible, especially when there is a large amount of data. Ideally, an experiment should be planned with a statistical analysis in mind. Large studies should be planned with the advice of a statistician.
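The summary statistics named above (mean, standard deviation, confidence interval) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed workflow: the group names and threshold cycle (Ct) values are hypothetical, the function name `summarize` is invented for this example, and the confidence interval assumes roughly normally distributed replicates, using a small hard-coded table of two-sided 95% Student's t critical values.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only small replicate numbers (n = 2 to 11) are covered in this sketch.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def summarize(values):
    """Return n, mean, SD, SEM and a 95% confidence interval for one group."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)                # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem    # t-based half-width of the CI
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "ci95": (mean - half_width, mean + half_width)}


# Hypothetical triplicate Ct values; these are NOT data from this chapter.
groups = {"A": [24.1, 24.6, 25.0], "B": [31.2, 32.0, 31.7]}

for name, cts in groups.items():
    s = summarize(cts)
    low, high = s["ci95"]
    print(f"{name}: n={s['n']} mean={s['mean']:.2f} SD={s['sd']:.2f} "
          f"95% CI={low:.2f} to {high:.2f}")
```

With only three replicates per group the t critical value is large (4.303), so the confidence interval is much wider than the standard deviation alone might suggest.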

If statistical and bioinformatics resources are limited, there are several choices. Freely available software packages for this purpose include Q-Gene (Muller et al., 2002), DART-PCR (Peirson et al., 2003) and REST (relative expression software tool) or REST-XL (Pfaffl et al., 2002). These and other software packages are presented in detail in Chapter 3. For a more generic approach, a wise investment is a single software package that handles both data reporting and statistical analysis.

[Figure 2.12: four panels (A. Scatter Plot, B. Box and Whisker Plot, C. Bar Graph, D. Bar Graph); x-axis: Treatment.]

Figure 2.12 Examples of data reporting. The data are the same in each panel and represent the real-time PCR results for four different treatments. NTC = no template control. A. Scatter plot of individual data points with the mean shown as a bar. B. Box and whisker plot: the box extends from the 25th to the 75th percentile with a line at the median; the whiskers extend to the highest and lowest values. C. Bar graph of the mean and standard deviation for each group. D. Mean of each group plotted as a single point +/- standard deviation.

One such package is GraphPad Prism® (GraphPad Software Inc., San Diego, CA, USA). This software provides many data presentation choices, handbooks that explain basic statistics (Statistics Guide: Statistical analyses for Laboratory and Clinical Researchers; Fitting Models to Biological Data using Linear and Nonlinear Regression), and automatic performance of a wide variety of statistical analyses. Prism® provides checklists to help the user decide which statistical analysis is appropriate and even describes the best way to cite statistical analyses. With a few basic concepts understood, the mathematics can be left to the software. A good general review of statistics and how they might apply to real-time PCR can be found in A-Z of Quantitative PCR, edited by Stephen Bustin. Figure 2.12 shows several different methods of presenting the same data. Scatter plots (Figure 2.12A) allow the reader to see the individual data points, and the number of samples in each group (n) is readily apparent. A box and whiskers plot (Figure 2.12B) conveys the median, the 25th and 75th percentile values, and the highest and lowest values in the group. Some prefer bar graphs or single means with the standard deviation (Figures 2.12C and 2.12D). Be sure to use the standard deviation (SD)

Table 2.2 One-way analysis of variance of the data graphically represented in Figure 2.12, demonstrating the calculation of P values between groups

| One-way ANOVA | Result |
|---|---|
| P value | P < 0.0001 |
| Are means significantly different? (P < 0.05) | Yes |
| Number of groups | 5 |
| F | 46.4 |
| R squared | 0.9391 |

| ANOVA table | SS | df | MS |
|---|---|---|---|
| Treatment (between columns) | 436.7 | 4 | 109.2 |
| Residual (within columns) | 8.712 | 10 | 0.8712 |
| Total | 445.4 | 14 | |

| Tukey's multiple comparisons test | Mean difference | q | P value | 99% CI of difference |
|---|---|---|---|---|
| A vs. B | 7.218 | 7.961 | P < 0.001 | 1.93 to 12.51 |
| A vs. C | -0.6567 | 0.7242 | P > 0.05 | -5.95 to 4.64 |
| A vs. D | -3.67 | 3.786 | P > 0.05 | -9.38 to 1.99 |
| A vs. NTC | -9.697 | 10 | P < 0.001 | -15.35 to -4.04 |
| B vs. C | -7.875 | 9.381 | P < 0.001 | -12.77 to -2.98 |
| B vs. D | -10.89 | 12.01 | P < 0.001 | -16.18 to -5.60 |
| B vs. NTC | -16.92 | 18.66 | P < 0.001 | -22.21 to -11.62 |
| C vs. D | -3.013 | 3.323 | P > 0.05 | -8.31 to 2.28 |
| C vs. NTC | -9.04 | 9.97 | P < 0.001 | -14.33 to -3.75 |
| D vs. NTC | -6.027 | 6.217 | P < 0.01 | -11.68 to -0.37 |

and not the standard error of the mean (SEM), because the standard deviation is a better indicator of how much variability there is in the data. A statistical analysis using a one-factor analysis of variance (ANOVA) and Tukey's multiple comparison test was performed on the data represented graphically in Figure 2.12, and the results are shown in Table 2.2.
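To make the columns of an ANOVA table such as Table 2.2 less opaque, the following Python sketch computes the sums of squares (SS), degrees of freedom (df), mean squares (MS), F ratio and R squared for a one-factor design. It is an illustration only: the function name `one_way_anova` and the measurements are hypothetical, and the output is not meant to reproduce the values in Table 2.2. The P value and Tukey's test need the F and studentized range distributions, which are best left to a statistics package.

```python
import statistics


def one_way_anova(groups):
    """Compute the entries of a one-factor ANOVA table.

    groups: dict mapping a group name to a list of measurements.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = statistics.mean(all_values)

    # Between-group SS: spread of the group means around the grand mean.
    ss_between = sum(len(values) * (statistics.mean(values) - grand_mean) ** 2
                     for values in groups.values())
    # Within-group (residual) SS: spread of each value around its group mean.
    ss_within = sum((v - statistics.mean(values)) ** 2
                    for values in groups.values() for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between, "df_between": df_between,
        "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "r_squared": ss_between / (ss_between + ss_within),
    }


# Hypothetical triplicates for four treatments and a no template control.
data = {
    "A": [27.9, 28.4, 28.1],
    "B": [21.0, 20.6, 21.3],
    "C": [28.8, 28.5, 29.1],
    "D": [31.6, 32.0, 31.9],
    "NTC": [37.5, 38.1, 37.9],
}

for key, value in one_way_anova(data).items():
    print(f"{key}: {value:.4g}")
```

With five groups of three replicates the degrees of freedom are 4 between groups and 10 within groups, the same layout as the ANOVA table in Table 2.2.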

Statistics are important but must be tempered with scientific and/or clinical experience. Remember that statistical significance and biological relevance may not be directly correlated.
