Comparing Expression Data

Normalization. Before expression data from different cDNA or high-density oligonucleotide microarrays can be compared with each other, the data need to be normalized. Normalization attempts to expose the biological information by removing non-biological influences on the data and by correcting for systematic bias in the expression data. Systematic bias can be caused by differences in labeling efficiencies, scanner malfunction, differences in the initial quantity of mRNA, different concentrations of DNA on the arrays (reporter bias), printing and tip problems and other microarray batch bias, uneven hybridization, as well as experimenter-related issues.

Every normalization procedure is likely to remove or even distort some of the biological information. Therefore, it is a good idea to address the problems leading to systematic bias in order to keep normalization to a minimum. Misaligned lasers can easily be fixed, and reciprocal labeling with swapped color dyes will allow correction for differences in labeling efficiencies in cDNA microarray experiments. Sensible cDNA microarray design can help to distinguish reporter bias caused by different DNA concentrations from "biological" effects caused by the systematic arrangement of reporters on the array. Uneven hybridization can be caused by insufficient amounts of labeled probe that fail to saturate the target spots. However, the experimenters themselves can be one of the largest sources of systematic variability. Considering the many steps necessary to perform a microarray experiment, it is not surprising that experiments done by the same experimenter have been shown to cluster more tightly and have less variability than experiments done by several experimenters. Taken together, sensible design of arrays and experiments, systematic error checking, the use of reference samples, replicates, consistent methods, and good quality control can significantly enhance data quality and minimize the need for data normalization.

There are several techniques that are widely used to normalize gene-expression data (reviewed in [17]), such as total intensity normalization, linear regression techniques [18], ratio statistics [19], and LOWESS (LOcally WEighted Scatterplot Smoothing) or LOESS (LOcally wEighted regreSSion) correction [20]. Every normalization strategy relies on a set of assumptions. It is important to understand your data well enough to judge whether these assumptions hold for your data set. In general, all of the strategies assume that the expression of the average gene does not change, whether the average is taken over the entire data set or over a user-defined subset of genes.

Total intensity normalization relies on the assumption that equal amounts of labeled RNA sample have been hybridized to the arrays. Furthermore, when used with cDNA microarrays, this technique assumes that an equal number of genes are upregulated and downregulated, so that the overall intensity of all the elements on an array is the same for all RNA samples (control and experimental). Under these assumptions, a normalization factor can be calculated to rescale the overall intensity value of the arrays. Affymetrix MAS uses a scaling factor to bring all the arrays in an experiment to a preset arbitrary target intensity value.
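The rescaling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular software package: the function names, the intensity values, and the target of 500 are made up for the example, and the plain arithmetic mean used in `scale_to_target` is an assumption rather than a description of how Affymetrix MAS computes its scaling factor.

```python
def total_intensity_factor(reference, sample):
    """Factor that rescales `sample` so its summed intensity matches `reference`."""
    return sum(reference) / sum(sample)


def scale_to_target(intensities, target):
    """Rescale one array so that its mean intensity equals `target`.

    Assumption: a plain arithmetic mean is used as the summary of the array.
    """
    factor = target / (sum(intensities) / len(intensities))
    return [value * factor for value in intensities]


# Made-up intensities for four genes on two arrays (hypothetical data).
control = [100.0, 200.0, 300.0, 400.0]
experimental = [50.0, 100.0, 150.0, 200.0]

# Rescale the experimental array to the total intensity of the control array.
factor = total_intensity_factor(control, experimental)
normalized = [value * factor for value in experimental]

# Alternatively, bring an array to an arbitrary preset target intensity.
scaled = scale_to_target(experimental, target=500.0)
```

In this toy case the experimental array is uniformly half as bright as the control, so a single factor removes the difference completely. Real arrays rarely differ by a pure global factor, which is where the assumptions listed above matter.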

Linear regression techniques rely on the assumption that a significant fraction of genes are expressed at the same level when RNA populations from closely related samples are compared. Based on this assumption, plotting the intensity values of all genes from one sample against the intensity values of all genes from the other sample should result in genes that are expressed at equal levels clustering along a straight line. Regression techniques are then used to adjust the slope of this line to one. However, it has been shown for both cDNA and high-density oligonucleotide experiments that the relationship between the signal intensities can be nonlinear [21]. In these cases, a robust local regression technique such as LOWESS correction is more suitable [22].

Some techniques rely on a sufficient number of non-differentially expressed genes, such as "housekeeping" genes or exogenous control genes that have been spiked into the RNA before labeling. However, if the number of pre-determined "housekeeping" genes is small, or if their intensities do not cover a range of different intensity levels, this approach is a poor choice for fitting normalization curves. Also, many of the so-called housekeeping genes do exhibit a natural variability in their expression level. Spiked-in controls can span a broad range of ratio and intensity levels and may be useful to detect systematic bias, but they cannot account for differences in the initial amount of RNA.
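The linear regression approach described above can be illustrated with an ordinary least-squares fit. The sketch below is only one possible reading of the technique: the function names and the intensity values are hypothetical, and removing the fitted intercept in addition to the slope is an assumption made for the example. It does not handle the nonlinear case, for which a local regression such as LOWESS would replace the single global line.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_normalize(x, y):
    """Rescale y so that the fitted line of y against x has slope one.

    Assumption: the fitted intercept is removed as well as the slope.
    """
    slope, intercept = fit_line(x, y)
    return [(yi - intercept) / slope for yi in y]


# Hypothetical intensities: sample_b reads 1.5 times brighter than sample_a
# and carries a constant offset of 20 units.
sample_a = [100.0, 200.0, 400.0, 800.0, 1600.0]
sample_b = [1.5 * value + 20.0 for value in sample_a]

normalized_b = regression_normalize(sample_a, sample_b)
slope_after, intercept_after = fit_line(sample_a, normalized_b)
```

After normalization, refitting the line gives a slope of approximately one, so genes expressed at equal levels in both samples fall on the diagonal.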

Comparison analysis. After normalization, the data for each gene are typically reported as an "expression ratio" (or its logarithm) if cDNA arrays have been used: the normalized expression level of the experimental sample divided by the normalized expression level of the control sample. For oligonucleotide arrays, Average Difference (Affymetrix GeneChip Analysis Suite) or Signal (Affymetrix MAS 5.0) values are used as measures of absolute gene expression levels.
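As a small worked example of the expression ratio used for cDNA arrays, the following sketch computes the logarithm of the ratio for a few made-up intensity values. The choice of base 2 and the function name `log2_ratio` are assumptions for illustration; the text above only states that the ratio or its logarithm is reported.

```python
import math


def log2_ratio(experimental, control):
    """Base-2 logarithm of the experimental/control expression ratio.

    Both values must be positive; zero or negative intensities would
    need to be filtered or floored before calling this function.
    """
    return math.log2(experimental / control)


# Hypothetical normalized intensities (experimental, control).
upregulated = log2_ratio(400.0, 100.0)    # fourfold increase
downregulated = log2_ratio(100.0, 400.0)  # fourfold decrease
unchanged = log2_ratio(250.0, 250.0)      # no change
```

One reason the logarithm is convenient is symmetry: a fourfold increase and a fourfold decrease give values of equal magnitude and opposite sign, whereas the raw ratios would be 4 and 0.25.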

Fig. 1.3 Different views on the reproducibility of microarray measurements. In an Affymetrix microarray experiment involving three types of mice (CSX-/-, +/- and +/+), with three independent replicates for each type, Panel A shows the standard deviation of each gene in each type of mouse (y-axis) plotted against the mean expression level (x-axis) for that gene for that group (there are 13,179 gene measurements and 3 types of mice, or 39,537 points). In this view, there appears to be a higher standard deviation (and thus higher irreproducibility) in genes measured at higher expression levels. Panel B shows the same points, with the y-axis now showing standard deviation divided by the mean expression level. For most genes, the normalized standard deviation now appears to be the same across the higher mean gene expression level, except for a few genes at a very low mean expression level with large standard deviation. Panel C shows the same graph, with the y-axis showing the logarithm of the standard deviation divided by the mean expression level. Here, most genes' standard deviation is around one-tenth of their mean expression, except for several genes with low expression levels. Panel D shows all genes from two of the replicate experiments, with each axis representing the expression measurement on each chip. Ideally, this plot should represent a line with slope 1. Instead, one sees a typical "fishtail" diagram, with lower expression levels seemingly having less reproducibility.

For experiments involving a pair of conditions, the next step is to identify genes that are differentially expressed. Various techniques have been proposed for the selection of differentially expressed genes. Earlier studies have used arbitrary cutoff values such as twofold increase or decrease post-normalization without providing the theoretical background for choosing this level as significant. The inherent problem with this simple technique lies in the fact that the experimental and biological variability is far greater for genes that are expressed at low levels than for genes that are expressed at high levels, and that variability is different across different experiments (Fig. 1.3). Therefore, selecting significant genes based on an arbitrary fold change across the entire range of experimental data tends towards preferentially selecting genes that are expressed at low levels.
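The bias of a fixed fold-change cutoff toward weakly expressed genes can be seen in a toy calculation. The sketch below is purely illustrative: the gene names and intensity values are invented, and the same absolute difference of 25 intensity units is applied to a weakly and a strongly expressed gene to stand in for measurement noise.

```python
def passes_fold_change(control, experimental, cutoff=2.0):
    """True if the ratio is at least `cutoff`-fold up or down."""
    ratio = experimental / control
    return ratio >= cutoff or ratio <= 1.0 / cutoff


# Hypothetical (control, experimental) intensities. Both genes differ by
# the same absolute amount of 25 units between the two conditions.
genes = {
    "low_gene": (20.0, 45.0),       # ratio 2.25
    "high_gene": (2000.0, 2025.0),  # ratio about 1.01
}

selected = [
    name
    for name, (control, experimental) in genes.items()
    if passes_fold_change(control, experimental)
]
```

Only the weakly expressed gene passes the twofold cutoff, although the absolute change is identical for both. A fluctuation of this size is well within the variability seen at low expression levels, which is why a single cutoff over the whole intensity range over-selects such genes.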

New data analysis techniques are continuously being developed, driven in part by the obvious need to move beyond setting arbitrary fold-change cutoff values, and because none of the existing techniques has found widespread acceptance in the community so far. Several approaches apply widely used parametric statistical tests such as Student's t-test [23] and ANOVA (ANalysis Of VAriance) [24], or non-parametric tests such as the Mann-Whitney U test [25] or the Kruskal-Wallis test (www.cardiogenomics.org), to every individual gene. However, because of the cost of microarray experiments, the number of replicates is usually low, which can lead to inaccurate estimates of variance.
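A per-gene test of this kind can be sketched with the standard library alone. The example below computes the two-sample Student's t statistic with pooled variance for each gene; it stops at the statistic and does not compute a p-value. The gene names, replicate values, and function name are hypothetical and chosen only to show how the variance estimate from three replicates drives the result.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance (b minus a).

    Raises ZeroDivisionError if both groups have zero variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_b) - mean(group_a)) / standard_error


# Hypothetical expression values: (control replicates, experimental replicates).
# Both genes show the same mean difference of 10 units.
expression = {
    "gene_1": ([10.0, 12.0, 11.0], [20.0, 22.0, 21.0]),  # tight replicates
    "gene_2": ([10.0, 30.0, 20.0], [20.0, 40.0, 30.0]),  # noisy replicates
}

t_values = {
    name: pooled_t_statistic(control, experimental)
    for name, (control, experimental) in expression.items()
}
```

Both genes have the same difference in means, yet the t statistic is about ten times larger for the gene with tight replicates. With only three replicates per group, a single outlying measurement can change the variance estimate, and therefore the statistic, considerably.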

The true power of microarray experiments does not come from the analysis of single experiments attempting to identify single gene expression changes or signaling pathways, but from analyzing many experiments that survey a variety of time points, phenotypes, or experimental conditions in order to identify global regulatory networks. Successful examples include genome-scale experiments identifying genes in the yeast mitotic cell cycle [1], or tumor classification [2]. In order to identify common patterns of gene expression from multiple hybridizations, more sophisticated clustering tools have to be used.
