Disease Relevant MicroCHIP

didate genes, one may ultimately identify 100 or so critical genes that are differentially regulated in not only heart failure, but also reflect the specific stage or etiology of heart failure.

drugs based either on novel genes thus identified, or on a knowledge of expression levels of mRNA in different pathological states.

A remarkable growth has occurred in the sources of chips and of the related components of microarray techniques, such as clone sets, arrayers, scanners, and analysis software. Technologies are expected to evolve significantly in the next few years. New microarray methods combined with bioinformatics will continue to provide increasing insight into the molecular basis of biological events.

Caveats of Using the Microarray Technology

On the other hand, microarray technologies have limitations, some of which reflect their relatively early stage of development, and others of which relate to basic principles of good scientific investigation.

1. Proper experimental design. Because microarray techniques are costly and complex, much thought must be given to experimental design if results are to be interpretable. Experiments must have advance planning, rigorous controls, and a hypothesis-driven process that allows focus and subsequent validation. Only when the traditional hypothesis-driven experiments have been validated can one begin the exploratory analysis of the dataset.

2. Ultrahigh quality of RNA is required for reproducible results. This is the most fundamental requirement for any microarray experiment; otherwise it is "noise in, noise out". Worse still, artifacts or inappropriate interpretation of the results may lead to dead-end follow-up experiments that should never have been undertaken.

3. Inadequate quality control of the chipsets. Even though chip production is now mostly automated and generally of high quality, there are still many opportunities for errors from improper spotting to the wrong sequence selection. This type of error can become magnified when it involves a large number of the genes from multiple experiments.

4. Inability to detect low abundance gene transcripts. The abundance of a gene transcript does not relate directly to its functional importance. Indeed, small critical regulatory messages may be present in low quantity. Chip manufacturers such as Affymetrix and Incyte have increased their sensitivity to approximately one gene transcript in 100,000 in the total RNA population [12]. This level of sensitivity unfortunately is still poorer than that of classical methods such as Northern blot. Therefore, the microarray can still miss rare gene expression events, which may be important in the overall pathophysiology.

5. Artifacts from different sources during the various processing steps, including sample preparation, RNA extraction, labeling, hybridization, cDNA spotting, or chip scanning, pose additional problems. However, meticulous care during processing and multiple replications of the experiments can reduce these problems to a minimum.

6. A surfeit of data. The data can only be analyzed with dedicated data mining tools or bioinformatics analysis packages, together with experts in the field who have biological insight. At present, confirmation of microarray results with classical techniques, such as RT-PCR, real-time PCR, or other quantitative PCR techniques, is still necessary. However, this will most likely no longer be necessary when the techniques mature further and results become more reproducible.

7. Expectation of immediate functional insight. Little functional information is typically derived from the first analysis of the expression arrays. Functional insights can only be gained through further evaluation of the known literature, the context of a gene being activated or silenced, and the partners with which the gene putatively interacts.

Experimental Design

The structure of the microarray data, the appropriate types of analysis, and the quality of the results are influenced by the experimental design. Careful forethought and planning are needed to design a successful experiment. Because microarray systems are so sensitive, any small changes in sample-to-sample treatment, RNA extraction, sample handling, probe labeling, and other steps in the processing are likely to affect the results. Every effort must therefore be made in experimental design to ensure that variations in the data are due to the conditions under investigation and not to artifacts.

Whilst it is recognized that microarrays are relatively expensive, it is still important to incorporate a finite number of biological replicates in each experiment to ensure ultimate statistical robustness. This is particularly important in a system that has much inherent biological variation. It is also needed to identify low abundance transcripts and/or small changes in expression levels. The most important points to consider when designing an experiment are:

• what question are we trying to answer using this technique,

• which conditions are having their gene expression levels compared,

• are there any genes with known expression levels in these conditions that can be used as reference markers to confirm the fidelity of our results,

• which steps can be sources of variation, and how can they be eliminated as far as possible, and finally

• what is the maximum number of replicates that can be run, so that statistical methods can be applied.

The simplest microarray experimental design is to determine the changes in gene expression patterns across a single factor of interest, e.g. temporal frame shifts, genetic manipulation of a single gene, or effects of drug treatments. However, experiments can now also be designed in a multi-factorial fashion to assess their interactions in one set of microarray experiments. Statistical methods are now available to determine the appropriate number of replicates, or to assist the researcher to design appropriately powered experiments [13].
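As a rough illustration of how a replicate number can be estimated, the sketch below uses the textbook normal-approximation formula for comparing two group means for a single gene. It is not necessarily the method of reference [13]; the function name, the default significance level and power, and the example standard deviation are all assumptions chosen for illustration, and the significance level is not corrected for the many genes tested on one array.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(sd, min_log2_change, alpha=0.05, power=0.8):
    """Approximate replicates per condition for a two-group comparison of one gene.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) * sd / delta) ** 2
    sd              -- assumed standard deviation of log2 expression between replicates
    min_log2_change -- smallest log2 difference we want to detect (delta)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / min_log2_change) ** 2
    return ceil(n)


# Hypothetical inputs: replicate SD of 0.5 log2 units, two-fold change (1 log2 unit).
print(replicates_per_group(sd=0.5, min_log2_change=1.0))
# Halving the detectable change roughly quadruples the replicates needed.
print(replicates_per_group(sd=0.5, min_log2_change=0.5))
```

The point of the sketch is the scaling behaviour: noisier systems and smaller expression changes both drive the required number of biological replicates up quickly.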

Tissue Preparation and Preservation

This is arguably the most important step for RNA stabilization. Reliable results depend on the integrity of the RNA. Immediate RNA stabilization in the heart (or any biological) sample is necessary, because changes in the gene expression pattern occur rapidly due to specific and non-specific RNA degradation. There are several methods to stabilize the RNA. The simplest and most widely used is snap freezing of the tissue in liquid nitrogen within minutes of removal. The sample must then be preserved at a very low temperature in a nuclease-free environment to ensure sterility and freedom from nuclease contamination, specifically RNase contamination. Alternatively, samples can be submerged in an RNA stabilizing reagent (e.g. RNAlater, QIAGEN) immediately after harvesting and stored for up to 4 weeks at 8 °C or archived at -20 °C or -80 °C. This RNA stabilizing reagent is specific for animal tissues and can be used for cell cultures and white blood cells, but not for whole blood, plasma, or serum. The advantage of this method over snap freezing is the convenience in cutting and handling the tissue, which can be transported and stored at close to ambient temperatures.

RNA Isolation

This is a crucial step for preparing high quality RNA free of RNase contamination. Glassware must be treated with DEPC-H2O (0.1% DEPC in H2O) before use. Both the quality and the quantity of the RNA yield improve if samples are handled with care to avoid RNase contaminants. Heart tissue is considered a challenging tissue for the isolation of RNA because it is fibrous and thus difficult to homogenize and process. Several methods are available to purify RNA (either total RNA or mRNA) from heart tissue. In addition to several commercially available kits, the major method of RNA isolation uses a guanidine-containing reagent, such as TRIZOL (Gibco). Frozen or RNAlater-stabilized tissue samples are disrupted mechanically in the reagent or under liquid nitrogen and homogenized. At this point, an additional step is necessary to eliminate the fibrous tissue from the sample before proceeding to phase separation with chloroform. After phase separation, the total RNA is precipitated using isopropanol and washed in 70% ethanol (prepared in DEPC-H2O). It is estimated that approximately 0.1-1 pg of total RNA is present in a single cell [14]. The total RNA content may vary with sample condition, viability of the cells, functional status, and phenotype of the cells. The concentration of RNA can be determined by measuring the OD at 260 nm (A260) in 10 mM Tris-Cl, pH 7.5. Total RNA with an A260/A280 ratio of 1.9-2.1 is used for microarray experiments. The integrity and size distribution of the total RNA must be checked using denaturing agarose gel electrophoresis. In the case of small samples, such as those obtained by laser capture microdissection or biopsies, the spectrophotometer reading can be omitted because too low an RNA concentration may produce false negative OD values. The best method of RNA characterization in these cases is the RNA Bioanalyzer (Agilent). With each disposable RNA chip, the system permits rapid screening to determine the concentration and purity/integrity of 12 RNA samples in a total analysis time of 30 minutes. Purified RNA may be stored at -80 °C in water, with no detectable degradation after one year.
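The spectrophotometric checks described above amount to simple arithmetic, sketched below. The conversion factor of 40 µg/mL per A260 unit is the commonly quoted convention for RNA and is an assumption here (it is not stated in the text, and some protocols quote a slightly different value for buffered solutions); the function names and the example readings are hypothetical.

```python
def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, ug_per_ml_per_a260=40.0):
    """Estimate RNA concentration of the undiluted sample from an A260 reading.

    ug_per_ml_per_a260 is an assumed conversion factor (40 ug/mL per absorbance unit).
    """
    return a260 * ug_per_ml_per_a260 * dilution_factor


def purity_ok(a260, a280, low=1.9, high=2.1):
    """Check whether the A260/A280 ratio falls in the 1.9-2.1 window given in the text."""
    if a280 <= 0:
        raise ValueError("A280 must be positive to compute a ratio")
    return low <= a260 / a280 <= high


# Hypothetical reading: sample diluted 1:50, A260 = 0.25, A280 = 0.125.
conc = rna_concentration_ug_per_ml(0.25, dilution_factor=50)
print(f"{conc:.0f} ug/mL, purity acceptable: {purity_ok(0.25, 0.125)}")
```

A ratio inside the window says nothing about integrity, which is why the gel or Bioanalyzer check remains necessary.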

RNA Amplification

Results of the human genome project [15-17] have laid the foundation for microarray gene expression profiling [18, 19]. However, broader utilization of microarray methods is limited by the amount of RNA required (typically 10 µg of total RNA or 2 µg of poly(A) RNA) [12]. This is especially a problem when only limited samples, such as endomyocardial biopsies and laser capture microdissection (LCM) specimens, can be obtained. An important frontier in the development of microarrays for expression profiling involves reduction of the required amount of RNA. Methods aimed at intensifying the fluorescence signal have resulted in an improvement [20]. A significant increase in detection level can be achieved by amplifying poly(A) RNA or cDNA [21, 22]. There are two primary approaches that can be employed to overcome RNA limitations. One is PCR-based amplification, which gives a reproducible yield, but the relative abundance of the cDNA products is not well correlated with the starting mRNA level. The second approach avoids PCR and utilizes one or more rounds of robust linear amplification based on cDNA synthesis and a template-directed in vitro transcription reaction [14]. This is a recommended method for amplifying low abundance mRNA, or even for gene expression profiling of a single cell, by orders of magnitude from nanograms (1-50 ng) of total RNA or poly(A) RNA in one or two rounds of amplification. This method combines a reverse transcription step with an oligo(dT) primer that contains a T7 RNA polymerase promoter. The first-strand cDNA is then used for synthesis of second-strand DNA by DNA polymerase, DNA ligase, and RNase H. The resulting double-stranded cDNA functions as a template for the in vitro transcription step (one or two rounds), which results in a linear amplification of RNA. The fidelity of this mRNA amplification method has been assessed using microarray technology [23].

The combination of powerful microarray technology with precise amplification techniques promises to be especially important for small samples such as heart biopsies. However, assessment of the yield of labeled mRNA, the representation of the various transcripts after amplification (fidelity), the linearity of amplification, and finally the sensitivity and reproducibility of the method in the individual laboratory is essential. Again, the relative efficiency of in vitro transcription for mRNAs of a specific size may alter the correlation with the starting levels of mRNA.
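Why exponential amplification distorts relative abundance more easily than linear amplification can be seen with a toy calculation. The sketch below is a deliberately simplified model, not a description of any real protocol: the per-cycle PCR efficiencies, the transcription yield per template, and the starting copy numbers are all hypothetical values chosen for illustration.

```python
def pcr_yield(start_copies, efficiency, cycles):
    """Exponential amplification: each cycle multiplies copies by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def ivt_yield(start_copies, copies_per_template, rounds=1):
    """In vitro transcription: each round makes a fixed number of copies per template."""
    return start_copies * copies_per_template ** rounds


# Hypothetical transcripts A and B, present at a true 10:1 ratio.
a_start, b_start = 1000.0, 100.0

# Assumed PCR efficiencies that differ slightly between the two templates.
pcr_ratio = pcr_yield(a_start, 0.95, 20) / pcr_yield(b_start, 0.85, 20)

# Assumed identical transcription yield per template for both transcripts.
ivt_ratio = ivt_yield(a_start, 1000.0, 2) / ivt_yield(b_start, 1000.0, 2)

print(f"true ratio 10.0, PCR ratio {pcr_ratio:.1f}, IVT ratio {ivt_ratio:.1f}")
```

In this model a small efficiency difference is compounded at every PCR cycle, whereas in vitro transcription preserves the ratio as long as the yield per template is the same for both transcripts. If that yield depends on transcript size, the linear method is biased as well, which is why fidelity should be verified in each laboratory.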

4.10

Probe Labeling

Two principal types of arrays, spotted arrays (robotic deposition of nucleic acids) and in situ synthesized arrays (using photolithography), are used in gene expression monitoring [24, 25]. Labeled material can be prepared by the "one color" and "two color" systems.

The "one color" system is the method used for in situ synthesized chips. The RNA can be labeled directly with a psoralen-biotin derivative or with a biotin-carrying molecule. The labeled nucleotides are incorporated into cDNA during reverse transcription of poly(A) RNA [24, 26]. Alternatively, cDNA with a T7 promoter at its 5' end can be generated to serve as a template for the subsequent step, in which the labeled nucleotides are incorporated into cRNA. Commonly used labels are the fluorescent cyanine-based dyes Cy3 and Cy5 and nonfluorescent biotin (Amersham).

The second method is the "two color" system of probe labeling, which is often used with cDNA chips. Equal amounts of cDNA from two different conditions are labeled with different fluorescent dyes, usually Cy3 and Cy5, mixed, and hybridized to a chip [25]. This yields the ratio (relative concentration) of mRNA between the two samples. There are direct and indirect methods of incorporating the dyes into cDNA. In the direct method, the labeled nucleotide is incorporated into the cDNA, whereas in the indirect method, an amino-allyl modified nucleotide analogue such as amino-allyl-dCTP is incorporated into the cDNA, to which the dyes are subsequently coupled chemically. In addition to systematic variations in direct dual color labeling, Cy3 and Cy5 exhibit different quantum yields. Thus an additional chip with exchanged dyes (commonly called Fluor-Flip) is required to obtain reliable data (Fig. 4.3). After hybridization and washing, the array is scanned at two different wavelengths to determine the relative transcript abundance for each condition, followed by data analysis.

Fig. 4.3 cDNA microarray with double spotting and quality control. To ensure a high degree of reproducibility, the cDNA microarray here is doubly spotted, and only spots that show concordant up- or down-regulation are included in the final analysis. Fluor-flip is another technique to ensure that the differences observed are due to true expression differences and not to artifact.
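The logic of the dye exchange can be written out as a short calculation. The sketch below assumes a simple multiplicative dye bias that is the same on both chips, so that it adds a constant to every log ratio; under that assumption the bias cancels when the two chips are combined. The function names and the example intensities are hypothetical.

```python
from math import log2


def log_ratio(cy5, cy3):
    """Log2 ratio of the Cy5 channel over the Cy3 channel for one spot."""
    return log2(cy5 / cy3)


def dye_swap_estimate(forward, swapped):
    """Combine a forward chip and a dye-swapped chip for one gene.

    forward -- (cy5, cy3) intensities with test labeled Cy5 and control labeled Cy3
    swapped -- (cy5, cy3) intensities with control labeled Cy5 and test labeled Cy3
    Returns (log2 test/control with dye bias removed, estimated log2 dye bias).
    """
    m_forward = log_ratio(*forward)  # log2(test/control) + bias
    m_swapped = log_ratio(*swapped)  # log2(control/test) + bias
    expression = (m_forward - m_swapped) / 2
    dye_bias = (m_forward + m_swapped) / 2
    return expression, dye_bias


# Hypothetical gene: true four-fold induction, with Cy5 reading two-fold too bright.
expression, bias = dye_swap_estimate(forward=(800.0, 100.0), swapped=(200.0, 400.0))
print(f"log2 test/control = {expression:.2f}, log2 dye bias = {bias:.2f}")
```

A gene whose apparent regulation does not reverse sign on the swapped chip is more likely to reflect a labeling artifact than a true expression difference.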

4.11

Data Analysis and Bioinformatics

The basic techniques in microarray experiments, from cDNA synthesis to hybridization and washing, are conventional methods that have been used in the laboratory for years. Data analysis is the most demanding part of using this extraordinary tool because we deal with an unprecedented volume of data. An increasing number of software tools are available for this most challenging part of the technology [27]. The two basic steps in microarray data analysis are:

• Data collection (collecting raw data from images, correcting for background, and normalization), target (differentially expressed gene) detection, and target intensity extraction.

• Analysis and bioinformatics, with multiple image analysis and data visualization (e.g. clustering methods to identify unique patterns of gene expression).

The care in assuring accurate reproducibility of the data is of paramount importance (Fig. 4.4).

Fig. 4.4 The reliability of microarray experiments is dependent on the reproducibility of the data set not only within the same subject, but also between subjects in an experiment exposed to the same conditions. Here we illustrate normalized gene expression changes between the first and fifth subject of a single experiment of aortic banding in a mouse model, demonstrating a high degree of correlation and reproducibility between these two hearts subjected to the same stress.

Data collection: differential gene expression is assessed by scanning the hybridized arrays using either a confocal laser scanner (GSI Scan array) producing 16-bit TIFF images, or a photomultiplier tube (PMT) laser scanner (Axon Scanner) capable of interrogating both the Cy3- and Cy5-labeled probes and producing a ratio image as a 24-bit composite RGB (Red-Green-Blue) image, or capable of detecting additional dyes at up to 4 wavelengths simultaneously. The ratio image typically represents the levels of the two cDNAs (control and test) that are hybridized to the array in a "two color" system. A great advantage of this approach is its capacity to demonstrate a dynamic pattern of gene expression. These images must then be processed and converted to numerical representations in order to calculate the relative expression level of each gene and to identify differentially expressed genes. In image processing, the spots representing the arrayed genes are first identified and distinguished from nonspecific contamination (such as dust) or artifacts. The second step in image analysis is background calculation and subtraction to reduce the effect of nonspecific fluorescence. Different software tools employ different algorithms to quantify the images. For all ratio calculations that require background subtraction, the median background value is usually used (as in GenePix Pro, Axon).
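The background subtraction and ratio calculation for a single spot can be sketched as follows. This is a simplified illustration and not the algorithm of any particular package: the use of the median for the spot foreground, the floor that keeps corrected intensities positive, and the pixel values are all assumptions.

```python
from statistics import median


def background_corrected(spot_pixels, background_pixels):
    """Median spot intensity minus median local background for one channel."""
    return median(spot_pixels) - median(background_pixels)


def spot_ratio(test_spot, test_bg, control_spot, control_bg, floor=1.0):
    """Background-corrected test/control ratio for one spot.

    floor is an assumed minimum intensity that avoids zero or negative values
    when a spot is no brighter than its surroundings.
    """
    test = max(background_corrected(test_spot, test_bg), floor)
    control = max(background_corrected(control_spot, control_bg), floor)
    return test / control


# Hypothetical pixel intensities for one spot in the two channels.
ratio = spot_ratio(
    test_spot=[1200, 1250, 1300], test_bg=[100, 110, 90],
    control_spot=[650, 700, 675], control_bg=[100, 95, 105],
)
print(f"test/control ratio = {ratio:.2f}")
```

Using medians rather than means makes the estimate less sensitive to a few very bright pixels, such as those caused by dust on the slide.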

Because of the multivariate nature of microarray experiments, it is not easy to compare data from different experiments. To improve the comparison across many microarrays, data normalization is required. Different software packages offer various methods for normalization (commercial software: GenePix, Axon; GeneSpring, Silicon Genetics; Affymetrix Microarray Suite and Data Mining Tool, Affymetrix, Inc; Spotfire, Spotfire Inc; and free software: DNA-Chip Analyzer; SAM, Stanford; TreeView, Eisen; etc.; most of these can be found). Increasing numbers of researchers prefer scaling to normalization. The difference between scaling and normalization relates to the method used to pick the target intensity. For scaling, a number that represents the average signal from a large set of arrays is used. For normalization, the target intensity is defined as the average signal on the baseline array, and all experimental arrays are then adjusted to that value. In addition to per-chip normalization or scaling, there must be a per-gene normalization in order to bring the data to a relative scale.
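The distinction between scaling and normalization drawn above comes down to how the target intensity is chosen, as the following sketch shows. It is a minimal illustration that uses the plain arithmetic mean as the "average signal"; real packages typically use trimmed or otherwise robust averages, and the function names and intensities are hypothetical.

```python
from statistics import mean


def rescale(array, target):
    """Multiply every signal on one array so that its mean equals the target."""
    factor = target / mean(array)
    return [value * factor for value in array]


def scale_arrays(arrays):
    """Scaling: the target is the average signal across the whole set of arrays."""
    target = mean(mean(a) for a in arrays)
    return [rescale(a, target) for a in arrays]


def normalize_to_baseline(baseline, experimental_arrays):
    """Normalization: the target is the average signal on the baseline array."""
    target = mean(baseline)
    return [rescale(a, target) for a in experimental_arrays]


# Two hypothetical arrays; the second was scanned twice as bright overall.
arrays = [[100, 200, 300], [200, 400, 600]]
print(scale_arrays(arrays))
print(normalize_to_baseline(arrays[0], [arrays[1]]))
```

Either choice removes the overall brightness difference between chips; the per-gene step mentioned in the text is a separate operation applied afterwards across arrays.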

Normalized or scaled data are typically analyzed to identify genes that are differentially expressed. Most published studies have used a cutoff of two-fold up- or down-regulation to define differential expression; however, this threshold cannot be appropriate for all genes, because different genes may have different levels of sensitivity.

Multiple statistical methods can be used, first to filter the most statistically significant data and then to perform further analysis, data mining, and bioinformatics in order to extract the most reliable information from microarray data. Different software packages offer various statistical methods for data filtering, such as parametric tests that assume equal variances (Student's t-test/ANOVA), parametric tests that do not assume equal variances (Welch t-test/Welch ANOVA), and nonparametric tests (Wilcoxon-Mann-Whitney test). In addition to filtering by standard deviation, p-values, etc., multiple testing corrections can be added to the above methods to increase the accuracy of the filtered data.
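Two of the ingredients named above can be sketched in a few lines: the Welch t statistic for one gene, and a multiple testing correction applied to the p-values of many genes. The text does not say which correction is meant; the Benjamini-Hochberg procedure is used here purely as one common example. The function names and data are hypothetical, and converting the t statistic to a p-value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic and approximate degrees of freedom (unequal variances)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2 expression of one gene in three test and three control hearts.
print(welch_t([2.0, 2.2, 1.8], [1.0, 1.1, 0.9]))
# Hypothetical raw p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With thousands of genes on a chip, an uncorrected p-value threshold of 0.05 would be expected to pass many genes by chance alone, which is the motivation for the correction step.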

Sophisticated bioinformatics tools are required to extract accurate information from the avalanche of data and to draw a logical and reliable conclusion from the massive volume of information that is generated from the microarray experiments. The objective is to reduce complexity and extract or mine as much useful and relevant information as possible. For microarray data analysis both data mining and bioinformatics are required. Data mining has been defined as "the extraction of implicit, previously unknown, and potentially useful information from data", whereas bioinformatics is used for sequence-based extraction of specific patterns or motifs with the ability of specific pattern matching. Currently they exist as separate approaches but eventually, data mining and bioinformatics will be indistinguishable. Most data analysis software is equipped with bioinformatics rather than data mining tools. When the size of the data set is reduced to a manageable volume of statistically significant data, it is possible for the scientist to identify emerging patterns.

There are several popular methods to analyse and visualize gene expression data:

• Hierarchical Clustering is used to visualize a set of samples or genes by organizing them into a phylogenetic tree, often referred to as a dendrogram. One way of analyzing microarray data is to look at the cluster (group) of genes with a similar pattern of expression across many experiments. The co-regulated genes within such groups are often found to have related functions. The distance between two branches of a tree is a measure of the correlation between any two genes in the two branches. This is an exceedingly powerful method and is used most widely. It allows a researcher to find experimental conditions (e.g. various drug treatments, classification of disease states) that have similar effects.

• K-means Clustering divides genes into distinct groups based on their expression patterns. Genes are initially divided into a number (k) of user-defined and equally-sized groups. Centroids are calculated for each group corresponding to the average of the expression profiles. Individual genes are then reassigned to the group in which the centroid is the most similar to the gene. Group centroids are then recalculated, and the process is iterated until the group compositions converge. A wide selection of similarity measures (parametric and non-parametric correlations, Euclidean distance, etc.) is available in different software.

• Self-Organizing Maps (SOMs) are tools for exploring and mapping the variations in expression patterns within an experiment. This method is similar to k-means clustering, but with an additional feature where the resulting groups of genes can be displayed in a rectangular pattern, with distance representing the level of similarity (adjacent groups being more similar than groups further away).

• Principal components analysis (PCA) is a standard projection technique that explores the variability in gene expression patterns and finds a small number of themes in the expression patterns. These themes can be combined to reconstruct all the different gene expression patterns in a data set.

• Multidimensional Scaling (MDS) is a method that represents the measure of similarity between pairs of objects. The distance matrix used in clustering typically serves as the similarity matrix between all pairs of samples in an experimental design. Two-dimensional scaling plots are used to examine the similarity amongst all samples.
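Of the methods listed above, k-means clustering is the easiest to write out in full. The sketch below follows the steps as described: genes are first split into k roughly equal-sized groups, centroids are computed, each gene is reassigned to the nearest centroid, and the process repeats until the groups stop changing. Euclidean distance is used as the similarity measure, which is only one of the options mentioned; the round-robin initial split, the function names, and the expression profiles are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(profiles):
    """Average expression profile of a group of genes."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def k_means(profiles, k, max_iter=100):
    """Cluster expression profiles into k groups; returns (labels, centroids)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Initial split into k roughly equal-sized groups (round-robin over input order).
    labels = [i % k for i in range(len(profiles))]
    centroids = [None] * k
    for _ in range(max_iter):
        for group in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == group]
            if members:  # a group that becomes empty keeps its previous centroid
                centroids[group] = centroid(members)
        new_labels = [
            min(range(k), key=lambda g: dist(p, centroids[g])) for p in profiles
        ]
        if new_labels == labels:  # group compositions have converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical profiles over three time points: two rising and two falling genes.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 2.0, 1.0], [2.9, 1.9, 1.2]]
labels, centroids = k_means(profiles, k=2)
print(labels)
```

Because the result depends on the starting groups and on the chosen k, it is usual to repeat the clustering with different settings before drawing biological conclusions.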

There is no software so far that can extract all the useful data and prevent possible masking of some clusters by transcriptional "noise". Software has become much more powerful in the past few years, but expert data-miners and bioinformaticians are still needed.

4.12

Application: New Classification of Disease

System-wide explorations of gene expression patterns provide a unique insight into the internal environment of the cells of a particular organ. This may reflect both genetic predisposition towards the disease and environmental stresses that lead to the disease phenotype. Heart failure is a classic condition in which diverse stimuli may dissimilarly and/or similarly challenge the ability of the heart to adapt [28]. When adaptation becomes limited, disease phenotypes evolve. Indeed, we have traditionally classified heart failure in terms of clinical etiology, for example, dilated cardiomyopathy, ischemic cardiomyopathy, viral myocarditis, etc. We also assume that the disease progression, the prognosis, and the response to therapy will be predicated on the disease etiology. However, the ability of microarrays to provide a broad insight into the disease process directly within the tissues provides a unique insight into the intracellular perturbations of cell organization and function (Fig. 4.5) and an entirely new perspective on the heart failure process. Commonalities and differences at the molecular level will identify critical pathways of pathogenesis and/or response to therapy.

Fig. 4.5 The ability of microarray expression data to distinguish the gene expression patterns of normal (N) and ischemic cardiomyopathy (ICM) samples can be seen from this hierarchical clustering and dendrogram display. The genes up- and down-regulated include both known genes and completely novel genes of currently unknown function. This may provide both diagnostic and prognostic information in the future.

This approach has been very successfully applied to the field of cancer biology. In the study of breast cancer, expression microarrays have provided an important insight into the biology of the disease, as well as prognostic markers for favourable vs. poor outcomes. Such methods have also been applied to leukemias, as well as prostate cancer (Fig. 4.6). The advantage in cancer biology is that the tumour is often excised, providing a direct source of tissue to correlate with pathology, as well as the opportunity to explore patterns of pathway activation. However, with the availability of myocardial biopsies, similar opportunities exist for the study of heart failure.

Fig. 4.6 The pattern of expression microarrays characterizes the biology of the tissues. This is best illustrated in cases of cancer, where the expression patterns between different types of tumours (e.g. lymphoma) not only can differentiate one type of lymphoma from another, but also can be associated with differential prognosis. This will likely become more refined as the database becomes more enriched with samples over time.

4.13

Application: Pathogenesis of Disease

Despite its direct relevance, the direct evaluation of human heart failure samples using microarray technology also has significant limitations. The most obvious is the end stage nature of the samples, which may represent a convergence of phenotypes that have no relevance to the pathogenetic mechanisms. Furthermore, many of these patients also have co-morbidities and concomitant medications, which all serve to skew the gene expression patterns. Nevertheless, clinical validation in a patient population is always important to ensure relevance and concordance amongst biological models.

To obviate these concerns, and to capture the potential early triggers of the heart failure phenotype, animal models of heart failure have been used to give interesting insights. Taylor et al. examined the viral model of myocarditis, and identified specific groups of genes relevant to the viral, inflammatory, and healing phases of the myocarditic process [29]. Aronow et al. showed that no single gene program is common to all the models of heart failure. However, there does appear to be a group of programs, linked specifically to each etiology, that predisposes to heart failure [30]. In addition, we have performed serial microarray analysis of gene families potentially relevant to heart failure in animal models. By using carefully controlled experimental designs, in which animals subjected to the same injury can be synchronously followed, analyzed, and compared to age-matched controls, the critical differences in host responses leading to heart failure can be precisely identified (Figs. 4.7, 4.8). In the future, the application of these concepts to biopsy-based or, preferably, blood-based analysis would be of the greatest interest in studies of heart failure.

4.14

Application: Early Disease Markers and Prognosis

To identify disease early in its process, critical specific markers will be useful in diagnosis and in delineating disease etiology. Currently, the best example of an early diagnostic marker is brain natriuretic peptide (BNP), which is elaborated by the ventricular myocardium under stress. The spill-over of this marker into the blood has provided a useful marker for the early diagnosis of heart failure, and it also provides prognostic information. However, it is elevated irrespective of heart failure etiology. Thus we must ask whether there are etiologically specific markers that can be found. We have recently identified 5 potentially useful markers for the early diagnosis of myocarditis leading to dilated cardiomyopathy (unpublished observations). This may indeed represent the beginning for future applications of

Fig. 4.7 The expression array information can be functionally clustered to provide biological insight into disease processes, with the brightness of the colour of each molecule representing the relative levels of expression. For example, comparing heart tissues from models of heart failure with that of normal condition, we see
