
### Gene Duplication Models of Network Growth

Figure 5. Schematic representation of network growth through gene duplication. A) Pure gene duplication, in which a new node is created by duplicating the connectivity of the parent; this increases the degree of each neighboring node. Node i is duplicated to give i′; nodes j, k, and l are its neighbors. B) The partial duplication model, in which node i is duplicated to i′ but not all of the original connections are retained. C) A rewiring process, in which edge i′→j is rewired to become j→k. Reprinted with permission from Bhan A, Galas DJ, Dewey TG. Bioinformatics 2002;18:1486-1493.

some interesting properties, but it does not support a scale-free distribution of connectivities. We have, therefore, examined a number of "mixed" models that combine gene duplication with a second event. Features of two such models are illustrated in Figure 5. The "partial duplication" model (Fig. 5B) consists of duplication plus random removal of edges from the daughter node. The second model, "duplication plus preferential rewiring" (Fig. 5C), involves duplication followed by random rewiring of one of the edges in the network. In our preferential rewiring model, the new node to which the edge is rewired is chosen at random according to the same preference function as in the previous GRN models,18,19 i.e., the probability of connecting the edge to a node is proportional to the fraction of edges in the network that are incident at that node (that is, proportional to the node's degree). These mixed models are formally similar to a previous model used to describe the effect of gene duplication on protein-protein interaction networks.34 Recently, a network growth model that involves gene duplication events and yields scale-free networks has been described.35 That model is based specifically on domain shuffling and is distinctly different from the ones presented in this work. In all of these models, gene duplication is followed by a second event that breaks the parent-daughter symmetry inherent in a pure gene duplication model, and this results in a broader range of node connectivities.
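The two mixed models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the network is treated as undirected and stored as a dictionary of neighbor sets with integer labels, the parent is picked uniformly at random, the daughter is not linked to its parent, and `retain_prob` (the chance that a copied edge survives in the partial duplication model) is a hypothetical parameter. All function names are illustrative. For rewiring, one edge is chosen uniformly at random, one endpoint is kept, and the other is moved to a node chosen with probability proportional to its degree, mirroring the preference function described above.

```python
import random

# Hypothetical sketch of the two "mixed" growth models (not the authors' code).
# Assumptions: undirected graph as {node: set_of_neighbors}, integer labels,
# parent chosen uniformly at random, retain_prob is an illustrative parameter.


def duplicate(graph, parent, new_node):
    """Pure duplication: the daughter copies every edge of the parent."""
    graph[new_node] = set(graph[parent])
    for nbr in graph[parent]:
        graph[nbr].add(new_node)  # each neighbor of the parent gains one degree


def partial_duplication_step(graph, rng, retain_prob):
    """Duplicate a random node, then randomly drop some of the daughter's edges."""
    parent = rng.choice(sorted(graph))
    new_node = max(graph) + 1
    duplicate(graph, parent, new_node)
    for nbr in list(graph[new_node]):
        if rng.random() >= retain_prob:  # edge lost with probability 1 - retain_prob
            graph[new_node].discard(nbr)
            graph[nbr].discard(new_node)
    return new_node


def preferential_rewiring_step(graph, rng):
    """Duplicate a random node, then rewire one edge chosen uniformly at random."""
    parent = rng.choice(sorted(graph))
    new_node = max(graph) + 1
    duplicate(graph, parent, new_node)
    edges = [(u, v) for u in graph for v in graph[u] if u < v]
    if not edges:
        return new_node
    kept, dropped = rng.choice(edges)
    if rng.random() < 0.5:  # keep either endpoint with equal probability
        kept, dropped = dropped, kept
    # New endpoint drawn with probability proportional to degree, i.e. to the
    # fraction of edges incident at that node; self-loops and duplicate edges
    # are excluded by skipping the kept node and its current neighbors.
    candidates = [n for n in graph if n != kept and n not in graph[kept]]
    weights = [len(graph[n]) for n in candidates]
    if not candidates or sum(weights) == 0:
        return new_node
    target = rng.choices(candidates, weights=weights, k=1)[0]
    graph[kept].discard(dropped)
    graph[dropped].discard(kept)
    graph[kept].add(target)
    graph[target].add(kept)
    return new_node


def grow(seed, steps, model, rng, retain_prob=0.5):
    """Grow a copy of the seed network for the given number of duplication steps."""
    graph = {n: set(nbrs) for n, nbrs in seed.items()}
    for _ in range(steps):
        if model == "partial":
            partial_duplication_step(graph, rng, retain_prob)
        elif model == "rewiring":
            preferential_rewiring_step(graph, rng)
        else:
            raise ValueError(f"unknown model: {model}")
    return graph


if __name__ == "__main__":
    seed = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # a triangle as the network seed
    net = grow(seed, 1000, "rewiring", random.Random(42))
    print(len(net), sum(len(n) for n in net.values()) // 2)
```

Because the daughter starts as an exact copy of its parent, it is the edge-removal or rewiring step that breaks the parent-daughter symmetry; with `retain_prob=1.0` the partial duplication step reduces to pure duplication.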

Figure 6. Distribution of the number of nodes with degree k, plotted versus k, for simulated networks (see text for details). Graphs corresponding to three equally spaced time points during network growth are shown; the leftmost graph is the earliest time and the rightmost is the latest. The top row shows results from simulations with the duplication plus preferential rewiring model; the bottom row shows results from simulations with the partial duplication model.

The results of the computer simulations of network growth are shown in Figure 6 for a variety of growth models and for the two different starting networks (network seeds). As can be seen, these networks reproduce a scaling exponent that is consistent with the experimental data (Fig. 4). Table 1 compares the clustering coefficient and the path length of the experimental data with those of the partial duplication model simulation. For comparison, we also show the results for the GRN model, originally introduced by Barabási and coworkers.19 The GRN model produces lower clustering coefficients and longer path lengths than the experimental data, whereas the simulation faithfully reproduces the experimental values. Thus, this biologically motivated gene duplication model accounts for the global network statistics of yeast gene expression networks.
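The clustering coefficient and path length compared in Table 1 can be computed with a short breadth-first-search sketch. The exact definitions behind Table 1 are not stated here, so this assumes common conventions: the clustering coefficient is the average of the local coefficients, with nodes of degree below 2 counted as 0, and the path length is the mean shortest-path length over all pairs of nodes that are connected to each other. The degree histogram N(k) plotted in Figure 6 is included as well. The graph format (a dictionary of neighbor sets) and the function names are illustrative assumptions.

```python
from collections import Counter, deque

# Hypothetical helpers for global graph statistics on an undirected graph
# stored as {node: set_of_neighbors}.


def degree_distribution(graph):
    """N(k): the number of nodes with degree k."""
    return Counter(len(nbrs) for nbrs in graph.values())


def local_clustering(graph, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here: degree < 2 contributes 0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    if not graph:
        return 0.0
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("inf")
```

Unreachable pairs are simply skipped rather than assigned an infinite distance; for a fragmented network this choice can noticeably change the reported path length, so it is worth matching whatever convention the experimental statistics used.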

### Transcription Factor Networks

Recently, new array technologies have made it possible to determine where in the genome various TFs bind.36 Since transcription factor binding to the cis-regulatory region of a gene strongly influences that gene's expression level, these data provide linkages between the expression of TFs and that of other yeast genes, and allow a network to be constructed. When 100 yeast TFs (out of an estimated 300) were examined in this fashion, many promoters were found to bind several factors. For example, about 100 promoters bind four of these factors, about 40 bind five, and several bind even more. This statistic reveals the complexity of the gene regulatory network in yeast, and the distribution of multiple binding sites on promoters (Fig. 7) also suggests a kind of hierarchy in the structure of the network: a minority of promoters bind a large number of regulatory factors, while a large number of promoters bind only a few factors.
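Turning location data into a network and tabulating promoter multiplicity is mostly bookkeeping. The sketch below assumes, hypothetically, that binding results arrive as (TF, gene) pairs that have already passed a significance threshold; the identifiers are placeholders rather than real yeast genes. It builds the TF-to-target network and counts how many promoters bind exactly n of the tested factors, the kind of distribution Figure 7 summarizes.

```python
from collections import Counter, defaultdict

# Hypothetical input: (tf, gene) pairs already filtered for significant binding.


def build_regulatory_network(bindings):
    """Map each TF to the set of genes whose promoters it binds."""
    network = defaultdict(set)
    for tf, gene in bindings:
        network[tf].add(gene)  # repeated pairs collapse into one edge
    return dict(network)


def promoter_multiplicity(bindings):
    """Return a Counter mapping 'number of TFs bound' -> 'number of promoters'."""
    tfs_per_promoter = defaultdict(set)
    for tf, gene in bindings:
        tfs_per_promoter[gene].add(tf)
    return Counter(len(tfs) for tfs in tfs_per_promoter.values())


if __name__ == "__main__":
    bindings = [  # placeholder identifiers for illustration only
        ("TF_1", "GENE_A"), ("TF_2", "GENE_A"), ("TF_3", "GENE_A"),
        ("TF_1", "GENE_B"), ("TF_2", "GENE_C"), ("TF_1", "GENE_A"),
    ]
    print(promoter_multiplicity(bindings))
```

Promoters bound by none of the tested factors never appear in the input pairs and are therefore not counted, so the zero bin of the distribution has to come from the list of all assayed promoters.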

Table 1. Statistical graph parameters for gene expression networks
