Topological Properties of Protein Networks

Single Node Topological Properties

An interesting property of many biological networks that was recently brought to the attention of the scientific community1-3 is an extremely broad distribution of node degrees (often called connectivities in the network literature), where the degree of a node is the number of its immediate neighbors in the network. While the majority of nodes have just a few edges connecting them to other nodes, there exist some nodes, which we will refer to as "hubs", with an unusually large number of neighbors. The degree of the most connected hub in such a network is typically several orders of magnitude larger than the average degree in the network. Often the number of nodes $N(K)$ with a given degree $K$ can be approximated by a power law $N(K) \propto K^{-\gamma}$, in which case the network is referred to as scale-free.1

In this review we concentrate on large-scale properties of physical interaction and regulatory protein networks. In Figure 1 we show the presently known4 set of transcriptional regulations in the procaryotic bacterium Escherichia coli. For comparison, Figure 2 shows the presently known5 transcriptional regulations in a simple single-cell eucaryote, Saccharomyces cerevisiae (baker's yeast).

Both the yeast and the E. coli regulatory networks are characterized by the above-mentioned broad distribution of out-degrees $K_{out}$ of their protein nodes, defined as the number of directed arrows emanating from individual regulatory proteins. Clearly visible in Figures 1 and 2 are the hub regulatory proteins that control the expression level of an unusually large number of other proteins. For example, in the E. coli network one can see an extremely highly connected node in the lower half of Figure 1. It is the CAP protein, which senses the glucose level and in response orchestrates a cooperative action of a large battery of other proteins related to glucose utilization.
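The degree definitions above can be made concrete with a small sketch. It assumes the network is available as a list of (regulator, target) pairs; the function names and the toy edge list are hypothetical and are not taken from the data sets of Figures 1 and 2.

```python
from collections import Counter


def degree_tables(edges):
    """Return (k_in, k_out) dictionaries for a directed edge list.

    `edges` is a list of (regulator, target) pairs; each pair is assumed
    to appear only once.
    """
    k_out = Counter(src for src, _ in edges)
    k_in = Counter(dst for _, dst in edges)
    nodes = {node for edge in edges for node in edge}
    # Nodes that never appear as a target (or regulator) get degree 0.
    return ({n: k_in.get(n, 0) for n in nodes},
            {n: k_out.get(n, 0) for n in nodes})


def degree_histogram(degrees):
    """N(K): the number of nodes having each degree K."""
    return dict(sorted(Counter(degrees.values()).items()))


# Hypothetical toy network: one hub regulating four targets, plus one
# extra regulation between two of the targets.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
k_in, k_out = degree_tables(toy_edges)
print("N(K_out):", degree_histogram(k_out))
print("N(K_in): ", degree_histogram(k_in))
```

Even in this toy case the out-degree histogram is the broader of the two: a single hub carries most of the outgoing edges, while incoming edges are spread over many targets.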

By comparing Figures 1 and 2 one gets the impression that the apparent growth in complexity of the transcription regulatory network from procaryotes to eucaryotes is achieved mostly by virtue of an increase in the typical number of regulatory inputs of a protein, its in-degree $K_{in}$.

To quantify this further, in Figure 3A we compare the distributions of node in-degrees in the transcriptional regulatory networks of yeast (diamonds, dashed line) and E. coli (circles, solid line). This figure also includes the set of currently known transcriptional regulations in human (Homo sapiens) as extracted by Ariadne Genomics from abstracts of publications cited in MEDLINE. One can clearly see that the distribution of in-degrees in human is broader than that in yeast, which in turn is significantly broader than that in E. coli. Indeed, while in

Figure 1. Presently known4 transcriptional regulations in E. coli. Green and red arrows denote positive and negative regulations, respectively. Nodes in this network represent operons (groups of genes transcribed onto a single mRNA), and arrows (edges) represent direct transcriptional regulation of a downstream operon by a transcription factor encoded in the upstream operon. This network consists of 606 regulations of 424 operons by transcription factors contained in 116 different operons. A color version of this figure is available online at http://www.Eurekah.com.

E. coli $K_{in}$ has an exponential distribution ranging only between 0 and 6, in yeast it already ranges between 0 and 15 and in human between 0 and 18, and the tails of the $K_{in}$ distribution in both eucaryotes start to deviate significantly from the exponential functional form.

The above observations are in agreement with two recent empirical studies. C.K. Stover et al6 found that the number of transcription factors ($N_{tr}$) in procaryotic organisms grows as the square of the number of genes ($N$): $N_{tr} \propto N^2$. Very recently E. van Nimwegen7 extended this result to eucaryotes, where he also observed a superlinear scaling $N_{tr} \propto N^{1.26}$. The exact equation

$$N_{tr} \langle K_{out} \rangle = N \langle K_{in} \rangle$$

Figure 2. Presently known5 transcriptional regulations in baker's yeast S. cerevisiae. This network consists of 1289 regulations of 682 proteins by 125 transcription factors. Green and red arrows denote positive and negative regulations, respectively. Vertices corresponding to transcription factors are filled, while those of the remaining proteins are left empty. Apart from the absence of clear signs of modularity (the network has a unique giant connected component or module and only a few small disconnected modules), one notices several striking features related to hub proteins that each regulate many other proteins: (1) They tend to regulate genes with just a few regulatory inputs. As a result they are well separated from each other and positioned on the periphery of the network. This will later be quantified in the correlation profile of this network (Figs. 7, 9). (2) It is much more frequent for a protein to regulate many other proteins than to be regulated by many.

relates the fraction of transcription factors in the genome of an organism to the average in- and out-degrees of its transcription regulatory network. Thus a direct consequence of the growth of the ratio $N_{tr}/N$ with $N$ is an increase in the complexity of regulation of individual genes, as measured by the average in-degree $\langle K_{in} \rangle$.
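The bookkeeping behind such a relation is that every regulation is counted once as an outgoing edge of a regulator and once as an incoming edge of a target. The sketch below applies it to the counts quoted in the captions of Figures 1 and 2. It rests on an assumption: the averages are taken over the nodes listed in those captions (regulated operons or proteins, and regulators), not over whole genomes, so the numbers are illustrative only. The function name is hypothetical.

```python
def mean_degrees(n_regulations, n_targets, n_regulators):
    """Average in-degree over targets and average out-degree over regulators.

    Both averages share the same numerator because each regulation is one
    outgoing edge of a regulator and one incoming edge of a target.
    """
    mean_k_in = n_regulations / n_targets
    mean_k_out = n_regulations / n_regulators
    return mean_k_in, mean_k_out


# (regulations, regulated nodes, regulators) as quoted in the figure captions.
caption_counts = {
    "E. coli": (606, 424, 116),
    "yeast": (1289, 682, 125),
}

for organism, (n_reg, n_targets, n_tf) in caption_counts.items():
    mean_k_in, mean_k_out = mean_degrees(n_reg, n_targets, n_tf)
    print(f"{organism}: <K_in> = {mean_k_in:.2f}, <K_out> = {mean_k_out:.2f}")
```

With these counts the average number of regulatory inputs per regulated node comes out higher in yeast than in E. coli, in line with the qualitative comparison of the two figures.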

Figure 3. A) The histogram $N(K_{in})$ of node in-degrees $K_{in}$ in the transcription regulatory networks of human (squares, solid line), yeast (diamonds, dot-dashed line), and E. coli (circles, solid line). This histogram is noticeably broader in human than in yeast, which in turn is broader than in E. coli. B) The histogram $N(K_{out})$ of node out-degrees $K_{out}$ in the transcription regulatory networks of human (squares, solid line), yeast (diamonds, dot-dashed line), and E. coli (circles, solid line). Overall, these three histograms are rather similar to each other. Straight lines are power-law fits with slope -2 (solid) and -1 (dashed). To improve the statistics, all histograms in this panel were logarithmically binned into 3 bins per decade.

The distribution of $K_{out}$ shown in Figure 3B appears to be about equally broad in E. coli, yeast and human. It ranges between 1 and about 70 regulations in all three networks. The power-law fit $N(K_{out}) \propto K_{out}^{-\gamma}$ gives $\gamma = 2$ in E. coli and human, while in yeast the distribution seems to have an initial slope characterized by $\gamma = 1$ followed by a sharper decay for $K_{out} > 30$. However, because of the limited range and the incomplete and possibly anthropogenically biased nature of the data (databases of research articles), one should not take these fits too seriously: at the very least they all indicate an unusually broad distribution of out-degrees in transcriptional regulatory networks.
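Logarithmic binning with 3 bins per decade, as mentioned in the caption of Figure 3, can be sketched as follows. This is a minimal illustration and not the procedure actually used for the figure: the choice to divide each count by the bin width, the function name, and the sample degrees are all assumptions.

```python
import math
from collections import Counter


def log_binned_histogram(degrees, bins_per_decade=3):
    """Histogram of positive degrees in logarithmically spaced bins.

    Returns a list of (lower_edge, upper_edge, count, density) tuples, where
    density = count / bin width (an assumed normalization). Degrees below 1
    are skipped because they have no place on a logarithmic axis.
    """
    counts = Counter()
    for k in degrees:
        if k < 1:
            continue
        counts[math.floor(bins_per_decade * math.log10(k))] += 1

    rows = []
    for b in sorted(counts):
        lower = 10 ** (b / bins_per_decade)
        upper = 10 ** ((b + 1) / bins_per_decade)
        rows.append((lower, upper, counts[b], counts[b] / (upper - lower)))
    return rows


# Hypothetical out-degrees; the 0 (a non-regulating node) is ignored.
sample_degrees = [0, 1, 1, 1, 2, 2, 3, 5, 10, 20, 70]
for lower, upper, count, density in log_binned_histogram(sample_degrees):
    print(f"[{lower:6.2f}, {upper:6.2f}): count = {count}, density = {density:.3f}")
```

Wide bins at large degree collect the few hubs that would otherwise be scattered over many empty linear bins, which is why this kind of binning helps the statistics in the tail.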

Comparison of Figures 3A and 3B also shows that in all organisms the in-degree distribution is much narrower than the out-degree distribution. That is a simple consequence of the fact that regulatory proteins (those with a nonzero $K_{out}$) constitute just a small fraction of all proteins in the cell.

In addition to transcriptional regulatory networks, metabolic networks2 and protein-protein physical interaction networks3 are also characterized by a very broad distribution in the number of neighbors of their individual nodes. A small part of such a physical interaction network in baker's yeast is visualized in Figure 4.

One aspect of a broad distribution of node degrees in protein interaction and regulatory networks is the possibility of amplification and exponential spread of signals propagating in the network. The upper bound on the one-step amplification of a biochemical signal propagating in a directed network is given by

$$A^{(dir)} = \frac{\langle K_{in} K_{out} \rangle}{\langle K_{in} \rangle}$$

This amplification factor measures the average number of neighbors to which the signal can potentially be broadcast in one propagation step. The above formula, derived by Newman in reference 9, follows from the observation that a signal enters a given node with a probability proportional to its in-degree $K_{in}$, and leaves along any of its $K_{out}$ outgoing links. For $A^{(dir)} < 1$ any signal eventually dies out and hence affects only a small fraction of nodes in the network. On the other hand, for $A^{(dir)} > 1$ signals propagating in the network might be exponentially amplified, and thus each of them could influence (and possibly interfere with) other signals over the entire network.
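The verbal argument above (enter a node with probability proportional to $K_{in}$, leave along $K_{out}$ links) corresponds to the ratio of the average of $K_{in} K_{out}$ to the average of $K_{in}$. A minimal sketch of that calculation follows, assuming the network is given as a list of (regulator, target) pairs. The function name and the toy networks are hypothetical.

```python
from collections import Counter


def directed_amplification(edges):
    """One-step amplification factor <K_in * K_out> / <K_in>.

    The common factor 1/N cancels, so plain sums over nodes are used.
    """
    if not edges:
        raise ValueError("the network has no edges")
    k_out = Counter(src for src, _ in edges)
    k_in = Counter(dst for _, dst in edges)
    nodes = set(k_in) | set(k_out)
    # Counter returns 0 for nodes that lack incoming or outgoing edges.
    numerator = sum(k_in[n] * k_out[n] for n in nodes)
    denominator = sum(k_in[n] for n in nodes)  # equals the number of edges
    return numerator / denominator


# A linear cascade a -> b -> c: the signal cannot multiply.
chain = [("a", "b"), ("b", "c")]

# A node with two inputs and three outputs: the signal can multiply.
fan = [("a", "b"), ("c", "b"), ("b", "d"), ("b", "e"), ("b", "f")]

print("chain:", directed_amplification(chain))
print("fan:  ", directed_amplification(fan))
```

Only nodes that have both incoming and outgoing edges contribute to the numerator, so a network whose hubs sit at the top of the hierarchy, with few or no inputs, can have a broad out-degree distribution and still stay near or below the critical value.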

The degree $K$ in undirected networks cannot be decomposed into in- and out-components. Hence the upper bound on amplification of signals is given by the amplification factor9

$$A^{(undir)} = \frac{\langle K (K - 1) \rangle}{\langle K \rangle}$$

In the above equation we take into account the fact that the signal cannot reach new nodes along the edge by which it came to a given node. Hence the use of $K - 1$ in the numerator. The amplification factor $A^{(undir)}$ in scale-free networks with $\gamma < 3$ is very large and sensitive to the degrees of the most highly connected hub nodes. Here the borderline case $A^{(undir)} = 1$ also separates two different regimes.

For $A^{(undir)} < 1$ the network breaks into many components isolated from each other, while for $A^{(undir)} > 1$ it consists of a unique "giant" component, containing the majority of all nodes, and a few small disconnected components.
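The undirected factor can be computed in the same spirit, using $K(K - 1)$ in the numerator as described above. The sketch below is illustrative only. Treating reciprocal pairs as a single undirected edge and dropping self-loops are assumptions of this example, and the function name and toy networks are hypothetical.

```python
from collections import Counter


def undirected_amplification(edges):
    """Amplification factor <K (K - 1)> / <K> for an undirected network.

    Edge direction is ignored. Reciprocal or repeated pairs collapse into
    one edge, and self-loops are dropped (assumptions of this sketch).
    """
    undirected = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter(node for pair in undirected for node in pair)
    total_degree = sum(degree.values())
    if total_degree == 0:
        raise ValueError("the network has no edges between distinct nodes")
    # The signal cannot leave along the edge it arrived on, hence K - 1.
    return sum(k * (k - 1) for k in degree.values()) / total_degree


# A hub with four neighbors: the hub dominates the numerator.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]

# A three-node chain: only the middle node can pass a signal on.
chain = [("a", "b"), ("b", "c")]

print("star: ", undirected_amplification(star))
print("chain:", undirected_amplification(chain))
```

Because each node contributes $K(K - 1)$, a single hub of degree 70 adds far more to the numerator than dozens of low-degree nodes together, which illustrates why the factor is so sensitive to the most connected nodes.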

The direct calculation of the directed amplification ratio $A^{(dir)}$ in the transcription regulatory networks gives $A^{(dir)} = 1.08$ in E. coli and $A^{(dir)} = 0.58$ in yeast. Hence as directed networks they are both below or (in E. coli) approximately at the critical point $A_c = 1$. Therefore, signals propagating in these networks cannot be exponentially amplified, which limits the extent of cross-talk between them. However, if both these regulatory networks are treated as undirected (i.e., one temporarily forgets about the arrows on their edges) one gets significantly overcritical amplification ratios $A^{(undir)} \gg 1$: $A^{(undir)} = 10.5$ in E. coli and $A^{(undir)} = 13.4$ in yeast. This explains why the majority of nodes in Figures 1 and 2 belong to the largest connected component, and why the disconnected components are so small. Apparently cross-talk presents a much bigger potential problem in the network of physical interactions between yeast proteins (Fig. 4), where $A^{(undir)} = 26.3$. In the last chapter of this review we will return to the question of cross-talk and demonstrate how higher-level topological properties detected in both physical and regulatory networks in yeast10 help to reduce such undesirable interference between signals.

Local Rewiring Algorithm: Constructing a Randomized Null-Model Network

The set of degrees of individual nodes is an example of a low-level (single-node) topological property of a network. While it answers the question about how many neighbors a given node has, it gives no information about the identity of those neighbors. It is clear that most functional properties of networks are defined at a higher topological level in the exact pattern of connections of nodes to each other. However, such multi-node connectivity patterns are rather difficult to quantify and compare between networks.

In this chapter we concentrate on multi-node topological properties of protein networks. These networks (as any other biological networks) lack the top-down design. Instead, selective
