The upper and lower parts of the table show the phylogenetic distribution of 15 arbitrarily chosen high-degree and low-degree proteins from publicly available yeast protein interaction data.38 Gapped BLAST was used to search for homologs to these yeast proteins in the GenBank database. Columns in the table correspond to the following broad taxonomic groups: Metazoa (M), Protists (Pr), Plants (P), Fungi (F, exclusive of the genus Saccharomyces), Eubacteria (E) and Archaea (Ar). A '+' indicates that the respective protein has at least one putative homolog within the respective taxonomic group with a BLAST amino acid alignment score of $E < 10^{-10}$. '++' and '+++' indicate at least one homolog with $E < 10^{-20}$ and $E < 10^{-30}$, respectively.
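The scoring convention in this caption can be written as a small lookup. The following is only a minimal sketch: the function name `homology_symbol` is hypothetical, and returning an empty string when no homolog passes the weakest threshold is an assumption, since the caption does not say how such cells are marked.

```python
def homology_symbol(best_e_value):
    """Map the smallest BLAST E-value found in a taxonomic group to a table symbol.

    best_e_value: smallest E-value among hits in the group, or None if no hit.
    The empty-string result for "no qualifying homolog" is an assumption.
    """
    if best_e_value is None:
        return ""
    if best_e_value < 1e-30:
        return "+++"
    if best_e_value < 1e-20:
        return "++"
    if best_e_value < 1e-10:
        return "+"
    return ""


# Illustrative E-values only; these are not data from the table.
for e in (1e-35, 1e-25, 1e-12, 1e-5, None):
    print(e, repr(homology_symbol(e)))
```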

the probability that the protein loses an interaction by $q_i$ ($q_i = \mathrm{Prob}(D_t = i-1 \mid D_{t-1} = i)$). Finally, let $r_i$ denote the probability that $D_t$ does not change between $t-1$ and $t$. This simple framework can capture a variety of observations. For instance, in an earlier contribution I suggested that the rate at which interactions get added and eliminated from the network must be approximately balanced, because of the high observed rate of interaction turnover.34 This translates into $p_i \approx q_i$ for all $i$. In addition, the observation that proteins with more interaction partners show a greater turnover of interactions (Fig. 3) can be captured as a dependency of $p_i$ on $i$, e.g., $p_i = i \times c$, where $c$ is some constant.
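To make the framework concrete, the degree of a single protein can be simulated as a discrete-time chain. This is a minimal sketch under stated assumptions, not the author's implementation: the function names and the value of `c` are hypothetical, gain and loss are taken to be exactly balanced ($q_i = p_i$, where the text only says approximately), and the gain probability is taken to be $p_i = i \times c$.

```python
import random


def transition_probabilities(i, c):
    """Return (p_i, q_i, r_i) for a protein of degree i.

    Assumptions: p_i = i * c (turnover grows with degree) and q_i = p_i
    (gain and loss exactly balanced). r_i is whatever probability remains.
    """
    p = c * i
    q = c * i
    r = 1.0 - p - q
    if r < 0.0:
        raise ValueError("c is too large for this degree: p_i + q_i exceeds 1")
    return p, q, r


def simulate_degree(d0, c, steps, rng):
    """Simulate the degree D_t for `steps` time steps, starting from D_0 = d0."""
    d = d0
    path = [d]
    for _ in range(steps):
        p, q, _ = transition_probabilities(d, c)
        u = rng.random()
        if u < p:
            d += 1          # one interaction gained
        elif u < p + q:
            d -= 1          # one interaction lost
        path.append(d)      # otherwise the degree is unchanged
    return path


rng = random.Random(0)      # fixed seed so the run is repeatable
path = simulate_degree(d0=3, c=0.01, steps=1000, rng=rng)
print(path[-1], max(path))
```

Under $p_i = i \times c$ a protein that reaches degree 0 stays there, because both its gain and loss probabilities are then zero.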

A quantity of interest in this stochastic process is the expected waiting time until a protein first returns to the state $D_t = i$, i.e., $m_i = E(T_i \mid D_0 = i)$, where $E$ indicates the expected value of the random variable $T_i := \min\{t > 0 : D_t = i\}$, which measures the time until the protein first visits state $i$. For $i = 0$, this expected time $m_0$ is closely related to the residence time of a protein in the network, that is, the time during which a protein has a degree greater than zero. Quantities like $m_i$ are difficult to calculate because we do not know how $p_i$, $q_i$ and $r_i$ depend on $i$, especially for large $i$. However, it is noteworthy that if the above assumptions held for arbitrarily large $i$, then this stochastic process would belong to the class of null-recurrent Markov processes,41 whose expected waiting time to return to any state (not only $i = 0$) is infinite, and can thus not be calculated. We can, however, calculate related quantities that may explain why highly connected proteins are not necessarily phylogenetically old. Consider a protein with degree 1. What is the expected time until such a protein loses this interaction, and thus ceases to be part of the network, assuming that this protein never attains a degree higher than one? If we denote as $\tau$ the random variable measuring this time, then its distribution is given by $\mathrm{Prob}(\tau = k) = q_1 r_1^{k-1}$, which is essentially a geometric distribution. Its mean and variance are given by $E(\tau) = q_1/(1-r_1)^2$ and $\mathrm{Var}(\tau) = r_1/(1-r_1)^2$. Order-of-magnitude estimates for upper bounds on the probabilities $p_i$ and $q_i$ suggest that they are of the order of $6 \times 10^{-4}$ per protein per million years.34 Using these values, $E(\tau)$ calculates as 416 million years, and its standard deviation as 588 million years. In other words, even a protein of low degree that does not acquire any further interactions through mutations takes more than an expected 400 million years to lose its only interaction, with an enormous standard deviation.
For proteins that acquire more interactions in the course of evolution, this expected time would be much larger. Considering the standard deviation in and of itself, it is then hardly surprising that we cannot distinguish proteins of different degrees by their phylogenetic distribution. The time for which even low-degree proteins reside in the network can vary over an enormous range, a range greater than the time elapsed since the Cambrian radiation. A statistical test could not distinguish between the age of high-connectivity and low-connectivity proteins if their residence time in a network can vary so widely.
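The quoted mean of about 416 million years can be reproduced directly from the rate estimate. The sketch below assumes that the three probabilities for a degree-1 protein sum to one, so that the probability of no change is one minus the gain and loss probabilities, and that one time step is one million years. The function names are hypothetical. The second function sums the series of $k$ times the probability of losing the interaction at step $k$, as a numerical cross-check on the closed-form mean.

```python
def expected_loss_time(q1, r1):
    """Closed-form mean: sum over k >= 1 of k * q1 * r1**(k-1) = q1 / (1 - r1)**2."""
    return q1 / (1.0 - r1) ** 2


def expected_loss_time_series(q1, r1, terms=200_000):
    """Same mean, obtained by summing the series term by term (truncated)."""
    total = 0.0
    weight = q1                  # q1 * r1**(k-1) for k = 1
    for k in range(1, terms + 1):
        total += k * weight
        weight *= r1
    return total


p1 = q1 = 6e-4                   # upper-bound estimates, per protein per million years
r1 = 1.0 - p1 - q1               # assumption: gain, loss and "no change" are exhaustive

mean_closed = expected_loss_time(q1, r1)
mean_series = expected_loss_time_series(q1, r1)
print(round(mean_closed, 1), round(mean_series, 1))   # both about 416.7 (million years)
```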


In sum, I have reviewed evidence pertaining to the hypothesis that natural selection acts on the global structure of cellular networks and is responsible for their broad-tailed degree distribution. While associations between gene knock-out effects and protein degree weakly support this hypothesis for protein interaction networks, evolutionary studies and explanations of network structure based on purely local processes argue against it. I showed that the great dispersion of the time for which proteins may reside in a network can obscure expected differences in the taxonomic distribution of highly and lowly connected proteins. Similar to metabolic reaction networks, where chemistry itself is an important factor shaping a network's structure, the minor role for natural selection in optimizing a network's degree distribution suggests an important role for protein chemistry in determining this distribution. Which of a protein's chemical features, such as domain composition or surface properties, render some proteins highly connected? What aspect of protein chemistry is responsible for the observation that highly connected proteins show a greater evolutionary turnover of interactions? The answers to these and other questions are contained in accumulating structural data on thousands of proteins.


I would like to thank the Santa Fe Institute for its continued support, as well as the NIH for its support through grant GM063882.


1. Rzhetsky A, Gomez SM. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 2001; 17(10):988-996.

2. Wuchty S. Scale-free behavior in protein domain networks. Mol Biol Evol 2001; 18(9):1694-1702.

3. Wuchty S. Interaction and domain networks of yeast. Proteomics 2002; 2(12):1715-1723.

4. Koonin E, Wolf Y, Karev G. The structure of the protein universe and genome evolution. Nature 2002; 420(6912):218-223.

5. Branden C, Tooze J. Introduction to protein structure. New York: Garland, 1999.

6. Nagano N, Orengo C, Thornton J. One fold with many functions: The evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J Mol Biol 2002;

7. Li W-H. Molecular Evolution. Massachusetts: Sinauer, 1997.

8. Bornberg-Bauer E. How are model protein structures distributed in sequence space? Biophys J

9. Barabasi A-L, Albert R, Jeong H. Mean-field theory for scale-free random networks. Physica A 1999; 272(1-2):173-187.

10. Albert R, Barabasi A-L. Statistical mechanics of complex networks. Reviews of Modern Physics 2002; 74(1):47-97.

11. Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature 2000; 406(6794):378-382.

12. Jeong H, Tombor B, Albert R et al. The large-scale organization of metabolic networks. Nature 2000; 407:651-654.

13. Jeong H, Mason SP, Barabasi A-L et al. Lethality and centrality in protein networks. Nature 2001; 411:41-42.

14. Wagner A, Fell D. The small world inside large metabolic networks. Proc Roy Soc London Ser B 2001; 280:1803-1810.

15. Fell D, Wagner A. The small world of metabolism. Nat Biotechnol 2000; 18:1121-1122.

16. Cascante M, Melendez-Hevia E, Kholodenko BN et al. Control analysis of transit-time for free and enzyme-bound metabolites - physiological and evolutionary significance of metabolic response-times. Biochem J 1995; 308:895-899.

17. Easterby JS. The effect of feedback on pathway transient response. Biochem J 1986; 233:871-875.

18. Schuster S, Heinrich R. Time hierarchy in enzymatic-reaction chains resulting from optimality principles. J Theor Biol 1987; 129(2):189-209.

19. Gleiss PM, Stadler PF, Wagner A et al. Small cycles in small worlds. Advances in Complex Systems 2001; 4:207-226.

20. Benner SA, Ellington AD, Tauer A. Modern metabolism as a palimpsest of the RNA world. Proc Natl Acad Sci USA 1989; 86:7054-7058.

21. Wachtershauser G. Before enzymes and templates: Theory of surface metabolism. Microbiol Rev 1988; 52:452-484.

22. Kuhn H, Waser J. On the origin of the genetic code. FEBS letters 1994; 352:259-264.

23. Morowitz HJ. Beginnings of Cellular Life. New Haven: Yale University Press, 1992.

24. Taylor BL, Coates D. The code within the codons. Biosystems 1989; 22:177-187.

25. Waddell TG, Bruce GK. A new theory on the origin and evolution of the citric acid cycle. Microbiologia Sem 1995; 11:243-250.

26. Lahav N. Biogenesis. New York: Oxford University Press, 1999.

27. Giaever G, Chu AM, Ni L et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 2002; 418(6896):387-391.

28. Steinmetz L, Scharfe C, Deutschbauer A et al. Systematic screen for human disease genes in yeast. Nat Genet 2002; 31(4):400-404.

29. Winzeler EA, Shoemaker DD, Astromoff A et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 1999; 285(5429):901-906.

30. Hahn M, Conant GC, Wagner A. Molecular evolution in large genetic networks: Does connectivity equal constraint? J Mol Evol 2004; 58(2):203-211.

31. Fraser HB, Wall DP, Hirsh AE. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol Biol 2003; 3:11.

32. Jordan IK, Wolf YI, Koonin EV. No simple dependence between protein evolution rate and the number of protein-protein interactions: Only the most prolific interactors tend to evolve slowly. BMC Evol Biol 2003; 3:1.

33. Jordan IK, Wolf YI, Koonin EV. Correction: No simple dependence between protein evolution rate and the number of protein-protein interactions: Only the most prolific interactors evolve slowly. BMC Evol Biol 2003; 3:5.

34. Wagner A. How large protein interaction networks evolve. Proc R Soc Lond B Biol Sci 2003; 270:457-466.

35. Sole RV, Pastor-Satorras R, Smith ED et al. A model of large-scale proteome evolution. Advances in Complex Systems 2002; 5:43-54.

36. Albert R, Barabasi AL. Statistical mechanics of complex networks. Reviews of Modern Physics 2002; 74(1):47-97.

37. Altschul SF, Madden TL, Schaffer AA et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 1997; 25(17):3389-3402.

38. Uetz P, Giot L, Cagney G et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000; 403(6770):623-627.

39. Mewes HW, Heumann K, Kaps A et al. MIPS: A database for genomes and protein sequences. Nucleic Acids Res 1999; 27:44-48.

40. Karlin S. A first course in stochastic processes. New York: Academic Press, 1975.

41. Kulkarni VG. Modeling and analysis of stochastic systems. New York: Chapman & Hall, 1995.

42. Ito T, Chiba T, Ozawa R et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001; 98(8):4569-4574.
