Refinements in the methodologies of protein structure determination have yielded a massive amount of important information about protein structure. Combined with fundamental developments in the field of molecular evolution, this information has unveiled a peculiar picture of the protein structural, sequence, and functional spaces. In particular, graph-theoretical approaches enable us to decipher specific characteristics of these spaces.

It has been suggested25 that protein thermodynamics is one of the important evolutionary driving forces that shape the protein sequence space and govern the architecture of the protein structural space. This force relates protein sequence and structural spaces.

One striking observation is the scale-free organization of the PDUG8 (the protein structural space), which is signified by hierarchical relations between structurally similar proteins. The emergence of power-law scaling of the PDUG connectivity p(k) is the result of evolutionary dynamics that are as robust at the scale of specific proteomes as at the scale of all organisms. The correlation between the structural organization of proteomes and the appearance of new organisms (speciation)124 also suggests a truly universal "scale-free" evolutionary dynamics, whereby the appearance of new protein fold families parallels the appearance of new species.
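The power-law form of p(k) mentioned above can be checked numerically. The following is a minimal Python sketch, not the procedure used in the cited work: it assumes the graph is supplied as a list of node pairs (a hypothetical input format), tallies the fraction p(k) of nodes with k neighbors, and estimates the exponent gamma in p(k) ~ k^(-gamma) from a least-squares line on log-log axes. The function names are illustrative only.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: p(k)} for an undirected graph given as (u, v) pairs.

    Assumption: node labels are sortable. Self-loops and duplicate edges
    are ignored, and isolated nodes (degree 0) are not represented.
    """
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for u, v in unique:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def power_law_exponent(pk):
    """Estimate gamma in p(k) ~ k**(-gamma) by a log-log least-squares fit."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct degrees to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Toy star graph: one hub connected to four leaves.
    toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
    pk = degree_distribution(toy_edges)
    print(pk, power_law_exponent(pk))
```

A straight-line fit on log-log axes is a crude estimator and is sensitive to noise in the tail of the distribution, so the value it returns should be read as a rough indication rather than a rigorous test of scale-free behavior.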

Distributions of function and structure over the PDUG act as two evolutionary lenses. It is evident that the evolution of structure and function is mutual and governed by the same underlying principles.135 According to divergent evolution, aside from the biochemical considerations of function-structure correlation, there is also biological pressure for proteins to retain close functional as well as structural similarity to their ancestors upon mutation and duplication. This implies that protein lineages can be traced via structural comparisons, which may in turn help identify a possible function of putative proteins.
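One simple way to picture function assignment from structural comparison is a vote among annotated structural neighbors. The sketch below is an illustration under stated assumptions, not a method described in this chapter: `neighbors` (a mapping from a protein to its structurally similar proteins), `annotations` (known function labels), and the `min_votes` cutoff are all hypothetical names and choices.

```python
from collections import Counter


def infer_function(query, neighbors, annotations, min_votes=2):
    """Guess a function label for `query` from its annotated structural neighbors.

    Returns the most common label among the neighbors, or None when no label
    reaches `min_votes` (an arbitrary, illustrative cutoff).
    """
    votes = Counter(
        annotations[n] for n in neighbors.get(query, ()) if n in annotations
    )
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


if __name__ == "__main__":
    # Toy data: three structural neighbors, two of which share a label.
    toy_neighbors = {"putative_1": ["a", "b", "c"]}
    toy_annotations = {"a": "hydrolase", "b": "hydrolase", "c": "kinase"}
    print(infer_function("putative_1", toy_neighbors, toy_annotations))
```

Because similar structures can arise by convergence as well as by descent, a label obtained this way is at best a hypothesis to be tested experimentally.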


1. Levitt M, Chothia C. Structural patterns in globular proteins. Nature 1976; 261:552-558.

2. Koonin EV, Wolf YI, Karev GP. The structure of the protein universe and genome evolution. Nature 2002; 420:218-223.

3. Orengo CA, Jones DT, Thornton JM. Protein superfamilies and domain superfolds. Nature 1994; 372:631-634.

4. Murzin AG, Brenner SE, Hubbard T et al. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995; 247:536-540.

5. Feller W. An introduction to probability theory and its applications. 1968.

6. Yanai I, Camacho CJ, DeLisi C. Prediction of gene family distributions in microbial genomes: Evolution by gene duplication and modification. Phys Rev Lett 2000; 85:2641-2644.

7. Qian J, Luscombe NM, Gerstein M. Protein family and fold occurrence in genomes: Power-law behaviour and evolutionary model. J Mol Biol 2001; 313:673-681.

8. Dokholyan NV, Shakhnovich B, Shakhnovich EI. Expanding protein universe and its origin from the biological Big Bang. Proc Natl Acad Sci USA 2002; 99:14132-14136.

9. Karev G, Wolf Y, Rzhetsky A et al. Birth and death of protein domains: A simple model of evolution explains power law behavior. BMC Evol Biol 2002; 2:18.

10. Ponting CP, Russell RR. The natural history of protein domains. Annu Rev Biophys Biomol Struct 2002; 31:45-71.

11. England JL, Shakhnovich EI. Structural determinant of protein designability. Phys Rev Lett 2003; 90:218101.

12. England JL, Shakhnovich BE, Shakhnovich EI. Natural selection of more designable folds: A mechanism for thermophilic adaptation. Proc Natl Acad Sci USA 2003; 100:8727-8731.

13. Finkelstein AV, Gutin AM, Badretdinov AY. Why are the same protein folds used to perform different functions? FEBS Lett 1993; 325:23-28.

14. Govindarajan S, Goldstein RA. Why are some protein structures so common? Proc Natl Acad Sci USA 1996; 93:3341-3345.

15. Li H, Helling R, Tang C et al. Emergence of preferred structures in a simple model of protein folding. Science 1996; 273:666.

16. Rykunov DS, Lobanov MY, Finkelstein AV. Search for the most stable folds of protein chains: III. Improvement in fold recognition by averaging over homologous sequences and 3D structures. Proteins 2000; 40:494-501.

17. Taverna DM, Goldstein RA. The distribution of structures in evolving protein populations. Biopolymers 2000; 53:1-8.

18. Buchler NEG, Goldstein RA. Surveying determinants of protein structure designability across different energy models and amino-acid alphabets: A consensus. J Chem Phys 2000; 112:2533-2547.

19. Tiana G, Shakhnovich B, Dokholyan NV et al. Imprint of evolution on protein structures. Proc Natl Acad Sci USA 2004; 101:2846-2851.

20. Teichmann SA, Murzin AG, Chothia C. Determination of protein function, evolution and interactions by structural genomics. Curr Opin Struct Biol 2001; 11:354-363.

21. Dodson G, Wlodawer A. Catalytic triads and their relatives. Trends Biochem Sci 1998; 23:347-352.

22. Duman JG, Li N, Verleye D et al. Molecular characterization and sequencing of antifreeze proteins from larvae of the beetle Dendroides canadensis. J Comp Physiol [B] 1998; 168:225-232.

23. Duman JG. Antifreeze and ice nucleator proteins in terrestrial arthropods. Annu Rev Physiol 2001; 63:327-357.

24. Makarova KS, Grishin NV. Thermolysin and mitochondrial processing peptidase: How far structure-functional convergence goes. Protein Sci 1999; 8:2537-2540.

25. Makarova KS, Grishin NV. The Zn-peptidase superfamily: Functional convergence after evolutionary divergence. J Mol Biol 1999; 292:11-17.

26. Chothia C, Hubbard T, Brenner S et al. Protein folds in the all-beta and all-alpha classes. Annu Rev Biophys Biomol Struct 1997; 26:597-627.

27. Wallace AC, Borkakoti N, Thornton JM. TESS: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci 1997; 6:2308-2323.

28. Russell RB. Detection of protein three-dimensional side-chain patterns: New examples of convergent evolution. J Mol Biol 1998; 279:1211-1227.

29. Dokholyan NV, Shakhnovich EI. Understanding hierarchical protein evolution from first principles. J Mol Biol 2001; 312:289-307.

30. Abkevich VI, Gutin AM, Shakhnovich EI. Specific nucleus as the transition-state for protein-folding - evidence from the lattice model. Biochemistry 1994; 33:10026-10036.

31. Fersht AR. Nucleation mechanisms in protein folding. Curr Opin Struct Biol 1997; 7:3-9.

32. Dokholyan NV, Buldyrev SV, Stanley HE et al. Identifying the protein folding nucleus using molecular dynamics. J Mol Biol 2000; 296:1183-1188.

33. Murzin AG. How far divergent evolution goes in proteins. Curr Opin Struct Biol 1998; 8:380-387.

34. Pankov R, Yamada KM. Fibronectin at a glance. J Cell Sci 2002; 115:3861-3863.

35. Lupas AN, Ponting CP, Russell RB. On the evolution of protein folds: Are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J Struct Biol 2001; 134:191-203.

36. Russell RB, Ponting CP. Protein fold irregularities that hinder sequence analysis. Curr Opin Struct Biol 1998; 8:364-371.

37. Russell RB. Domain Insertion. Protein Eng 1994; 7:1407-1410.

38. Muller HJ. Bar Duplication. Science 1936; 83:528-530.

39. Ohno S. Evolution by Gene Duplication. Springer-Verlag: Berlin, 1970.

40. Ohno S, Wolf U, Atkin NB. Evolution from fish to mammals by gene duplication. Hereditas 1968; 59:169-187.

41. Gerstein M, Levitt M. A structural census of the current population of protein sequences. Proc Natl Acad Sci USA 1997; 94:11911-11916.

42. Qian J, Stenger B, Wilson CA et al. PartsList: A web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucl Acids Res 2001; 29:1750-1764.

43. Orengo CA, Bray JE, Buchan DWA et al. The CATH protein family database: A resource for structural and functional annotation of genomes. Proteomics 2002; 2:11-21.

44. Holm L, Sander C. Protein-structure comparison by alignment of distance matrices. J Mol Biol 1993; 233:123-138.

45. Getz G, Vendruscolo M, Sachs D et al. Automated assignment of SCOP and CATH protein structure classifications from FSSP scores. Proteins 2002; 46:405-415.

46. Sprinzak E, Margalit H. Correlated sequence-signatures as markers of protein-protein interaction. J Mol Biol 2001; 311:681-692.

47. Dietmann S, Fernandez-Fuentes N, Holm L. Automated detection of remote homology. Curr Opin Struct Biol 2002; 12:362-367.

48. Russell RB, Sasieni PD, Sternberg MJE. Supersites within superfolds. Binding site similarity in the absence of homology. J Mol Biol 1998; 282:903-918.

49. Irving JA, Whisstock JC, Lesk AM. Protein structural alignments and functional genomics. Proteins 2001; 42:378-382.

50. Brocchieri L, Karlin S. Conservation among HSP60 sequences in relation to structure, function, and evolution. Protein Sci 2000; 9:476-486.

51. Bradley P, Kim PS, Berger B. TRILOGY: Discovery of sequence-structure patterns across diverse proteins. Proc Natl Acad Sci USA 2002; 99:8500-8505.

52. Andrade MA, Perez-Iratxeta C, Ponting CP. Protein repeats: Structures, functions, and evolution. J Struct Biol 2001; 134:117-131.

53. Jones S, Thornton JM. Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 1997; 272:133-143.

54. Gajiwala KS, Burley SK. HDEA, a periplasmic protein that supports acid resistance in pathogenic enteric bacteria. J Mol Biol 2000; 295:605-612.

55. Hwang KY, Chung JH, Kim SH et al. Structure-based identification of a novel NTPase from Methanococcus jannaschii. Nat Struct Biol 1999; 6:691-696.

56. Stec B, Yang HY, Johnson KA et al. MJ0109 is an enzyme that is both an inositol monophosphatase and the 'missing' archaeal fructose 1,6-bisphosphatase. Nat Struct Biol 2000; 7:1046-1050.

57. Shakhnovich BE, Harvey JM, Comeau S et al. ELISA: Structure-function inferences based on statistically significant and evolutionarily inspired observations. BMC Bioinformatics 2003; 4:34.

58. Lakey JH, Raggett EM. Measuring protein-protein interactions. Curr Opin Struct Biol 1998; 8:119-123.

59. Legrain P, Wojcik J, Gauthier JM. Protein-protein interaction maps: A lead towards cellular functions. Trends Genet 2001; 17:346-352.

60. Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 2002; 12:368-373.

61. Fields S, Song OK. A novel genetic system to detect protein-protein interactions. Nature 1989; 340:245-246.

62. Rain JC, Selig L, De Reuse H et al. The protein-protein interaction map of Helicobacter pylori. Nature 2001; 409:211-215.

63. Schuck P. Reliable determination of binding affinity and kinetics using surface plasmon resonance biosensors. Curr Opin Biotechnol 1997; 8:498-502.

64. Doyle ML. Characterization of binding interactions by isothermal titration calorimetry. Curr Opin Biotechnol 1997; 8:31-35.

65. Ahmadian MR, Hoffmann U, Goody RS et al. Individual rate constants for the interaction of Ras proteins with GTPase-activating proteins determined by fluorescence spectroscopy. Biochemistry 1997; 36:4535-4541.

66. Gavin AC, Bosche M, Krause R et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002; 415:141-147.

67. Ho Y, Gruhler A, Heilbut A et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002; 415:180-183.

68. Back JW, de Jong L, Muijsers AO et al. Chemical cross-linking and mass spectrometry for protein structural modeling. J Mol Biol 2003; 331:303-313.

69. Zhu H, Bilgin M, Bangham R et al. Global analysis of protein activities using proteome chips. Science 2001; 293:2101-2105.

70. Tong AHY, Drees B, Nardelli G et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002; 295:321-324.

71. Gaasterland T, Ragan MA. Microbial genescapes: Phyletic and functional patterns of ORF distribution among prokaryotes. Microb Comp Genomics 1998; 3:199-217.

72. Pellegrini M, Marcotte EM, Thompson MJ et al. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci USA 1999; 96:4285-4288.

73. Bono H, Okazaki Y. Functional transcriptomes: Comparative analysis of biological pathways and processes in eukaryotes to infer genetic networks among transcripts. Curr Opin Struct Biol 2002; 12:355-361.

74. Tamames J, Casari G, Ouzounis C et al. Conserved clusters of functionally related genes in two bacterial genomes. J Mol Evol 1997; 44:66-73.

75. Dandekar T, Snel B, Huynen M et al. Conservation of gene order: A fingerprint of proteins that physically interact. Trends Biochem Sci 1998; 23:324-328.

76. Overbeek R, Fonstein M, D'Souza M et al. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 1999; 96:2896-2901.

77. Marcotte EM, Pellegrini M, Thompson MJ et al. A combined algorithm for genome-wide prediction of protein function. Nature 1999; 402:83-86.

78. Marcotte EM, Pellegrini M, Ng HL et al. Detecting protein function and protein-protein interactions from genome sequences. Science 1999; 285:751-753.

79. Enright AJ, Iliopoulos I, Kyrpides NC et al. Protein interaction maps for complete genomes based on gene fusion events. Nature 1999; 402:86-90.

80. Tsoka S, Ouzounis CA. Prediction of protein interactions: Metabolic enzymes are frequently involved in gene fusion. Nat Genet 2000; 26:141-142.

81. Goh CS, Bogan AA, Joachimiak M et al. Coevolution of proteins with their interaction partners. J Mol Biol 2000; 299:283-293.

82. Pazos F, Valencia A. In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins 2002; 47:219-227.

83. Chothia C. Proteins - 1000 Families for the Molecular Biologist. Nature 1992; 357:543-544.

84. Finkelstein AV, Badretdinov AY, Gutin AM. Why do protein architectures have Boltzmann-like statistics? Proteins 1995; 23:142-150.

85. Finkelstein AV, Gutin A, Badretdinov A. Why are some protein structures so common? FEBS Lett 1993; 325:23-28.

86. Davidson AR, Sauer RT. Folded proteins occur frequently in libraries of random amino-acid-sequences. Proc Natl Acad Sci USA 1994; 91:2146-2150.

87. Rost B. Protein structures sustain evolutionary drift. Fold Des 1997; 2:S19-S24.

88. Chothia C, Gerstein M. Protein evolution - How far can sequences diverge? Nature 1997; 385:579.

89. Grishin NV. Estimation of evolutionary distances from protein spatial structures. J Mol Evol 1997; 45:359-369.

90. Holm L. Unification of protein families. Curr Opin Struct Biol 1998; 8:372-379.

91. Sander C, Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 1991; 9:56-68.

92. Flaherty KM, McKay DB, Kabsch W et al. Similarity of the 3-dimensional structures of actin and the ATPase fragment of a 70-kDa heat-shock cognate protein. Proc Natl Acad Sci USA 1991; 88:5041-5045.

93. Holmes KC, Sander C, Valencia A. A new ATP-binding fold in actin, hexokinase and Hsc70. Trends Cell Biol 1993; 3:53-59.

94. Orengo CA, Michie AD, Jones S et al. CATH - a hierarchic classification of protein domain structures. Structure 1997; 5:1093-1108.

95. Dodge C, Schneider R, Sander C. The HSSP database of protein structure sequence alignments and family profiles. Nucl Acids Res 1998; 26:313-315.

96. Sanchez R, Pieper U, Melo F et al. Protein structure modeling for structural genomics. Nat Struct Biol 2000; 7:986-990.

97. Pearl FMG, Lee D, Bray JE et al. Assigning genomic sequences to CATH. Nucl Acids Res 2000; 28:277-282.

98. Holm L, Sander C. An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease. Proteins 1997; 28:72-82.

99. Reeck GR, de Haen C, Teller DC et al. "Homology" in proteins and nucleic acids: A terminology muddle and a way out of it. Cell 1987; 50:667.

100. Fitch WM. Distinguishing homologous from analogous proteins. Syst Zool 1970; 19:99-113.

101. Goldstein RA, Luthey-Schulten ZA, Wolynes PG. Optimal protein-folding codes from spin-glass theory. Proc Natl Acad Sci USA 1992; 89:4918-4922.

102. Shakhnovich EI, Gutin AM. Engineering of stable and fast-folding sequences of model proteins. Proc Natl Acad Sci USA 1993; 90:7195-7199.

103. Abkevich VI, Gutin AM, Shakhnovich EI. Improved design of stable and fast-folding model proteins. Fold Des 1996; 1:221-230.

104. Shakhnovich EI. Theoretical studies of protein-folding thermodynamics and kinetics. Curr Opin Struct Biol 1997; 7:29-40.

105. Bryngelson JD, Wolynes PG. Spin-glasses and the statistical-mechanics of protein folding. Proc Natl Acad Sci USA 1987; 84:7524-7528.

106. Abkevich VI, Gutin AM, Shakhnovich EI. Theory of kinetic partitioning in protein folding with possible applications to prions. Proteins 1998; 31:335-344.

107. Shakhnovich EI. Protein design: A perspective from simple tractable models. Fold Des 1998; 3:R45-R58.

108. Altschuh D, Vernet T, Berti P et al. Coordinated amino-acid changes in homologous protein families. Protein Eng 1988; 2:193-199.

109. Thomas DJ, Casari G, Sander C. The prediction of protein contacts from multiple sequence alignments. Protein Eng 1996; 9:941-948.

110. Mirny LA, Shakhnovich EI. Universally conserved positions in protein folds: Reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 1999; 291:177-196.

111. Pazos F, Helmer-Citterich M, Ausiello G et al. Correlated mutations contain information about protein-protein interaction. J Mol Biol 1997; 271:511-523.

112. Axe DD, Foster NW, Fersht AR. Active barnase variants with completely random hydrophobic cores. Proc Natl Acad Sci USA 1996; 93:5590-5594.

113. Rowsell S, Pauptit RA, Tucker AD et al. Crystal structure of carboxypeptidase G(2), a bacterial enzyme with applications in cancer therapy. Structure 1997; 5:337-347.

114. Chevrier B, Schalk C, Dorchymont H et al. Crystal structure of Aeromonas proteolytica aminopeptidase: A prototypical member of the co-catalytic zinc enzyme family. Structure 1994; 2:283-291.

115. Dietmann S, Holm L. Identification of homology in protein structure classification. Nat Struct Biol 2001; 8:953-957.

116. Sedgewick R. Algorithms in C. Reading, MA: Addison-Wesley, 1990.

117. Havlin S, Ben-Avraham D. Diffusion in disordered media. Adv Phys 1987; 36:695-798.

118. Stauffer D, Aharony A. Introduction to percolation theory. Philadelphia, 1994.

119. Bollobas B. Random graphs. London: Academic Press, 1985.

120. Albert R, Barabasi AL. Statistical mechanics of complex networks. Reviews of Modern Physics 2002; 74:47-97.

121. Finkelstein AV, Ptitsyn OB. Why do globular-proteins fit the limited set of folding patterns. Prog Biophys Mol Biol 1987; 50:171-190.

122. Vendruscolo M, Dokholyan NV, Paci E et al. Small-world view of the amino acids that play a key role in protein folding. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 65:061910.

123. Deeds EJ, Dokholyan NV, Shakhnovich EI. Protein evolution within a structural space. Biophys J 2003; 85:2962-2972.

124. Deeds EJ, Shakhnovich B, Shakhnovich EI. Proteomic traces of speciation. J Mol Biol 2004; 336:695-706.

125. Aravind L, Koonin EV. Gleaning nontrivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol 1999; 287:1023-1040.

126. Jordan IK, Kondrashov F, Rogozin I et al. Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins. Genome Biol 2001; 2:research0053.

127. Baker D, Sali A. Protein structure prediction and structural genomics. Science 2001; 294:93-96.

128. Li H, Tang C, Wingreen NS. Are protein folds atypical? Proc Natl Acad Sci USA 1998; 95:4987-4990.

129. Csete ME, Doyle JC. Reverse engineering of biological complexity. Science 2002; 295:1664-1669.

130. Todd AE, Orengo CA, Thornton JM. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001; 307:1113-1143.

131. Brenner SA. Natural progression. Nature 2001; 409:459.

132. Ponting CP, Russell RB. Identification of distant homologues of fibroblast growth factors suggests a common ancestor for all beta-trefoil proteins. J Mol Biol 2000; 302:1041-1047.

133. Cooper VS, Schneider D, Blot M et al. Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. J Bacteriol 2001; 183:2834-2841.

134. Ashburner M, Ball CA, Blake JA et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25:25-29.

135. Shakhnovich BE, Dokholyan NV, DeLisi C et al. Functional fingerprints of folds: Evidence for correlated structure-function evolution. J Mol Biol 2003; 326:1-9.

136. Schug J, Diskin S, Mazzarelli J et al. Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res 2002; 12:648-655.
