The simplistic divergent evolution model (reference 8) that explains the nonrandom behavior of the PDUG is based solely on the premise that each protein has an ancestor that is its closest structural homologue. This model fits the data observed on the PDUG. It characterizes the "oldest" proteins as those with the largest number of descendants, so the number of descendants of each protein depends on that protein's evolutionary age. We can therefore argue from the divergent evolution model that older clusters and proteins are more populated and have more connections in the PDUG. Of course, there is a significant stochastic component in the evolution of proteins that may drastically affect both family populations and their connectivity.
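The link between evolutionary age and number of descendants can be illustrated with a toy simulation. The sketch below is not the model of the cited work: it assumes, purely for illustration, that every new protein diverges from an ancestor chosen uniformly at random among the proteins that already exist. The function names and the population size are hypothetical.

```python
import random

def simulate_divergence(n_proteins, seed=0):
    """Toy divergence: protein i picks one ancestor among proteins 0..i-1.

    Assumption for illustration only: the ancestor is chosen uniformly
    at random. Returns a list where parent[i] is the ancestor of i.
    """
    rng = random.Random(seed)
    parent = [None]  # protein 0 is the founder and has no ancestor
    for new in range(1, n_proteins):
        parent.append(rng.randrange(new))
    return parent

def descendant_counts(parent):
    """Count all direct and indirect descendants of every protein."""
    counts = [0] * len(parent)
    # A child always has a larger index than its ancestor, so one pass
    # from youngest to oldest accumulates complete subtrees.
    for node in range(len(parent) - 1, 0, -1):
        counts[parent[node]] += counts[node] + 1
    return counts

parent = simulate_divergence(2000)
counts = descendant_counts(parent)

# Compare the mean number of descendants of the 100 oldest proteins
# with that of the 100 youngest ones.
oldest_mean = sum(counts[:100]) / 100
youngest_mean = sum(counts[-100:]) / 100
print(oldest_mean, youngest_mean)
```

Even in this crude setting the oldest proteins accumulate far more descendants than the youngest, while individual counts fluctuate strongly from run to run, which mirrors the stochastic component mentioned above.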
To detect mutual evolution between structure and function, in reference 135 we independently annotated proteins based on their function. Considering the function of all annotated proteins and disregarding sequence homologies, we found that proteins have, in general, diverse functional descriptors. At the finest level these descriptors are nearly unique, such as methionine synthase, B12-binding domain, or methylmalonyl-CoA mutase. On the other hand, all proteins can be grouped into just six or seven major functional categories, such as enzyme, ligand binding, or transporter. The elucidation of a functional relationship between proteins therefore depends on the system of description: descriptors that are too specific relate almost no proteins to each other, while descriptors that are too broad relate almost all of them. Some medium specificity of functional description must be used if we are to quantitatively measure functional relationships between proteins. Since we do not know in advance how coarse the annotation needs to be, we clearly need a hierarchical system.
A hierarchical system of functional annotation was recently developed by the GO consortium (reference 134). The GO system of annotation is well suited for measuring functional relationships between proteins because it defines a machine-readable language in which protein functions can be compared with little ambiguity, based on their unique GO identifiers, at different levels of annotation specificity. The GO hierarchical language is organized as a directed acyclic graph. Each node in this graph is an annotation, that is, a functional descriptor that can be assigned to a gene or gene product. As the graph is traversed downward, the nodes carry increasingly precise functional descriptions. In this graph, the parent-child relationship between nodes obeys the rule that all children are a subset of the parent. For example, all aldolases are enzymes, as are all CoA ligases, because there is an edge from enzymes to both categories. In reference 135 we independently mapped protein function onto the whole of the PDUG.
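The subset rule of the directed acyclic graph can be made concrete with a small sketch. The graph below is a hand-written toy, not the real Gene Ontology: the term names are taken from the examples in the text, the root name and the dictionary layout are assumptions made for illustration, and real GO terms are referred to by their GO identifiers.

```python
# Toy fragment of a functional hierarchy: each term lists its parents.
# Hypothetical data for illustration; not taken from the GO database.
TOY_PARENTS = {
    "molecular_function": [],
    "enzyme": ["molecular_function"],
    "ligand binding": ["molecular_function"],
    "aldolase": ["enzyme"],
    "CoA ligase": ["enzyme"],
}

def ancestors(term, parents=TOY_PARENTS):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

def is_a(term, candidate, parents=TOY_PARENTS):
    """True if `term` equals `candidate` or is one of its descendants."""
    return term == candidate or candidate in ancestors(term, parents)

def shared_terms(term_a, term_b, parents=TOY_PARENTS):
    """Terms that describe both annotations at some level of specificity."""
    a = ancestors(term_a, parents) | {term_a}
    b = ancestors(term_b, parents) | {term_b}
    return a & b

print(is_a("aldolase", "enzyme"))
print(sorted(shared_terms("aldolase", "CoA ligase")))
```

In this toy graph an aldolase and a CoA ligase differ at the most specific level but share the descriptor "enzyme" one level up, which is the kind of comparison at variable specificity that a hierarchical annotation allows.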
To carry out a completely machine-based annotation, we used a direct mapping of the genes in the SwissProt database that code for the PDB entry of each protein domain in the PDUG. We then mapped these SwissProt entries to the curated annotation of SwissProt provided by the Gene Ontology Consortium. Each such annotation was mined independently by the GO consortium, primarily from literature searches (http://www.geneontology.org). This yielded a nontrivial mapping from PDB to GO, giving each protein its functional assignment. The assignment is nontrivial because some SwissProt entries had many functional annotations, corresponding to large, multifunctional, multidomain proteins of which our domain was only one. In this case, we kept all functional annotations. Working with domains alleviates the problem of "flow of structure" inside the clusters (reference 136). Flow of structure can happen when proteins A and B share a common domain C. Proteins A and B could then have highly nonrandom structural similarity but different functions, because the domain they do not share is the active one. In this way, domains may be erroneously classified as functionally equivalent when this is not actually the case.
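The two-step mapping described above (domain to SwissProt entry, SwissProt entry to GO annotation) can be sketched as a composition of two lookup tables. All identifiers below are hypothetical placeholders, and the in-memory dictionaries stand in for the actual database files, whose formats are not described in the text.

```python
def map_domains_to_go(domain_to_swissprot, swissprot_to_go):
    """Assign to each domain the union of GO terms of its SwissProt entries.

    Every annotation of a multifunctional entry is kept, following the
    choice described in the text. Entries without any annotation simply
    contribute nothing.
    """
    result = {}
    for domain, accessions in domain_to_swissprot.items():
        terms = set()
        for accession in accessions:
            terms.update(swissprot_to_go.get(accession, ()))
        result[domain] = terms
    return result

# Hypothetical placeholder identifiers, for illustration only.
domain_to_swissprot = {
    "domain_1": ["SP_A"],
    "domain_2": ["SP_B"],
    "domain_3": ["SP_C"],
}
swissprot_to_go = {
    "SP_A": ["enzyme"],
    # A multidomain, multifunctional entry: all of its terms are kept.
    "SP_B": ["enzyme", "ligand binding", "transporter"],
}

domain_to_go = map_domains_to_go(domain_to_swissprot, swissprot_to_go)
for domain, terms in sorted(domain_to_go.items()):
    print(domain, sorted(terms))
```

Keeping every term of a multifunctional entry means that a domain can inherit functions carried by other domains of the same protein, which is one reason the resulting PDB-to-GO mapping is nontrivial.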