Why It May Be Impossible to Reconstruct Hereditary Relations between Proteins Based Solely on Their Sequence Similarity

First, the correlation function C(τ), which measures the probability that an amino acid is not affected by mutations within time τ, decays exponentially, so beyond the relaxation time of the correlation function one cannot relate two sequences: the original and the one observed a time τ later. Second, it made no difference whether we started our design procedure from one sequence or from two unrelated sequences. The sequences diverged so much from each other within a short design simulation time that one could not identify which initial sequence had been used in the design procedure. Furthermore, our results (reference 29) suggested that some degree of homology may occur even between sequences that converged from unrelated roots to the same structure, i.e., in clear analogs. The reason, as we showed in reference 29, is that some positions may feature conserved residues owing to the physical requirement of stability of a common fold. Physical conservation of certain classes of amino acids at some positions in protein folds may be reflected at the genetic level owing to the specifics of the genetic code. In some cases such conservation may be confused with homology that originates from divergent evolution.

A rigorous definition of analogs and homologs can therefore come only from understanding the correlation times τ between consecutive mutations, or from reconstructing the actual structural and/or functional evolutionary pathways. If the time scale is smaller than the typical time scale for the formation of a family of homologs, τ₀, then homology is well defined: homologous sequences have high sequence similarity, while analogous sequences have low sequence similarity. At a longer time scale, τ ≫ τ₀, unless there is high sequence similarity between the sequences, the notions of homology and analogy become meaningless.
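The exponential loss of memory described above can be illustrated with a toy simulation. The sketch below is not the design procedure used in this work; it assumes, purely for illustration, that every position mutates independently with the same probability per step and that a mutated residue is drawn uniformly from the 20 amino acids. All names (`survival_curve`, `relaxation_time`, `mutate`, `identity`) are hypothetical.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def survival_curve(length, rate, steps, seed=0):
    """Fraction of positions never hit by a mutation after 0..steps steps.

    Assumption: each position is hit independently with probability
    `rate` per step, so the expected curve is (1 - rate) ** t.
    """
    rng = random.Random(seed)
    untouched = [True] * length
    curve = [1.0]
    for _ in range(steps):
        untouched = [u and rng.random() >= rate for u in untouched]
        curve.append(sum(untouched) / length)
    return curve


def relaxation_time(rate):
    """Steps after which (1 - rate) ** t has dropped to 1/e."""
    return -1.0 / math.log(1.0 - rate)


def mutate(seq, rate, rng):
    """Replace each residue, with probability `rate`, by a uniformly random one."""
    return [rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq]


def identity(a, b):
    """Fraction of aligned positions carrying the same residue."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Under these assumptions, the identity between a sequence and its descendant falls toward the chance level of about 1/20 once the number of steps exceeds a few multiples of `relaxation_time(rate)`. At that point a descendant is no more similar to its ancestor than to an unrelated sequence, which is the sense in which the two can no longer be related by similarity alone.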
