
Figure 1. Fit of empirical domain family size distributions to the second-order balanced linear BDIM: Homo sapiens.

of BDIMs. We explored polynomial, rational, and logistic BDIMs with the aim of selecting the model most compatible with the data under a critical constraint: all models must have the same stationary (ergodic) distribution as the original linear BDIM.

It follows from (3) that the following modification of any form of BDIM:

$$\lambda_i \to g(i)\,\lambda_i, \qquad \delta_{i+1} \to g(i)\,\delta_{i+1}, \tag{7}$$

where $g(i)$, $i = 1, \dots, N$, is a positive function with $g(0) = 1$, results in a BDIM with the same ergodic distribution of family sizes as the original one. We studied the class of modifications (7) for the second-order balanced linear BDIM with $\lambda_i = \lambda(i + a)$, $\delta_i = \lambda(i + b)$ for $i > 0$, which produces a stationary distribution with the power-law form $p_i \propto i^{-\gamma}$ for large $i$, where $\gamma = 1 + b - a$. In particular, modifications of a linear BDIM with $g(i) = (i + 1)^M$ or $g(i) = (i + 1)^{1 - i/(N + c)}$ define, respectively, broad classes of rational and logistic BDIMs that have the same stationary distribution as the original linear BDIM but very different dynamic properties.
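
The invariance under modification (7) can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes that (3), which is not reproduced in this excerpt, is the detailed-balance product $p_{i+1}/p_i = \lambda_i/\delta_{i+1}$, and it reads (7) as multiplying $\lambda_i$ and $\delta_{i+1}$ by the same $g(i)$. The helper names (`stationary`, `modify`) and the parameter values for $a$, $b$, $N$, $M$, $c$ are hypothetical and chosen only for illustration.

```python
def stationary(birth, death, n_max):
    """Stationary family-size distribution of a BDIM on sizes 1..n_max, built
    from the (assumed) ratio p_{i+1} / p_i = birth(i) / death(i + 1)."""
    weights = [1.0]  # unnormalized p_1
    for i in range(1, n_max):
        weights.append(weights[-1] * birth(i) / death(i + 1))
    total = sum(weights)
    return [w / total for w in weights]  # element k holds p_{k+1}


def modify(birth, death, g):
    """Assumed form of (7): lambda_i -> g(i) lambda_i, delta_{i+1} -> g(i) delta_{i+1}."""
    return (lambda i: g(i) * birth(i)), (lambda i: g(i - 1) * death(i))


# Hypothetical parameter values chosen only for illustration (not fitted).
LAM, A, B, N, M, C = 1.0, 0.5, 1.2, 1000, 1.0, 10.0


def lin_birth(i):
    return LAM * (i + A)  # lambda_i = lambda (i + a)


def lin_death(i):
    return LAM * (i + B)  # delta_i = lambda (i + b): second-order balanced


def g_rational(i):
    return (i + 1) ** M  # g(0) = 1


def g_logistic(i):
    return (i + 1) ** (1 - i / (N + C))  # g(0) = 1


p_lin = stationary(lin_birth, lin_death, N)

# Largest relative deviation of the modified models from the linear one.
max_rel_diff = 0.0
for g in (g_rational, g_logistic):
    p_mod = stationary(*modify(lin_birth, lin_death, g), N)
    for x, y in zip(p_lin, p_mod):
        max_rel_diff = max(max_rel_diff, abs(x - y) / x)

# Power-law tail: p_i * i**gamma should be nearly constant for large i.
gamma = 1 + B - A
tail_ratio = (p_lin[499] * 500 ** gamma) / (p_lin[999] * 1000 ** gamma)
print(f"max relative difference: {max_rel_diff:.1e}; tail ratio: {tail_ratio:.4f}")
```

Because any positive $g$ cancels in the ratio $\lambda_i/\delta_{i+1}$, the deviation between the linear and modified distributions stays at rounding-error level, and `tail_ratio` comes out close to 1, consistent with $p_i \propto i^{-\gamma}$. What the modifications change is only the dynamics, i.e., the absolute rates $\lambda_i$ and $\delta_i$.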

All stochastic models of genome evolution face an important "time unit" problem. If models (1) and (2) are second-order balanced, such that $\lambda = \delta$, then $\lambda$ is a time-scaling constant and the models have a natural "innate" time scale measured in units of $1/\lambda$ (hereinafter, internal time units). However, to measure time in real units, such as years, we must estimate the parameter $\lambda$ from available estimates of the duplication rate. For this purpose, we chose the average duplication rate, $r_{du}$. Lynch and Conery<sup>26</sup> estimated the average duplication rate by counting the number of recent duplicates in three eukaryotic genomes and dividing this number by the estimated rate of silent nucleotide substitutions. They obtained $r_{du} \approx 2 \times 10^{-8}$ duplications/gene/year, the value we used in our calculations. Estimates of $\lambda$ based on the empirical average duplication rate differ among nonlinear BDIMs. Indeed, in terms of model (2), the average duplication rate is, by definition, a sum over family sizes up to $N - 1$ that involves the model-specific birth rates $\lambda_i$.

Let us introduce the coefficient $c_{du} = r_{du}/\lambda$, which connects the internal model parameter $\lambda$ with the empirical value of $r_{du}$, such that

$$\lambda = \frac{r_{du}}{c_{du}}.$$

Figure 2. Probabilities of formation of families starting from a singleton, P1(1, n), versus family size (n) for the linear BDIM. The plot is in double-logarithmic scale. The model parameters are those for D. melanogaster (blue), C. elegans (purple), H. sapiens (red), and A. thaliana (green). A color version of this figure is available online at http://www.Eurekah.com.

For all transformations (7) of the linear BDIM, the stationary probabilities $p_i$ are the same as for the original linear model, but the birth rates $\lambda_i$ and, accordingly, $c_{du}$ vary. We show that the internal time unit becomes smaller as the "model degree" increases, which results in several interesting effects discussed below.
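
Since $c_{du} = r_{du}/\lambda$, the length of one internal time unit in years follows directly from $c_{du}$. A worked example, using the Lynch and Conery value $r_{du} \approx 2 \times 10^{-8}$ duplications/gene/year quoted above:

$$\frac{1}{\lambda} = \frac{c_{du}}{r_{du}} \approx \frac{c_{du}}{2 \times 10^{-8}\ \text{yr}^{-1}} = c_{du} \times 5 \times 10^{7}\ \text{years}.$$

Thus the internal time unit scales linearly with $c_{du}$: for an illustrative $c_{du} = 1$, one internal time unit corresponds to about $5 \times 10^{7}$ years, and any transformation (7) that changes $c_{du}$ rescales this conversion by the same factor.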
