Comparative Homology Modelling

Of the 517 protein kinases in the human kinome [116], structures are publicly available for fewer than 10% of the enzymes. In the absence of experimental data, construction of homology models based on known structures has proven a reliable method for generating 3D information. How, then, are the models built, and what are their limitations, especially as applied to the family of protein kinases?

There are several stages to the construction of a homology model, probably the most crucial being the identification of a suitable template (or templates) and the alignment of the sequences. Template recognition is usually undertaken with programs like FASTA or BLAST, methods readily capable of identifying templates having a sequence identity >25%. Confidence in a model falls rapidly once sequence identity drops below 25%; such a model will at best give an approximate reproduction of the tertiary structure. At the opposite end of the scale, errors in models built using templates with identity >90% can be as low as the errors in the experimental crystallographic structure. Within the family of protein kinases, sequence identity across the subfamilies generally lies in the region of 25-35%. Identity among the residues defining the ATP-binding site is higher, as there are clusters of conserved residues that are involved in the catalytic mechanism of the enzymes. At this point, it is important to assess the template(s) that have been identified, considering, among other things, the quality of the data, the activation state of the kinase structure and ligand-induced conformational changes. Following the selection of one or more templates, a detailed alignment is performed by maximising the score from a scoring or exchange matrix, more often than not followed by manual correction. A model can now be constructed, first building the backbone, then the loop regions and finally adding the side chains. There are many programs available for building homology models. Most, if not all, also incorporate the means to evaluate and refine the model after it has been constructed. Further details of the methods are presented at the CMBI website (www.cmbi.ru.nl/gvteach/hommod/index).
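The identity thresholds quoted above can be made concrete with a short calculation. The following is a minimal sketch, not the method used by FASTA, BLAST or any modelling package: it assumes the two sequences are already aligned, and it assumes identity is counted only over columns where neither sequence has a gap (other conventions for the denominator exist). The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped columns. Other
    conventions (e.g. length of the shorter sequence) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def confidence_band(identity: float) -> str:
    """Map identity to the qualitative bands described in the text (25% and 90%)."""
    if identity < 25.0:
        return "low: approximate fold at best"
    if identity > 90.0:
        return "high: errors may approach those of a crystal structure"
    return "intermediate: template choice and alignment quality dominate"


if __name__ == "__main__":
    # Hypothetical toy alignment, not real kinase sequences.
    target = "GKGSFGEVYKA-LDV"
    template = "GEGAFGKVYRAHLDV"
    identity = percent_identity(target, template)
    print(f"{identity:.1f}% identity -> {confidence_band(identity)}")
```

Because the ATP-binding site is more conserved than the kinase domain as a whole, it can be informative to run the same calculation on the binding-site columns alone as well as on the full alignment.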

There are very few reported studies outlining the use of homology modelling in protein kinases. Panigrahi and Desiraju [117] performed a simple comparison of a model of the EGFR kinase domain to an experimental X-ray structure, citing a Cα RMSD of 1.96 Å. McGovern and Shoichet [15] came to a number of conclusions, stating that the performance of the docking experiment is affected by the particular representation of the receptor. Perhaps the most telling remark is that the majority of the structures, including the models, gave some degree of enrichment compared to random screening.
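For reference, the RMSD figure quoted for a model against a crystal structure is a root-mean-square deviation over matched alpha-carbon positions. The sketch below shows only that final arithmetic. It assumes the two coordinate sets have already been optimally superposed and matched residue by residue (superposition itself, for example by the Kabsch algorithm, is not shown), and the function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between matched alpha-carbon coordinates, in the input units.

    Assumption: both coordinate sets are already superposed and listed in
    the same residue order. No fitting is performed here.
    """
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    if not model:
        raise ValueError("at least one atom pair is required")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms, for illustration only.
    model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    xray_ca = [(0.0, 0.5, 0.0), (3.8, 0.0, 1.0), (7.6, 2.0, 0.0)]
    print(f"C-alpha RMSD: {ca_rmsd(model_ca, xray_ca):.2f} A")
```

Because the deviations are squared before averaging, a few badly placed loop residues can dominate the value even when the binding site is modelled well.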

Diller and Li [118] have also assessed the use of homology models in high throughput docking. Models for six protein kinases (PDGFRβ, VEGFR1, EGFR, p38, SRC and FGFR-1) were built from a variety of templates using the Modeler package (Accelrys). A database of 32,000 compounds seeded with known inhibitors for each kinase was then docked into each model using the program LibDock (Accelrys) and scored with the Piecewise Linear Potential 2 (PLP2) function. A solvation term was added to the function to correct for any correlation between score and molecular size, a consequence of the function being based on interaction counts. Enrichment factors were calculated for each simulation and the models were also assessed for their ability to discriminate between the inhibitors, a key point of concern when building a model based on an alternative enzyme. Crystallographic data are available for four of the enzymes, allowing for a direct comparison of the docking results between experimental and computer-generated structures. The homology model of EGFR was compared to the apo crystal structure of EGFR (1M14) and achieved similar enrichment factors versus the random compounds of 4-5 and 5-6, respectively. The model for p38 performed poorly (enrichments of 1.5-2.6) compared to the crystal structure (enrichments of 7-11), a difference mainly attributed to the fact that the crystal structure used (1A9U) is a complex containing one of the well-known pyridinyl-imidazole inhibitors. The situation was reversed for Src kinase, where the model achieved higher enrichments than the crystal structure. The model was based on an apo form of the related enzyme lymphocyte-specific kinase Lck (sequence identity of 66%), whereas the crystal structure was solved from protein crystals initially seeded with the ATP analogue AMP-PNP. Structures complexed with ATP or similar ligands adopt a more closed binding site, thereby restricting access to larger ligands. For FGFR-1, neither the model nor the crystal structure performed well, again a consequence of induced fit. A second, more open model built using an apo template structure fared better. The results can be summarised as follows: the variation in conformation between a model and an X-ray structure can be no more than that observed between the different forms of a single protein kinase. Success is very much dependent on the conformation of the template structure and the end objective of the model. If screening for novel chemotypes, an open apo template provides the best starting point, though this will no doubt lead to an increase in false positives.
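The enrichment factors quoted in these studies compare the hit rate among the top-ranked compounds with the hit rate in the whole database. The sketch below uses one common definition, which is an assumption here, since the exact formula and cut-off used in the cited work are not given in this text. The function name, the cut-off and the toy ranking are hypothetical.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], top_fraction: float) -> float:
    """Enrichment factor for a ranked list, best-scoring compound first.

    Assumed definition:
        EF = (actives in top x% / compounds in top x%)
             / (actives in database / compounds in database)
    EF = 1 corresponds to random selection.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty database with at least one active")
    n_top = max(1, round(n_total * top_fraction))  # size of the selected subset
    hits_top = sum(ranked_is_active[:n_top])       # actives recovered in the subset
    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Hypothetical ranking of 10 compounds, 2 of them known inhibitors.
    ranking = [True, False, True, False, False, False, False, False, False, False]
    print(enrichment_factor(ranking, 0.2))  # top 20% of the list
    print(enrichment_factor(ranking, 1.0))  # whole list, random-selection baseline
```

Enrichment factors are only comparable between receptor representations when the same database and the same cut-off are used, which is how the model and crystal-structure results above are compared.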

Virtual screening formed the basis of comparison for Oshiro et al. [119], who looked at enrichment factors of homology models of CDK2 versus X-ray data. MOE (Chemical Computing Group) was used to build four models of CDK2 based on templates ranging in sequence identity (ATP site residues only) from 43 to 60%. A set of 17,000 compounds containing 367 actives from 15 different scaffolds was then docked into the X-ray structure and each of the models. Compounds were docked, scored and ranked using the DOCK program. All of the screening experiments exhibited enrichment compared to random screening, with models constructed from templates of greater than 50% identity consistently achieving enrichment factors equivalent to those obtained with the crystallographic structure. Fifty per cent identity was noted as a threshold value above which the Cα RMSD of model versus crystal structure was found to be, on average, of the same order as the error in the X-ray data, roughly 2 Å, hence the comparable performance in the docking experiments. Enrichment fell to as low as 2-fold as the sequence identity of the model templates decreased. The study further reinforces the value of homology models where crystallographic data are unavailable, provided due care and attention is paid to the generation of the model, in particular the choice of template.
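To put these enrichment values in context, the database composition given above (367 actives in 17,000 compounds) fixes both the random baseline and the ceiling on enrichment. The sketch below works through that arithmetic. The 1% selection cut-off is an assumption for illustration and is not taken from the study, and the function names are hypothetical.

```python
def expected_actives(n_total: int, n_actives: int, top_fraction: float,
                     enrichment: float = 1.0) -> float:
    """Expected number of actives in the top-ranked subset at a given enrichment.

    enrichment = 1.0 corresponds to random selection. The result is capped
    by the subset size and by the number of actives in the database.
    """
    n_top = round(n_total * top_fraction)
    expected = enrichment * n_top * n_actives / n_total
    return min(expected, n_top, n_actives)


def max_enrichment(n_total: int, n_actives: int, top_fraction: float) -> float:
    """Largest enrichment factor achievable at this cut-off (perfect ranking)."""
    n_top = round(n_total * top_fraction)
    best_hits = min(n_top, n_actives)
    return (best_hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    N_TOTAL, N_ACTIVES = 17_000, 367  # database composition from the text
    CUTOFF = 0.01                     # assumed cut-off, not from the study
    print(f"baseline hit rate: {100 * N_ACTIVES / N_TOTAL:.2f}%")
    print(f"random, top 1%:    {expected_actives(N_TOTAL, N_ACTIVES, CUTOFF):.2f} actives")
    print(f"2-fold, top 1%:    {expected_actives(N_TOTAL, N_ACTIVES, CUTOFF, 2.0):.2f} actives")
    print(f"ceiling on EF:     {max_enrichment(N_TOTAL, N_ACTIVES, CUTOFF):.1f}")
```

Under these assumptions a 2-fold enrichment amounts to only a handful of additional actives in the selected subset, which is why the fall in enrichment at lower template identity matters in practice.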

In an ideal scenario, crystallographic data would form part of the hit verification process, but, as noted in many of the studies discussed in this review, this is rarely the case. Selectivity is also an issue for many medicinal chemistry programmes, where homology modelling could play a vital role in providing comparisons between the structures of different enzymes. Homology modelling provides a rapid and cheap alternative to the significant investment required for protein crystallography. This, coupled with the fact that not all protein kinases are amenable to crystallography, means that there will always be a need to generate 3D information using alternative methods. The uncertainties of using a homology model are then often outweighed by the desire to have a degree of structural insight during the development of hit compounds.
