The protein molecules in crystals pack quite loosely (Fig. 1), leaving large intervening spaces that are filled with solvent. Solvent content varies from a minimum of around 30% by volume for some small proteins to a maximum as high as 80%, with an average value of about 50%. Contacts between the protein molecules tend to be few and rather tenuous; for example, in actinidin, a protein of 220 amino acid residues and moderate solvent content (42%), only 20 residues make direct contacts of less than 4 Å with neighboring molecules in the crystal. Additional contacts are via water bridges, which are a consistent feature of all protein crystal structures. The larger solvent regions in turn have most of the properties of bulk solvent, being in free and rapid equilibrium with the outside mother liquor.
These characteristics of protein crystals, namely high solvent content and loose packing of the protein molecules, have a number of important consequences:
1. The crystals tend to be soft, fragile, and easily disordered.
2. Small molecules (inhibitors, substrates, etc.) can be diffused into protein crystals via the solvent channels. Provided the active site is not blocked by a neighboring molecule, enzymes therefore tend to be active in the crystalline state, as in solution, and many binding studies have been carried out using protein crystals.
3. Very little of the protein structure is likely to be influenced by its crystal environment, and there is ample evidence that protein structures in the crystal are essentially the same as in solution [5,6]. Solution properties are consistently explained by crystal structures. Comparisons of the same or related proteins in different crystal environments show very high levels of similarity; for example, the homologous cysteine proteases actinidin and papain show an rms (root mean square) deviation of only 0.4 Å for 90% of main-chain atoms despite very different crystallization conditions (20% ammonium sulfate, pH 6.0, and 62% methanol, pH 9.3, respectively) and different crystal packing. Most recently, comparisons of structures determined in solution by NMR (nuclear magnetic resonance) spectroscopy have shown generally good correspondence with crystal structures [8-11]. The main differences arise because flexible side chains or loops can be "frozen" into one of several accessible conformations by their crystal-packing environment.
4. Of greatest relevance to protein-solvent interactions, most of the protein surface is usually free of crystal packing contacts and in contact with what is essentially bulk solution. Solvent molecules bound to these regions must experience a very similar environment to that in free solution. Only in intermolecular contact areas are significant constraints imposed by the crystal environment, and the extent of these areas depends on the solvent content of the crystal. For example, in human lysozyme crystals, with a solvent content of 37%, crystal packing contacts involve 38% of the surface, whereas for turkey egg white lysozyme (solvent content 51%), the fraction covered by crystal contacts is only 13%.
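The rms deviation quoted in consequence 3 can be made concrete with a minimal Python sketch. It assumes the two structures have already been superimposed and that equivalent atoms are listed in the same order; the function name and the coordinates are hypothetical and serve only as illustration.

```python
import math

def rms_deviation(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of
    (x, y, z) positions for equivalent atoms.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates (in Å) for three equivalent main-chain atoms.
structure_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
structure_2 = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (2.0, 1.4, 0.0)]
rmsd = rms_deviation(structure_1, structure_2)
print(f"rms deviation = {rmsd:.3f} Å")
```

In practice the superposition step matters as much as the formula, and a figure such as "0.4 Å for 90% of main-chain atoms" also depends on which atoms are excluded from the comparison.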
The major general difference between the crystal and solution is the extent to which the flexibility of protein structures is restricted. The crystal lattice does allow some flexibility of amino acid side chains, of loops, and even of whole domains, but generally there is a reduced mobility compared with solution.
III. CRYSTALLOGRAPHIC LOCATION OF SOLVENT
A. The Crystallographic Method
X-ray crystallography allows one to actually "see" the atomic and molecular structure in a crystal. The image that is seen, however, is a three-dimensional map of the distribution of electrons in the crystal, that is, an electron density map. It is the interpretation of the electron density map, in terms of atoms and groups of atoms, that constitutes the model that is finally published or deposited. It is outside the scope of this chapter to discuss crystallographic theory (for comprehensive accounts, see Refs. 6,14), but an understanding of certain features of the crystallographic method is essential to appreciate the quality and nature of the information it gives about solvent structure in protein crystals.
First, calculation of an electron density map requires two pieces of information, that is, the amplitudes and the phases, for all of the diffracted x-ray waves. The amplitudes can be measured, with an accuracy of ~5%, but the phases are very difficult to determine accurately, especially for high-resolution data. Thus, the electron density map from which an initial model of the protein structure is derived inevitably contains errors and ambiguities because of the errors in the phases. This is less of a problem for interpreting the protein part of the structure than it is for the solvent, because the protein atoms are all covalently connected in the polypeptide chain. Even though ambiguities may sometimes make chain tracing difficult, the model must still conform to a known chemical connectivity. For the solvent, however, no such restriction exists; solvent molecules are small, discrete, and not covalently linked one to another. Identification of solvent sites in such an initial map would be too unreliable to be worth attempting.
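The dependence of the map on both amplitudes and phases can be illustrated with a toy Fourier synthesis. The sketch below is a one-dimensional idealization with point atoms and error-free phases; the unit cell, atom positions, weights, and function names are all hypothetical, and a real calculation is three-dimensional and uses experimentally estimated phases.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """Complex structure factors F(h) for point atoms in a 1D unit cell.

    atoms: list of (fractional position, scattering weight) pairs.
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }

def density_map(amplitudes, phases, n_grid):
    """Fourier synthesis on a grid:
    rho(x) = sum over h of |F(h)| * exp(i*phi(h)) * exp(-2*pi*i*h*x).
    """
    rho = []
    for i in range(n_grid):
        x = i / n_grid
        total = sum(
            amplitudes[h] * cmath.exp(1j * phases[h])
            * cmath.exp(-2j * math.pi * h * x)
            for h in amplitudes
        )
        rho.append(total.real)  # imaginary parts cancel between +h and -h
    return rho

# Hypothetical two-atom "structure": a heavier atom at x = 0.25, a lighter one at 0.70.
atoms = [(0.25, 8.0), (0.70, 6.0)]
F = structure_factors(atoms, h_max=10)
amplitudes = {h: abs(value) for h, value in F.items()}      # measurable quantity
phases = {h: cmath.phase(value) for h, value in F.items()}  # not measured directly

rho = density_map(amplitudes, phases, n_grid=100)
peak_index = max(range(len(rho)), key=rho.__getitem__)
print("highest density at x =", peak_index / 100)
```

Here the phases are taken from the known answer, which is exactly what is unavailable at the start of a real analysis; replacing them with poorer estimates would be expected to degrade the map, which is the difficulty the paragraph above describes.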
The process of crystallographic refinement improves the situation greatly, however. The current model is used to calculate amplitudes and phases. The agreement of the calculated amplitudes with the experimentally measured observed amplitudes gives a measure of the correctness of the model (expressed as the crystallographic R factor, which should be less than 0.20 for a well-refined protein structure—see Refs. 6,16). Moreover, the calculated phases can be used, either on their own or in combination with the experimentally estimated phases, to give clearer electron density maps with less "noise." As the model is refined, by least squares methods, by energy minimization, or by rebuilding from electron density maps, the phases become better and so does the quality of the maps. There are still hazards in these procedures (for example, incorrectly placed atoms in the model lead to bias in the phases, which can cause the creation of false density). There is no doubt, however, that refinement greatly enhances the reliability with which solvent peaks can be picked out from the noise.
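For readers unfamiliar with the R factor, it is conventionally defined as the sum of the absolute differences between observed and calculated amplitudes divided by the sum of the observed amplitudes. The sketch below assumes the two sets of amplitudes are already on the same scale; the function name and the amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor:
    R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_obs and f_calc are amplitudes for the same reflections,
    in the same order and on the same scale.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator

# Hypothetical amplitudes for five reflections.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_calc = [90.0, 85.0, 66.0, 36.0, 25.0]
r_value = r_factor(f_obs, f_calc)
print(f"R = {r_value:.2f}")
```

A value such as this would be compared against the guideline of less than 0.20 mentioned above; real refinement programs also fit a scale factor between observed and calculated amplitudes, which this sketch omits.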
The other major factor is the resolution of the x-ray analysis. If only the inner (low scattering angle) parts of the diffraction pattern are used, a low-resolution image (electron density map) is obtained. As the outer (higher angle) data are incorporated, the resolution is increased, giving clearer definition of structural features. Thus, at low resolution (~6 Å), helices appear as solid rods; at medium resolution (~3 Å), side chains generally have recognizable shapes and peptide carbonyl oxygens appear as "bumps" projecting from the polypeptide chain density; while at high resolution (2 Å or better), much finer detail becomes apparent. The resolution possible is ultimately limited by the quality of the crystals. If all of the molecules in the crystal do not have exactly the same orientation, or groups (e.g., external side chains) have a variety of conformations, the result is a blurring of the image and a loss of resolution.
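The link between scattering angle and resolution follows from Bragg's law, d = λ / (2 sin θ), where 2θ is the scattering angle. The sketch below assumes Cu Kα radiation (λ ≈ 1.5418 Å) purely as an example; the function name and the chosen angles are hypothetical.

```python
import math

CU_K_ALPHA = 1.5418  # wavelength in Å; an assumed example, not taken from the text

def resolution_from_two_theta(two_theta_deg, wavelength=CU_K_ALPHA):
    """Resolution d (in Å) for a reflection at scattering angle 2-theta,
    from Bragg's law: d = wavelength / (2 * sin(theta))."""
    if not 0.0 < two_theta_deg < 180.0:
        raise ValueError("2-theta must lie between 0 and 180 degrees")
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))

# Reflections further out in the diffraction pattern correspond to finer detail.
for two_theta in (15.0, 30.0, 45.0):
    d = resolution_from_two_theta(two_theta)
    print(f"2-theta = {two_theta:4.1f} deg  ->  d = {d:.2f} Å")
```

With this wavelength, data extending to roughly 2θ = 45° correspond to about 2 Å resolution, the regime in which the text says much finer detail becomes apparent.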
As implied in the foregoing, the main difficulty in reliably identifying solvent molecules in a protein structure analysis is the need to distinguish genuine solvent peaks from the noise of an electron density map. This problem is particularly acute in the early stages of an analysis or where the resolution is limited. Most workers therefore use a conservative approach, adding solvent molecules in stages, following fairly strict criteria.
Two types of electron density map are commonly used to locate solvent. If a map is calculated with coefficients Fo - Fc (where Fo is the observed structure amplitude of a scattered x-ray wave, and Fc is that calculated from the current model), a so-called difference map is obtained. Where the model is correct, no density is seen (Fo and Fc cancel out); where a feature should be included in the model but is not, a positive peak is seen (because it contributes to Fo but has not been included in the calculation of Fc); where a feature is erroneously included in the model, a negative peak is seen (because it is contributing to Fc but not Fo). Difference electron density maps are thus very sensitive to features such as solvent molecules that have not yet been included in the structural model. The other kind of map frequently used in protein structure refinement employs coefficients 2Fo - Fc. This effectively combines a difference map (coefficients Fo - Fc) with a map with coefficients Fo; density is present for the whole structure (contributing to Fo) but with errors emphasized through the inclusion of the Fo - Fc term. Examples of such maps are shown in Fig. 2.
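The arithmetic behind the two coefficient types can be sketched in a few lines of Python. The function name and amplitudes are hypothetical, and the sketch stops at the coefficients themselves; an actual map requires combining them with phases in a Fourier synthesis, so the sign of a single coefficient does not by itself correspond to a positive or negative peak.

```python
def map_coefficients(f_obs, f_calc):
    """Return (difference, two_fo_fc) amplitude coefficients per reflection.

    difference : Fo - Fc, used for difference maps
    two_fo_fc  : 2Fo - Fc, i.e. Fo plus the difference term
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    difference = [fo - fc for fo, fc in zip(f_obs, f_calc)]
    two_fo_fc = [2.0 * fo - fc for fo, fc in zip(f_obs, f_calc)]
    return difference, two_fo_fc

# Hypothetical amplitudes for three reflections.
f_obs = [120.0, 75.0, 40.0]
f_calc = [100.0, 75.0, 55.0]
difference, two_fo_fc = map_coefficients(f_obs, f_calc)

for fo, d, t in zip(f_obs, difference, two_fo_fc):
    # 2Fo - Fc is the Fo coefficient with the difference term added on top.
    print(f"Fo = {fo:6.1f}   Fo - Fc = {d:6.1f}   2Fo - Fc = {t:6.1f}")
```

Where the model already accounts for a reflection (the second one here), the difference coefficient vanishes and the 2Fo - Fc coefficient reduces to Fo.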
Solvent molecules are seldom included in a model unless the resolution is better than 2.5 Å, and it is very important not to model solvent too early in the refinement of a protein structure. This is because if solvent molecules are erroneously placed in density that really belongs to part of the protein, not only will the solvent be wrongly placed, but the protein will tend to be "locked in" to this incorrect structure for the subsequent refinement. For this reason, the solvent is usually not added until the protein structure has been well defined, typically at a crystallographic R factor of 0.25 or lower. An example of the improvement of solvent peaks during refinement is shown in Fig. 3.
The best-defined solvent molecules are included first; this usually means those in internal sites or surface crevices and pockets. Electron density peaks greater than a certain threshold, for example, three times the rms density of a difference electron density map, are identified, either by manual scrutiny of the map or by using computer programs to search it for the highest peaks. These peaks are then examined to see whether they fulfill various criteria; for example, that they are within hydrogen-bonding distance of potential hydrogen-bonding groups, in appropriate geometric orientation; they are not too close (<3.0 Å) to non-hydrogen-bonding atoms; and they are not close to any part of the protein whose conformation is in doubt. Often peaks are only assigned to solvent molecules if they have been noted as persisting through several phases of the refinement. It must be stressed, however, that criteria for including solvent vary greatly from one laboratory to another.
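A rough sketch of how such criteria might be applied to a list of candidate peaks is given below. It covers only the peak-height and distance tests; the hydrogen-bond distance window of 2.4 to 3.5 Å is an assumed illustration rather than a value from the text, and the geometric-orientation, doubtful-conformation, and persistence checks are left out. All names and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Atom:
    name: str
    xyz: tuple       # Cartesian coordinates in Å
    can_hbond: bool  # True for potential hydrogen-bonding atoms (e.g. N, O)

def accept_peak(peak_xyz, peak_height, map_rms, atoms,
                sigma_cutoff=3.0,        # threshold in multiples of the map rms
                hbond_min=2.4,           # assumed lower bound for an H-bond (Å)
                hbond_max=3.5,           # assumed upper bound for an H-bond (Å)
                min_nonpolar_dist=3.0):  # closest approach to non-H-bonding atoms (Å)
    """Return True if a difference-map peak passes the simple solvent criteria."""
    if peak_height < sigma_cutoff * map_rms:
        return False  # too close to the noise level
    has_partner = False
    for atom in atoms:
        distance = math.dist(peak_xyz, atom.xyz)
        if atom.can_hbond:
            if distance < hbond_min:
                return False  # implausibly close to be a separate molecule
            if distance <= hbond_max:
                has_partner = True
        elif distance < min_nonpolar_dist:
            return False  # clashes with a non-hydrogen-bonding atom
    return has_partner

# Hypothetical protein atoms and candidate peaks.
atoms = [Atom("O", (0.0, 0.0, 0.0), True), Atom("CB", (6.5, 0.0, 0.0), False)]
map_rms = 0.10
candidates = [
    ((2.8, 0.0, 0.0), 0.45),  # strong peak, 2.8 Å from the carbonyl O
    ((4.0, 0.0, 0.0), 0.45),  # strong peak, but only 2.5 Å from CB
    ((2.8, 0.0, 0.0), 0.20),  # well placed, but below the 3 x rms threshold
]
results = [accept_peak(xyz, height, map_rms, atoms) for xyz, height in candidates]
print(results)
```

Because criteria vary between laboratories, the cutoffs are exposed as parameters rather than fixed; tightening or loosening them changes how much weakly bound solvent is admitted.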
As the refinement proceeds, and the model is further improved, more weakly bound, less well-defined solvent molecules can be added as the density becomes clearer. These are, by definition, closer to the noise level, however, and it is difficult to know where to draw the line. As a rule of thumb, a "reasonable" value for the number of solvent molecules in a high-resolution model is roughly equal to the number of residues in the protein. This should include all the strongly bound solvent, most of the first hydration shell, and a little second-shell solvent.