The Nth family of DNA glycosylases may be divided into four subgroups by substrate specificity: Nth, Pdg, MutY and Tdg (see above). Although the overall structures of these enzymes are similar, their substrate specificities are quite different. Two of these subgroups, Nth and MutY, were analyzed previously (Zharkov and Grollman 2002); here I extend this analysis to all four subgroups of the Nth family.
The following 11 archaeal and 44 bacterial genomes were searched by BLAST in the NCBI microbial genome database: Aeropyrum pernix, Sulfolobus solfataricus, Pyrobaculum aerophilum, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus, Methanocaldococcus jannaschii, Methanopyrus kandleri AV19, Methanosarcina mazei Goe1, Pyrococcus furiosus DSM 3638, Thermoplasma volcanium, Mycobacterium tuberculosis H37Rv, Streptomyces coelicolor A3(2), Aquifex aeolicus, Chlorobium tepidum TLS, Chlamydia trachomatis, Chlamydophila pneumoniae CWL029, Nostoc sp. PCC 7120, Synechocystis sp. PCC 6803, Bacillus subtilis, Clostridium perfringens, Enterococcus faecium, Mycoplasma pneumoniae, Ureaplasma urealyticum, Lactococcus lactis subsp. lactis, Listeria innocua, Thermoanaerobacter tengcongensis, Staphylococcus aureus subsp. aureus N315, Streptococcus pyogenes M1 GAS, Fusobacterium nucleatum subsp. nucleatum ATCC 25586, Magnetococcus sp. MC-1, Caulobacter crescentus CB15, Agrobacterium tumefaciens str. C58 (U. Washington), Mesorhizobium loti, Rhodobacter sphaeroides, Rickettsia prowazekii, Ralstonia solanacearum, Neisseria meningitidis Z2491, Nitrosomonas europaea, Campylobacter jejuni, Helicobacter pylori 26695, Escherichia coli K12, Yersinia pestis, Buchnera aphidicola str. Sg, Vibrio cholerae, Xanthomonas campestris pv. campestris str. ATCC 33913, Xylella fastidiosa 9a5c, Haemophilus influenzae Rd, Pasteurella multocida, Pseudomonas aeruginosa, Salmonella typhimurium LT2, Borrelia burgdorferi, Treponema pallidum, Thermotoga maritima, Deinococcus radiodurans. These genomes represent 46 phylogenetic groups, with no more than two genomes per group; four genomes were unfinished at the time of the analysis. Sequences of Eco-Nth, Mlu-Pdg, Eco-MutY, and Mth-Tdg were used as queries, and the 100 top-scoring sequences for each query were pooled. All sequences were aligned, and a tree was constructed by the neighbor-joining method. Sequences falling outside the smallest subtree containing all four query sequences were discarded. The remaining sequences were realigned and assigned to one of the four subgroups according to the query sequence with which they clustered most closely. The manual sequence qualification step was omitted. Physicochemical properties of residues in the aligned sequences were analyzed with AMAS (Cn = 7; 10% atypical residues allowed; gaps not ignored; cysteines considered reduced). The X-ray crystallographic structures of Nth (2ABK; Thayer et al. 1995), MutY (1MUY; Guan et al. 1998), and Tdg (1KEA; Mol et al. 2002) were used for mapping.
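As a rough illustration of the tree-building step, the standard neighbor-joining procedure can be sketched in a few lines. This is a minimal sketch, not the program used for the analysis (the text does not name it); the function `neighbor_joining` is hypothetical, and the five-taxon distance matrix is a textbook-style toy example rather than data from this study. In the real pipeline, distances would first be estimated from the aligned sequences, and each pooled sequence would then be assigned to the subgroup of the query it clusters with.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree by neighbor joining.

    labels: taxon names; matrix: symmetric list of lists of pairwise distances.
    Returns (Newick string with branch lengths, list of joins in order).
    """
    # d[x][y] holds the current distance between clusters x and y;
    # each cluster is named by its own Newick string.
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    joins = []
    while len(d) > 3:
        n = len(d)
        r = {x: sum(d[x][y] for y in d if y != x) for x in d}
        # Q criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = d[i][j] - li                                  # branch j -> new node
        u = f"({i}:{li:g},{j}:{lj:g})"
        joins.append((i, j, li, lj))
        # distance from the new node u to every other remaining cluster k
        du = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)}
        del d[i], d[j]
        for k in d:
            del d[k][i], d[k][j]
            d[k][u] = du[k]
        d[u] = {**du, u: 0.0}
    # three clusters remain: attach them to one central node
    x, y, z = d
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    return f"({x}:{lx:g},{y}:{d[x][y] - lx:g},{z}:{d[x][z] - lx:g});", joins


# Toy five-taxon distance matrix (illustrative only, not data from this study)
labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
tree, joins = neighbor_joining(labels, matrix)
print(tree)  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

In this toy matrix the Q criterion ties at the second step; the sketch breaks ties by input order, and other implementations may resolve them differently.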
The BLAST search of the 55 microbial genomes recovered a total of 103 sequences similar to the four query sequences, including the Mlu-Pdg and Mth-Tdg sequences themselves, although the genomes of the respective species were not searched. Because the 100 top-scoring sequences were taken from each of the four searches, the small size of the pool indicates that the searches largely recovered the same sequences, i.e., that the Nth family is well conserved and shares little similarity with other sequences. Each query retrieved many sequences from other subgroups; e.g., the Eco-Nth query identified Eco-MutY as a homologue. Following the classification and qualification steps, 80 sequences belonging to 38 phylogenetic lineages remained in the analysis. The Nth subgroup included 33 sequences from 29 phylogenetic lineages; the Pdg subgroup, 8 sequences from 6 lineages; the MutY subgroup, 34 sequences from 28 lineages; and the Tdg subgroup, 5 sequences from 4 lineages. Visual inspection of the alignment for sequence length and subgroup-specific conserved motifs confirmed the group composition. Interestingly, the Tdg subgroup included only archaeal sequences.
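The bookkeeping behind this argument can be made explicit. The sketch below uses a hypothetical helper, `pool_hits`, on toy sequence IDs to count how many hit lists recovered each distinct sequence, and then applies the same arithmetic to the figures reported above. The ratio of about 3.9 assumes that every search returned a full list of 100 hits.

```python
from collections import Counter


def pool_hits(hit_lists):
    """Pool ranked hit lists; count how many lists recovered each distinct sequence ID."""
    return Counter(seq_id for hits in hit_lists for seq_id in set(hits))


# Toy hit lists with hypothetical IDs: heavy overlap keeps the pool small.
toy = pool_hits([["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s1", "s2", "s4"]])
print(len(toy), sum(toy.values()) / len(toy))  # 4 2.25

# Figures reported in the text: (sequences, phylogenetic lineages) per subgroup.
subgroups = {"Nth": (33, 29), "Pdg": (8, 6), "MutY": (34, 28), "Tdg": (5, 4)}
hit_slots = 4 * 100   # four queries x 100 top-scoring hits
pooled_unique = 103   # distinct sequences in the pool
print(round(hit_slots / pooled_unique, 2))    # 3.88 searches per distinct sequence
print(sum(n for n, _ in subgroups.values()))  # 80 sequences retained
print(sum(k for _, k in subgroups.values()))  # 67, more than 38
```

The lineage counts sum to 67 rather than 38 because one organism (for example, E. coli) can contribute sequences to more than one subgroup.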
In general, the four subgroups formed two pairs: Nth with Pdg, and MutY with Tdg. Many positions are conserved in both Nth and Pdg, or in both MutY and Tdg, but not across the two pairs. Positions conserved in one subgroup and in either member of the other pair are rare. Because Nth and Pdg participate in the repair of damaged bases, while MutY and Tdg repair mismatched bases, such groupings may reflect either functional differences between these enzymes or their evolutionary relationships. The latter possibility appears less likely because Tdg is exclusively archaeal.
Representative enzymes from three of the four subgroups (all except Pdg) have been crystallized and their three-dimensional structures determined by X-ray diffraction (Kuo et al. 1992; Thayer et al. 1995; Guan et al. 1998; Mol et al. 2002). Structures of Nth-family enzymes bound to their cognate DNA have not yet been determined,¹ so their DNA-binding sites and active sites are inferred from biochemical and mutagenesis evidence. Residues conserved across all subgroups of the Nth family were analyzed previously (Zharkov and Grollman 2002), and the results did not differ significantly when the present data set was included. Residues that appeared to be specific for the Nth, MutY, or Tdg subgroup were mapped onto the appropriate structure. Mapping serves as a useful visualization tool and helps in understanding the possible roles of the subgroup-specific amino acids (Zharkov and Grollman 2002). Residues were considered specific for a subgroup if (1) they were conserved (Cn > 7) within the subgroup and (2) they were not conserved in the other subgroup of the same pair. Most residues fulfilling these two criteria were not conserved in any of the three remaining subgroups.
¹ After this manuscript was completed, a structure of Nth covalently bound to DNA was published (Fromme and Verdine 2003); since structures of DNA-bound MutY and Tdg are not available, the structure of free Nth was nevertheless used here for illustrative purposes.
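The two-part specificity criterion above can be sketched as a column filter over an alignment. This is a minimal sketch under a stated simplification: the hypothetical `conservation` function scores a column as 10 × the fraction of sequences sharing the most common residue, which is a stand-in for illustration and not the AMAS conservation number Cn used in the analysis. The alignment and the function `specific_positions` are likewise hypothetical.

```python
from collections import Counter


def conservation(column):
    """Simplified 0-10 score: 10 x fraction of sequences sharing the modal residue.

    A stand-in for illustration only, NOT the AMAS conservation number;
    gaps ('-') are not ignored, so they lower the score.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    top = Counter(residues).most_common(1)[0][1]
    return 10 * top / len(column)


def specific_positions(alignment, members, partner, threshold=7.0):
    """0-based columns conserved (> threshold) in `members` but not in `partner`."""
    length = len(alignment[members[0]])
    hits = []
    for pos in range(length):
        own = [alignment[name][pos] for name in members]
        other = [alignment[name][pos] for name in partner]
        # criterion (1): conserved in the subgroup; (2): not conserved in its pair
        if conservation(own) > threshold and conservation(other) <= threshold:
            hits.append(pos)
    return hits


# Toy alignment (hypothetical sequences, not from this study)
alignment = {
    "nth1": "KDYE", "nth2": "KDYE", "nth3": "KDFE",
    "pdg1": "KNYQ", "pdg2": "KSYR", "pdg3": "KTYE",
}
nth = ["nth1", "nth2", "nth3"]
pdg = ["pdg1", "pdg2", "pdg3"]
print(specific_positions(alignment, nth, pdg))  # [1, 3]
print(specific_positions(alignment, pdg, nth))  # [2]
```

Column 0 (K in every sequence) is conserved in both partners and is therefore excluded, mirroring the observation that positions shared within a pair are not counted as subgroup-specific.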
The Eco-Nth, Eco-MutY, and Mth-Tdg proteins each consist of two lobes separated by a positively charged interdomain cleft, where DNA is presumably bound. A deep pocket, containing residues important for catalytic activity, opens into the bottom of this cleft. The cleft, referred to below as the DNA-binding groove, usually has well-defined rims, or "lips." The inferred mechanism of action for enzymes of the Nth family postulates that damaged DNA is bound in the enzyme's groove and kinked at the site of the lesion. The base to be excised is then extruded (flipped out) of the double helix and inserted into the enzyme's active-site pocket, where a series of chemical reactions takes place (McCullough et al. 1999).
Mapping shows the subgroup-specific residues to be slightly more scattered across the protein globule than in the previous analysis of Nth and MutY (Zharkov and Grollman 2002). This is especially evident in Tdg, probably because of the small number of Tdg sequences (five). Nevertheless, many subgroup-specific residues clearly cluster on the lips of the interdomain groove and in the active-site pocket.
In Eco-Nth (Fig. 4A), the highly conserved residues E23 and Y185 close the far-left part of the groove, and the Y185 hydroxyl forms a hydrogen bond with one of the E23 Oε atoms (Kuo et al. 1992). (Here and elsewhere, orientation is given with the six-helix barrel domain pointing upward.) H176 and H177 form the bottom of the active-site pocket.
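Contacts such as the Y185/E23 hydrogen bond can be checked against deposited coordinates (e.g., 2ABK) by parsing the fixed-column ATOM records of a PDB file and measuring heavy-atom distances. The sketch below makes several assumptions: the helpers `parse_atoms`, `close_contacts`, and `atom_line` are hypothetical, the 3.5 Å cutoff is a common rule-of-thumb heavy-atom distance for hydrogen bonds, chain "A" is assumed, and the coordinates are synthetic placeholders, not values from 2ABK.

```python
import math


def parse_atoms(pdb_lines):
    """Map (chain, residue number, atom name) -> (x, y, z) from ATOM/HETATM records.

    Uses the fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    and x, y, z in columns 31-38, 39-46, 47-54.
    """
    atoms = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def close_contacts(atoms, center, partners, cutoff=3.5):
    """Return (partner, distance) pairs lying within `cutoff` angstroms of `center`."""
    found = []
    for atom in partners:
        if atom in atoms:
            dist = math.dist(atoms[center], atoms[atom])
            if dist <= cutoff:
                found.append((atom, dist))
    return found


def atom_line(serial, name, resname, chain, resseq, xyz):
    """Hypothetical helper: format one synthetic ATOM record in fixed PDB columns."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Synthetic geometry (placeholder coordinates, not taken from 2ABK)
lines = [
    atom_line(1, "OE1", "GLU", "A", 23, (2.6, 0.0, 0.0)),
    atom_line(2, "OE2", "GLU", "A", 23, (3.0, 2.0, 0.0)),
    atom_line(3, "OH", "TYR", "A", 185, (0.0, 0.0, 0.0)),
]
atoms = parse_atoms(lines)
contacts = close_contacts(atoms, ("A", 185, "OH"), [("A", 23, "OE1"), ("A", 23, "OE2")])
print(contacts)  # only OE1 lies within 3.5 A of the Tyr hydroxyl O in this toy geometry
```

A distance cutoff alone does not establish a hydrogen bond; angular criteria and hydrogen placement would be needed for a stricter assignment.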
In Eco-MutY (Fig. 4B), R19 closes the far-left part of the groove. G139 and A189 are located on the lower lip and Y82 on the upper lip of the groove. Activity of the Y82C Eco-MutY mutant is severely compromised, and a mutation converting the corresponding tyrosine of a human MutY homologue into a cysteine is associated with familial adenomatous polyposis (Al-Tassan et al. 2002). The entrance to the active-site pocket is occupied by Q41, a residue that likely interacts with the base opposite A, thus directly contributing to MutY specificity (Guan et al. 1998). Deeper in the pocket lie E37 and A124. An E37S mutation completely inactivates the enzyme, and it has been proposed that this residue forms hydrogen bonds with N7 and N6 of the adenine to be excised (Guan et al. 1998).
Finally, Mth-Tdg (Fig. 4C) does not contain subgroup-specific residues in the postulated active-site pocket. However, R46, R47, and L87 cluster on the upper lip of the DNA-binding groove. It has been suggested that R47 is inserted into the DNA double helix and assists in base flipping (Mol et al. 2002); mutation of this residue to alanine reduces enzymatic activity 20-fold.
Fig. 4A-C. Structures of Eco-Nth, Eco-MutY, and Mth-Tdg with the subgroup-specific residues mapped on their surface. A Eco-Nth; B Eco-MutY; C Mth-Tdg. The protein molecules are oriented so that the six-helix barrel domain points upward, and the DNA-binding groove faces the reader. Selected subgroup-specific residues discussed in the text are highlighted in blue and labeled. The catalytic dyad 120/138 is shown in green for easier orientation.
In some cases, the residues identified here as specific for the Nth and MutY subgroups differ from the Nth- or MutY-specific residues identified earlier (Zharkov and Grollman 2002). These differences result from better separation of enzyme groups with different substrate specificities and from sequence selection based on a microbial genome search rather than on predefined clusters of orthologous groups (Tatusov et al. 1997). For example, Pdg enzymes are often annotated as Nth: the D. radiodurans Pdg is one of three "endonuclease III proteins" found in that genome and is included as such in the Nth COG of the Clusters of Orthologous Groups database (Tatusov et al. 2001). The other two are less closely related to either Nth or Pdg than Nth and Pdg are to each other. Such ambiguous annotation artificially lowers the apparent conservation of positions that are truly specific for Nth and thus hinders their identification. The present approach allows better resolution of residues important for the specific function of each subgroup.
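The effect of misannotation on conservation can be illustrated with a single alignment column. This sketch uses the same kind of simplified stand-in score (10 × the fraction of sequences sharing the modal residue), not the AMAS Cn; the column contents are hypothetical, loosely modeled on one Pdg and two more divergent sequences being filed as Nth.

```python
from collections import Counter


def conservation(column):
    """Simplified 0-10 score (10 x modal-residue fraction); a stand-in, not AMAS Cn."""
    top = Counter(column).most_common(1)[0][1]
    return 10 * top / len(column)


true_nth = ["D"] * 5        # hypothetical column, invariant among genuine Nth sequences
misfiled = ["N", "S", "Q"]  # hypothetical: one Pdg and two divergent sequences filed as Nth
print(conservation(true_nth))             # 10.0
print(conservation(true_nth + misfiled))  # 6.25, now below the Cn > 7 criterion
```

With only three misfiled sequences, a position that is invariant among genuine Nth sequences drops below the threshold and would no longer be reported as Nth-specific.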
The currently accepted mechanism by which DNA glycosylases search for and "recognize" cognate lesions includes several steps at which an enzyme could exert substrate specificity. In the initial encounter, the enzyme binds nonspecifically to DNA and moves along one or the other groove by facilitated one-dimensional diffusion (von Hippel and Berg 1989) until it encounters the lesion. The lesion is recognized by a "reading head," a part of the enzyme directly involved in scanning DNA. The damaged base is then everted from the helix and stabilized through interactions in the active-site pocket. The interactions made by the reading head and by the binding pocket are likely to differ since, in some DNA glycosylases, canonical bases may not fit the pocket (Kavli et al. 1996). Residues located at the edges of the DNA-binding groove of Nth, MutY, or Tdg are good candidates for reading-head groups. These residues are often bulky and capable of intercalating between base pairs in the DNA duplex, as in the prototypical tyrosine/arginine reading head of uracil-DNA glycosylase (Parikh et al. 1998). Alternatively, residues positioned on the edge may detect atypical patterns of hydrogen-bond donors and acceptors exposed in the major or minor groove of DNA, as proposed for the E. coli Fpg protein (Grollman et al. 1994). Residues located within the active-site pocket likely stabilize the everted base by forming specific hydrogen bonds. Interestingly, some glycosylases, such as Tdg or the alkylpurine-DNA glycosylase AlkA (Zharkov and Grollman 2002), contain no subgroup-specific amino acids in the active-site pocket. These enzymes may rely on nonspecific van der Waals contacts to stabilize the everted base (Labahn et al. 1996). Alternatively, they may form hydrogen bonds with amino acids that are not unique to the subgroup (Mol et al. 2002), in which case substrate specificity would most likely be exerted during the scanning step.
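Residues of both classes (candidate reading-head residues at the edges of the DNA-binding groove and residues lining the active-site pocket), identified by this combined structural and bioinformatic approach, are primary candidates for site-directed mutagenesis studies designed to clarify their roles in determining substrate specificity.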
Acknowledgements. I am grateful to Dr. Arthur Grollman for numerous discussions regarding the mechanism of DNA glycosylases, and to Annette Oestreicher for help in editing the manuscript. Analyses conducted while the author was in the Laboratory of Chemical Biology of the State University of New York, Stony Brook, New York, were supported by grant 47995 from the National Cancer Institute. He is currently supported by grants from the Wellcome Trust (UK), the Russian Foundation for Basic Research (02-04-49605) and the Russian Ministry of Education (PD02-1.4-469).