The aim of this chapter was to introduce the reader to a representative set of information-rich, intuitive, and readily interpretable descriptor schemes that are applicable to homogeneous as well as heterogeneous datasets. Such a selection is, of course, subjective, and the interdisciplinary character of this book limits how much of the topic can be presented. The chapter therefore concludes with a short summary of other important approaches, and apologies are offered for any omissions.
The reactivity of toxic agents in redox reactions and in nucleophilic and electrophilic reactions is the realm of quantum-mechanical descriptors. Molecular orbital parameters (HOMO and LUMO energies) and partial charges are frequently encountered as the most relevant parameters in structure-toxicity correlations (36). Depending on the level of theory, the calculation of these parameters may be quite lengthy. At the semiempirical level, however, even large numbers of molecules can be handled routinely on present-day computing platforms (17).
Structural fragments (37) or connectivity pathways (38,39) can be transformed into a fingerprint by binary encoding (1/0) of the presence or absence of substructures. Alternatively, a molecular hologram can be created by counting the occurrences of fragments up to a predefined length (40). A drawback is the huge number of possible fragments, i.e., descriptor columns. Because each molecule populates only a small fraction of these columns, the resulting descriptor matrices are sparse. Therefore, either the fingerprints are folded back ("hashing"), with each bin assigned to several fragments, or the fragment descriptors are clustered into structurally similar groups.
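As a rough illustration of the folding step, the following Python sketch enumerates linear fragments in a toy molecular graph and hashes them into a fixed number of bins. The graph representation, the function names (linear_paths, fold), the CRC32 hash, and the bin count are all assumptions made for this example; the sketch does not reproduce any of the fingerprint or hologram implementations cited above.

```python
import zlib
from collections import Counter


def linear_paths(atoms, bonds, max_len=4):
    """Count linear fragments of up to max_len atoms in a molecular graph.

    atoms: list of element symbols (hypothetical toy representation)
    bonds: list of (i, j) index pairs; bond orders are ignored here
    """
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    found = Counter()

    def extend(path):
        # A path and its reverse describe the same fragment, so count it
        # only when walked from the lower-indexed end.
        if path[0] <= path[-1]:
            labels = [atoms[i] for i in path]
            key = min("-".join(labels), "-".join(reversed(labels)))
            found[key] += 1
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in adj:
        extend([start])
    return found


def fold(fragment_counts, n_bins=16):
    """Hash fragments into n_bins; several fragments may share one bin."""
    bits = [0] * n_bins       # binary fingerprint (presence/absence)
    hologram = [0] * n_bins   # hologram-style occurrence counts
    for fragment, count in fragment_counts.items():
        idx = zlib.crc32(fragment.encode("ascii")) % n_bins
        bits[idx] = 1
        hologram[idx] += count
    return bits, hologram


# Heavy-atom graph of ethanol: C-C-O
ethanol_fragments = linear_paths(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_bits, ethanol_hologram = fold(ethanol_fragments, n_bins=16)
```

For ethanol, five distinct fragments (C, O, C-C, C-O, C-C-O) with six occurrences in total are distributed over the 16 bins. Fragments that collide in one bin can no longer be told apart, which is the price paid for the compact, dense representation.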
For the characterization of 2D pharmacophoric patterns, atom pairs (41) and topological torsions (42) were introduced by scientists from Lederle. An atom pair is defined by two atoms and the length of the shortest bond path between them. Typically, all fragments with bond paths of 2-20 atoms are recorded and their occurrences are counted. Atoms may be assigned to pharmacophoric classes (H-bond donor or acceptor, hydrophobic, positively or negatively charged, other), and the distribution of pharmacophoric pairs is described with respect to pairwise distances (e.g., three instances of a donor-acceptor pair separated by five bonds). Topological torsion descriptors are the 2D analog of the torsion angle, since they are computed from linear sequences of four consecutively bonded heavy atoms. The topological correlation of generalized atom types is also the conceptual basis of the program CATS (43), which generates pharmacophore descriptors that are independent of molecular size.
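The counting of pharmacophoric atom pairs can be sketched in a few lines of Python. The example below is a minimal illustration, not the published atom-pair or CATS algorithm: the class labels, the function names, and the choice to measure distance as the number of bonds on the shortest path are assumptions made here.

```python
from collections import Counter, deque


def bond_distances(adj, source):
    """Breadth-first search: shortest bond-path length from source to each atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def atom_pair_counts(classes, bonds):
    """Count (class_a, class_b, distance) triples over all atom pairs.

    classes: one pharmacophoric class label per atom (hypothetical labels)
    bonds:   list of (i, j) index pairs
    """
    adj = {i: [] for i in range(len(classes))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    pairs = Counter()
    for i in range(len(classes)):
        dist = bond_distances(adj, i)
        for j in range(i + 1, len(classes)):
            if j in dist:  # skip atoms in disconnected fragments
                a, b = sorted((classes[i], classes[j]))
                pairs[(a, b, dist[j])] += 1
    return pairs


# Toy chain N-C-C-O with assumed classes:
# D = H-bond donor, H = hydrophobic, A = H-bond acceptor
chain_pairs = atom_pair_counts(["D", "H", "H", "A"], [(0, 1), (1, 2), (2, 3)])
```

For the four-atom chain, the result contains one acceptor-donor pair separated by three bonds, stored under the key ("A", "D", 3), alongside the five remaining pairs. Sorting the two class labels ensures that a donor-acceptor pair and an acceptor-donor pair fall into the same descriptor column.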
Summing up, an overwhelming number of molecular descriptors is available, and it is, of course, tempting to calculate as many descriptors as possible [e.g., with the popular Dragon package (44)] and to let a "black box" program select the most informative variables during model generation. However, it is the responsibility of the QSAR practitioner to gain a thorough understanding of the biological effect to be mathematically modelled and to choose both a relevant descriptor set and a statistical procedure that allow one to identify and rationally modify the molecular features causing toxic liabilities.