This page describes the tautomerization models used in the JChem tautomer search.
The JChem tautomer search decides whether a query molecule and a target molecule are tautomers of each other. It can use either of two tautomerization models for this: generic tautomerization or normal canonical tautomerization.
To decide tautomer equivalence, the search algorithm first generates the relevant tautomer form of the query and of the target, using the selected model. It then runs a graph equivalence check on the two generated forms. If they are identical, the search considers the query and the target molecules to be tautomers.
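The decision logic above can be sketched in a few lines of Python. This is only an illustration, not JChem code: `TOY_FORMS`, `toy_tautomer_form` and `are_tautomers` are hypothetical names, the lookup table stands in for a real tautomer generator, and plain string equality stands in for the graph equivalence check.

```python
from typing import Callable

# Hypothetical stand-in for a tautomer generator: maps an input SMILES to the
# SMILES of its generated tautomer form. A real search would call JChem here.
TOY_FORMS = {
    "CC(=O)C": "CC(=O)C",  # acetone (keto form)
    "CC(O)=C": "CC(=O)C",  # prop-1-en-2-ol (enol form), same generated form
    "CCO": "CCO",          # ethanol, not related to the two above
}


def toy_tautomer_form(smiles: str) -> str:
    """Look up the generated tautomer form of a molecule (toy data only)."""
    return TOY_FORMS[smiles]


def are_tautomers(
    query: str,
    target: str,
    generate: Callable[[str], str] = toy_tautomer_form,
) -> bool:
    """Return True if query and target yield the same generated form."""
    # Assumption: string equality replaces the graph equivalence check.
    return generate(query) == generate(target)


print(are_tautomers("CC(=O)C", "CC(O)=C"))  # True
print(are_tautomers("CC(=O)C", "CCO"))      # False
```

Passing the generator as a parameter mirrors the fact that the same comparison can run with either tautomerization model.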
The following description gives an overview of generic and normal canonical tautomerization.
The generic tautomer represents all theoretically possible tautomer forms of the input molecule. It is generated based on the following algorithm:
The identified regions are converted into a molecular representation by replacing the original bonds within each region with ANY bonds, and by attaching the number of bonding electrons and the numbers of D and T atoms in the region to the region as a data string.
The output of this generation process is the generic tautomer form of the input molecule showing the identified distinct tautomer regions.
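A rough Python sketch of the region conversion is shown here. It assumes that the bonding electron count is twice the sum of the bond orders inside the region, and the `Region` class, the `generic_region` function and the `e=...;D=...;T=...` data string layout are all hypothetical; the actual JChem representation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    atoms: tuple          # element symbols of the region atoms
    bonds: tuple          # (i, j, order) triples, indices into atoms
    isotopes: tuple = ()  # "D" / "T" labels of heavy hydrogens in the region


def generic_region(region: Region):
    """Replace bond orders with ANY and build a data string for the region."""
    any_bonds = tuple((i, j, "ANY") for i, j, _ in region.bonds)
    # Assumption: each unit of bond order contributes two bonding electrons.
    electrons = 2 * sum(order for _, _, order in region.bonds)
    d_count = region.isotopes.count("D")
    t_count = region.isotopes.count("T")
    data = f"e={electrons};D={d_count};T={t_count}"  # hypothetical format
    return any_bonds, data


# Keto and enol forms of the same C-C-O region differ only in bond orders.
keto = Region(("C", "C", "O"), ((0, 1, 1), (1, 2, 2)))
enol = Region(("C", "C", "O"), ((0, 1, 2), (1, 2, 1)))

print(generic_region(keto))
print(generic_region(keto) == generic_region(enol))  # True
```

Because the bond orders are discarded and only the electron and isotope counts are kept, both forms collapse to the same representation, which is what lets the generic tautomer stand for every form of the region.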
In contrast to the generic tautomer, the normal canonical form represents only a subset of all possible tautomers of the input structure.
The normal canonical forms are generated based on the following algorithm:
1. All possible H-bond donor and acceptor atoms in the molecule are identified.
2. These atom sets are filtered using the Maximal Allowed Length of Tautomerization Paths option (default value: 4) and the built-in rules of the normal canonical tautomerization model (e.g. aromaticity protection). This step results in narrower sets of donor and acceptor atoms that take part in tautomerization.
3. All possible tautomer forms are generated using these narrowed donor and acceptor atom sets.
4. One final normal canonical form is selected from the generated forms using a scoring function.

The output of this generation process is the normal canonical form of the molecule.
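The filtering and selection steps can be illustrated with a small Python sketch. It rests on several assumptions that the text does not confirm: the path length is counted in bonds along the shortest path between a donor and an acceptor, a higher score is better, and the built-in rules such as aromaticity protection are left out. All function names and the toy data are hypothetical.

```python
from collections import deque


def path_length(adjacency, start, goal):
    """Shortest path between two atoms, counted in bonds (BFS), or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def filter_pairs(adjacency, donors, acceptors, max_len=4):
    """Keep donor/acceptor pairs whose connecting path is short enough."""
    pairs = []
    for donor in donors:
        for acceptor in acceptors:
            if donor == acceptor:
                continue
            length = path_length(adjacency, donor, acceptor)
            if length is not None and length <= max_len:
                pairs.append((donor, acceptor))
    return pairs


def select_canonical(forms, score):
    """Pick one form using a scoring function (assumed: higher is better)."""
    return max(forms, key=score)


# Toy molecule: a linear chain of seven atoms, 0-1-2-3-4-5-6.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(filter_pairs(chain, donors=[0], acceptors=[2, 6]))  # [(0, 2)]

toy_scores = {"keto": 2.0, "enol": 1.0}  # made-up scores
print(select_canonical(["keto", "enol"], score=toy_scores.get))  # keto
```

In the toy chain, the acceptor six bonds away from the donor is dropped by the default limit of 4, so only the closer pair would be used to generate tautomer forms.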
The following examples show how generic and normal canonical tautomerization behave for the five most common tautomerization types.
Molecules | Generic tautomers | Normal canonical tautomers |
---|---|---|
In the case of nitroso-oxime tautomerization, the generated generic tautomer forms are the same, while the normal canonical tautomers are different. This reflects that both forms are stable and exist in water as distinct compounds.
Molecules | Generic tautomers | Normal canonical tautomers |
---|---|---|
The following examples show molecule pairs for which the generic forms are identical but the normal canonical forms are different. The generic tautomerization model therefore treats the two forms as a tautomer pair, while the normal canonical model does not, so under the normal canonical model the two molecules can be considered distinct.
Molecules | Generic tautomers | Normal canonical tautomers |
---|---|---|
Generic tautomer generation was measured to be about 5 times faster than normal canonical generation. These small speed tests were run on a MacBook Pro (2.7 GHz Intel Core i5, 8 GB DDR3).
$ time cxcalc -N ih generictautomer nci_rnd_1000.smiles >nci_rnd_1000_generic.smiles
real 0m5.225s
user 0m12.194s
sys 0m0.573s
$ time cxcalc -N ih canonicaltautomer --normal nci_rnd_1000.smiles >nci_rnd_1000_n_canonical.smiles
real 0m25.303s
user 1m9.342s
sys 0m1.683s
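The speed ratio can be checked directly from the `real` (wall-clock) lines of the two runs above. The short Python sketch below does that arithmetic; `real_seconds` is a hypothetical helper written for this example, and it only handles the `NmS.SSSs` format shown in the output.

```python
import re


def real_seconds(time_output: str) -> float:
    """Extract the wall-clock time in seconds from `time` output."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


generic = real_seconds("real 0m5.225s\nuser 0m12.194s\nsys 0m0.573s")
normal = real_seconds("real 0m25.303s\nuser 1m9.342s\nsys 0m1.683s")

speedup = normal / generic
print(f"{speedup:.2f}")  # 4.84
```

The wall-clock ratio of about 4.84 is consistent with the roughly fivefold difference stated above.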