This guide walks through the Screen API, Chemaxon's Virtual Screening toolkit API, by example. With the help of these examples, experienced programmers can develop their own screening software, including the generation of various molecular descriptors, dissimilarity calculations and virtual screening. Users can also implement custom molecular descriptors and integrate them into a virtual screening environment.
The Screen package provides tools and components for ligand-based virtual screening. Ligands (i.e. small molecules) are transformed into molecular descriptors, which are vectors of bit, integer or floating-point values. Most descriptors are based on the topology of the small molecule, though 3D descriptors incorporating one or more conformations of the molecule can also be introduced.
Similarity metrics are applied to descriptors to compare them in similarity calculations. The available metrics include Tanimoto and Euclidean and their variants (see the ScreenMD user's manual for a detailed description).
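To make the two metric families concrete, here is a minimal sketch in plain Java that does not use the Screen API. The class `MetricSketch` and its method names are hypothetical. It implements only the textbook forms (Tanimoto on bit vectors, Euclidean on floating-point vectors); the variants and any scaling applied by Chemaxon's own metrics may differ.

```java
import java.util.BitSet;

public class MetricSketch {

    /** Textbook Tanimoto dissimilarity of two bit vectors: 1 - |A and B| / |A or B|. */
    public static double tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                       // bits set in both vectors
        BitSet either = (BitSet) a.clone();
        either.or(b);                        // bits set in at least one vector
        int union = either.cardinality();
        if (union == 0) {
            return 0.0;                      // assumption: two empty vectors count as identical
        }
        return 1.0 - (double) common.cardinality() / union;
    }

    /** Plain Euclidean distance of two floating-point vectors of equal length. */
    public static double euclideanDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(1); b.set(2); b.set(3);
        // Two common bits out of four distinct bits: 1 - 2/4
        System.out.println(tanimotoDissimilarity(a, b));                                  // prints 0.5
        System.out.println(euclideanDistance(new double[] {0, 0}, new double[] {3, 4})); // prints 5.0
    }
}
```

Both functions return 0 for identical inputs, so larger values mean less similar structures, which is the convention the dissimilarity examples in this guide rely on.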
The API of the Screen package provides a high-level, easy-to-use interface to all types of molecular descriptors provided by Chemaxon. The interface is uniform: typical API methods do not distinguish between different descriptor types, so most examples can easily be adapted to other descriptors.
The first example demonstrates how a topological descriptor, the Chemaxon Chemical Fingerprint, can be assigned to each individual molecule read from an SDFile. The result of this program is a descriptor file that contains one molecular descriptor per line. The names of the input file to process and of the output file to create are given on the command line.
The next example is more complex, as it demonstrates two different aspects of virtual screening at the same time. The key point is that molecular descriptors are used for dissimilarity calculation. In addition, the descriptors are taken from a JChem database table. Such a table, the so-called descriptor table, can be generated before calling this sample program by running GenerateMD with the appropriate parameters.
The program solves a rather simple but demonstrative task: it calculates the average dissimilarity between a given query structure and all structures stored in the database (with respect to the particular descriptor type used and the dissimilarity metric applied). This code can easily be extended to calculate the total dissimilarity score of a compound library.
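The arithmetic of this task can be sketched independently of the Screen API and of any database. In the following plain-Java illustration, an in-memory list of bit vectors stands in for the descriptor table, and the class `AverageDissimilaritySketch` with all of its methods is hypothetical. It uses the textbook Tanimoto dissimilarity as the metric, which is an assumption; the real program would read descriptors from the JChem table and use the metric configured for the descriptor.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class AverageDissimilaritySketch {

    /** Helper: builds a bit vector with the given bit positions set. */
    public static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    /** Textbook Tanimoto dissimilarity: 1 - |A and B| / |A or B|. */
    public static double tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 0.0 : 1.0 - (double) common.cardinality() / union;
    }

    /** Mean dissimilarity between the query and every descriptor in the library. */
    public static double averageDissimilarity(BitSet query, List<BitSet> library) {
        if (library.isEmpty()) {
            throw new IllegalArgumentException("Library must not be empty");
        }
        double total = 0.0;
        for (BitSet descriptor : library) {
            total += tanimotoDissimilarity(query, descriptor);
        }
        return total / library.size();
    }

    public static void main(String[] args) {
        BitSet query = bits(0, 1, 2);
        List<BitSet> library = Arrays.asList(
                bits(0, 1, 2),   // identical to the query: dissimilarity 0.0
                bits(1, 2, 3),   // two of four bits shared: dissimilarity 0.5
                bits(5));        // nothing shared: dissimilarity 1.0
        System.out.println(averageDissimilarity(query, library)); // prints 0.5
    }
}
```

Extending this toward a library-wide score would mean applying the same loop to every pair of library members instead of a single query.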
More advanced usage of the Screen API includes the simultaneous use of several descriptors, the use of the Metrics class and the fine tuning of dissimilarity metrics.
The Screen package provides a framework for descriptor/fingerprint generation, storage and retrieval, for similarity/dissimilarity calculations, for virtual screening and for the fine-tuning of dissimilarity scoring functions. As a framework, it does not restrict its tools to the pre-existing molecular descriptors and dissimilarity metrics delivered by Chemaxon. Users can implement custom descriptors that integrate into the Screen system in a plug-and-play fashion.
The sample code in this section illustrates how custom molecular descriptors can be implemented using Chemaxon's technology. The example is a partial implementation of the 166 public MDL keys (MACCS). Note that, to keep the code easy to understand, efficiency was not a goal of this program. A real-life application should pay more attention to fast, parallel recognition of functional groups.
When generating custom descriptors in the Screen framework, three Java classes have to be implemented:
the generator class, derived from the MDGenerator class,
the descriptor parameter class, derived from the MDParameter class,
the molecular descriptor class, derived from the MolecularDescriptor class.
Convenience classes have also been introduced to reduce the coding work. The examples below derive the MACCS descriptor class and the corresponding parameter class from these convenience classes. These classes suit most typical needs; it is seldom necessary to inherit from lower-level classes.
The main function of the descriptor generator class is to assign a molecular descriptor to the given input molecule. Besides its constructor, the only method to be implemented is generate(). This method has two parameters: the input Molecule and the output MolecularDescriptor to be generated.
Note that the return value, a String array, does not store the descriptor. Instead, it contains the names of the properties optionally set for the input molecule. These can include partial results of the descriptor calculation that are believed to be useful and are therefore kept for later use. The return value is optional; most descriptor generators return null. However, if properties are set by the generator, they are written to the output SDFile if an SDF output was specified (e.g. in GenerateMD). This feature can be used for testing purposes.
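The contract described above (fill the output descriptor, optionally set properties on the molecule, and return the names of those properties) can be sketched as follows. This is a self-contained illustration that does not use the Screen API: `ToyMolecule`, `ToyDescriptor`, `ToyGenerator` and `ToyKeyGenerator` are hypothetical stand-ins, the property name `KEY_HITS` is invented, and the "keys" are naive character checks on a SMILES-like string rather than real substructure recognition. The actual base class and method signature should be taken from the Chemaxon API documentation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class GeneratorContractSketch {

    /** Hypothetical stand-in for a molecule: a SMILES-like string plus named properties. */
    public static class ToyMolecule {
        public final String smiles;
        public final Map<String, String> properties = new HashMap<>();

        public ToyMolecule(String smiles) {
            this.smiles = smiles;
        }
    }

    /** Hypothetical stand-in for a binary fingerprint descriptor. */
    public static class ToyDescriptor {
        public final BitSet bits = new BitSet();
    }

    /** Hypothetical stand-in for the generator base class described in the text. */
    public abstract static class ToyGenerator {
        /** Fills md for m and returns the names of properties set on m, or null if none. */
        public abstract String[] generate(ToyMolecule m, ToyDescriptor md);
    }

    /** Toy generator: one bit per key; a key "matches" if the string contains it. */
    public static class ToyKeyGenerator extends ToyGenerator {
        private static final String[] KEYS = {"N", "O", "="}; // invented toy keys

        @Override
        public String[] generate(ToyMolecule m, ToyDescriptor md) {
            md.bits.clear();
            int hits = 0;
            for (int i = 0; i < KEYS.length; i++) {
                if (m.smiles.contains(KEYS[i])) {
                    md.bits.set(i);
                    hits++;
                }
            }
            // Keep a partial result on the molecule and report its name to the caller.
            m.properties.put("KEY_HITS", Integer.toString(hits));
            return new String[] {"KEY_HITS"};
        }
    }

    public static void main(String[] args) {
        ToyMolecule molecule = new ToyMolecule("CC(=O)N");
        ToyDescriptor descriptor = new ToyDescriptor();
        String[] propertyNames = new ToyKeyGenerator().generate(molecule, descriptor);
        System.out.println(descriptor.bits);                             // prints {0, 1, 2}
        System.out.println(molecule.properties.get(propertyNames[0]));   // prints 3
    }
}
```

The descriptor travels through the second argument, while the return value only names the side information left on the molecule, which is why a generator that sets no properties can simply return null.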
Most molecular descriptors can be parameterized; the length, for instance, is a fairly common parameter. The parameter class also declares the metrics that are compatible with (available for) the new descriptor.
Descriptor parameters are stored in an XML file that can easily be extended according to future needs. However, compatibility with old versions has to be maintained.
The convenience class CDParameters (where CD stands for Custom Descriptor) covers almost all typical functionality needed to handle parameters. In most cases, therefore, the parameter class is simply a wrapper whose methods delegate to the CDParameters class.
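The wrapper-by-delegation idea can be illustrated with a short, self-contained sketch. It does not use the real CDParameters class: `ToyConvenienceParameters` and `ToyKeyParameters` are hypothetical stand-ins, and the default length of 166 and the metric names are assumptions chosen to echo the MACCS example and the metrics mentioned earlier. XML persistence is deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParameterWrapperSketch {

    /** Hypothetical stand-in for a convenience parameter class. */
    public static class ToyConvenienceParameters {
        private int length;
        private final List<String> metricNames;

        public ToyConvenienceParameters(int length, List<String> metricNames) {
            this.metricNames = metricNames;
            setLength(length);
        }

        public int getLength() {
            return length;
        }

        public void setLength(int length) {
            if (length <= 0) {
                throw new IllegalArgumentException("Length must be positive");
            }
            this.length = length;
        }

        public List<String> getMetricNames() {
            return Collections.unmodifiableList(metricNames);
        }
    }

    /** Parameter class of the toy descriptor: a thin wrapper that only delegates. */
    public static class ToyKeyParameters {
        // Assumed defaults: 166 bits, compatible with two illustrative metrics.
        private final ToyConvenienceParameters delegate =
                new ToyConvenienceParameters(166, Arrays.asList("Tanimoto", "Euclidean"));

        public int getLength() {
            return delegate.getLength();
        }

        public void setLength(int length) {
            delegate.setLength(length);
        }

        public List<String> getMetricNames() {
            return delegate.getMetricNames();
        }
    }

    public static void main(String[] args) {
        ToyKeyParameters parameters = new ToyKeyParameters();
        System.out.println(parameters.getLength());      // prints 166
        System.out.println(parameters.getMetricNames()); // prints [Tanimoto, Euclidean]
    }
}
```

Because validation and storage live in the delegate, the descriptor-specific class stays small and only fixes the defaults that make sense for that descriptor.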
The main purpose of the descriptor class is to provide the connections for the plug-and-play interface through its constructors and a few miscellaneous methods such as getName(). This example code illustrates binary-fingerprint-like descriptors; integer vector or floating-point vector descriptors can be implemented the same way, with the appropriate changes.
If, however, the descriptor to be implemented is neither a binary fingerprint nor an integer or floating-point vector, then the convenience classes cannot be used. In these rather rare cases, the implementer of the new descriptor has considerably more coding work to do.
Users are encouraged to contribute their custom descriptor implementations to our public discussion forum, see for instance Florian Pitschi's work.