6 De Novo Structure Generation
The de novo structure generator starts with a specific molecule and generates analogues by applying transformations. Each resulting molecule is tested against the property and substructure rejection criteria, and for similarity to active and inactive molecules. An annealing step then determines whether the molecule is saved in a file and used for the next iteration. This scheme is summarized below:

By default, the transformations are stored in SMILES format in the file “tranform.smi” in the THINK_EXEC directory. One of these is selected at random for application to the molecule; if no substitution exists for this transformation then an alternative random selection is made. If the transformation can be applied at more than one location within the molecule then a site is selected at random. The number of the transformation used is stored in the field TRANSFORM for the resulting molecule.
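The selection procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the program's own code: the function name pick_transform_and_site and the find_sites callback (which stands in for substructure matching) are hypothetical, transformations are tried in a shuffled order so that none is drawn twice, and the returned index is zero-based, whereas the numbering used in the TRANSFORM field is not specified here.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def pick_transform_and_site(
    molecule: str,
    transforms: Sequence[str],
    find_sites: Callable[[str, str], Sequence[int]],
    rng: random.Random,
) -> Optional[Tuple[int, int]]:
    """Return (transformation index, site), or None if nothing applies.

    find_sites is a hypothetical stand-in for substructure matching: it
    returns every location where a transformation could be applied.
    """
    # Assumption: try the transformations in random order without
    # repeats, which mimics "make an alternative random selection".
    candidates = list(range(len(transforms)))
    rng.shuffle(candidates)
    for index in candidates:
        sites = list(find_sites(molecule, transforms[index]))
        if sites:
            # Several possible locations: choose one at random.
            return index, rng.choice(sites)
    return None
```

Returning None when no transformation applies is a choice made for this sketch; how the program itself handles that case is not described above.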
A user-defined transformation file may be used instead of the standard file. Each transformation is defined in a single record, using the SMILES format to describe the original and transformed substructures with a “>” as the separator. For example, C(=O)C>C(=O)OC describes a transformation that inserts an oxygen atom. Where necessary, hydrogen atoms must be explicitly specified within the original and transformed substructures.
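As an illustration of the record layout, the following Python sketch splits a transformation record at the “>” separator. The names Transformation and parse_transformation are hypothetical, and the sketch assumes that the record holds only the two SMILES strings; it does not check that either string is valid SMILES.

```python
from typing import NamedTuple


class Transformation(NamedTuple):
    original: str      # substructure to find in the molecule
    transformed: str   # substructure that replaces it


def parse_transformation(record: str) -> Transformation:
    """Split one record of the form 'original>transformed'."""
    original, separator, transformed = record.strip().partition(">")
    # Exactly one separator and two non-empty sides are required.
    if not separator or not original or not transformed or ">" in transformed:
        raise ValueError(f"not a valid transformation record: {record!r}")
    return Transformation(original, transformed)


# The example record from the text: inserts an oxygen atom.
example = parse_transformation("C(=O)C>C(=O)OC")
print(example.original, "becomes", example.transformed)
```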
The selection of transformations can be influenced by defining a probability (which is similar to a relative reaction rate) for each transformation in the file. The values are stored in a field called PROBABILITY. The chance of a particular transformation a being selected is Pa/Psum, where Pa is the probability of transformation a and Psum is the sum of all the probabilities in the file.
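The Pa/Psum rule amounts to weighted random selection. A minimal Python sketch follows, assuming the PROBABILITY values have already been read into a list; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def selection_chances(probabilities: Sequence[float]) -> List[float]:
    """Return Pa/Psum for every transformation."""
    p_sum = sum(probabilities)
    if p_sum <= 0:
        raise ValueError("the probabilities must sum to a positive value")
    return [p_a / p_sum for p_a in probabilities]


def choose_transformation(probabilities: Sequence[float], rng: random.Random) -> int:
    """Pick a zero-based transformation index with chance Pa/Psum."""
    # random.choices normalises the weights itself, so the raw
    # PROBABILITY values can be passed in directly.
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Three transformations, the third twice as likely as the others.
print(selection_chances([1.0, 1.0, 2.0]))  # [0.25, 0.25, 0.5]
```

Because only the ratios matter, the values need not sum to 1, which is consistent with reading them as relative rates.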
6.2 Application of Earlier Results
When selections of active and inactive molecules are available, these may be used in two ways:
The file of molecules to be compared must contain activity data. By default, the file is expected to have the same filename as the activity field (as would normally be the case if the file contained results from an HTS assay). The corresponding learn file is loaded if it has the same filename but with the .lrn extension. When no comparison file is specified, the file “default.lrn” from the THINK_EXEC directory is applied. If the learn file cannot be opened then default ranges of heteroatoms (2:9), molecular mass (175:800) and lipophilicity (0.28:0.72) are used.
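The fallback ranges quoted above can be written as a simple property check. The Python sketch below is illustrative only: the dictionary keys and the function name are hypothetical, both bounds are assumed to be inclusive, and the property values are assumed to be on the same scales as the quoted ranges.

```python
from typing import Dict, List, Tuple

# Default ranges quoted in the text (low:high). Key names are hypothetical.
DEFAULT_RANGES: Dict[str, Tuple[float, float]] = {
    "heteroatoms": (2, 9),
    "molecular_mass": (175, 800),
    "lipophilicity": (0.28, 0.72),
}


def outside_default_ranges(properties: Dict[str, float]) -> List[str]:
    """Return the names of properties that fall outside their range."""
    failures = []
    for name, (low, high) in DEFAULT_RANGES.items():
        value = properties[name]
        # Assumption: the bounds themselves are acceptable values.
        if not low <= value <= high:
            failures.append(name)
    return failures


candidate = {"heteroatoms": 12, "molecular_mass": 320.0, "lipophilicity": 0.5}
print(outside_default_ranges(candidate))  # ['heteroatoms']
```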
A default rejection file is provided (“reject.smi” in the THINK_EXEC directory) which contains some substructures that are known to be unstable, readily metabolised or impossible to synthesize. This file uses the SMILES format. A user-defined rejection file may be used instead. Each substructure must be defined in a separate record, using the SMILES format. Comments may be included in the file provided they are contained within records that begin with “!”.
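The layout of a rejection file can be illustrated with a short Python reader. This is a sketch under stated assumptions rather than the program's own parser: the function name read_rejections is hypothetical, blank records are skipped, and surrounding whitespace is ignored. The sample records are invented for the illustration and are not taken from “reject.smi”.

```python
from typing import Iterable, List


def read_rejections(records: Iterable[str]) -> List[str]:
    """Return the SMILES substructures, skipping comment records."""
    substructures = []
    for record in records:
        text = record.strip()
        # Records that begin with "!" are comments; blank records are
        # skipped as well (an assumption made for this sketch).
        if not text or text.startswith("!"):
            continue
        substructures.append(text)
    return substructures


# Invented sample records, one substructure per record.
sample = ["! peroxide linkage", "OO", "", "! azide", "N=N=N"]
print(read_rejections(sample))  # ['OO', 'N=N=N']
```

The same function accepts an open file object, since iterating over a text file yields one record per line.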