12    Combinatorial Chemistry

OPEN FILE=filename.smi
SEARCH QUERY=amide-rx#1
SEARCH .. FILE=reagents.smi
SEARCH .. SITE=1
SEARCH .. OUTPUT=gp1.smi
SEARCH MODE=R-GROUP
OPEN FILE=gp1.smi
SAVE R-GROUPS=amide-rx,gp1,gp2 ..
SAVE OPTION=FILTER
SAVE FILE=amide-lib.smi

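Combinatorial chemistry in THINK consists of two distinct steps: an R-group search, which selects reagents and converts them into R-groups (sections 12.1 to 12.6), and library enumeration, which combines the R-groups into complete molecules (sections 12.7 to 12.10).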

R-groups are molecular fragments that can be plugged together to form larger molecules. These joins can only be made at specific locations within the fragments: the connection atoms. Each connection atom is a temporary atom with a special numeric atom type n in the range 0-9. When two R-groups are joined, the connection atoms are removed and the remaining bonds are plugged together. The connections may be made using any type of bond (single, double, etc.), but both connection atoms must have the same type of bond. If n is greater than 0, the connection atom is known as an explicit connection atom and can only be joined to another with the same value; a value of 0 indicates a generic connection that can be joined to a connection atom with any value.
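The joining rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not THINK's implementation; the function name `can_join` and the `(n, bond)` tuple representation are assumptions made for the example.

```python
def can_join(a, b):
    """Return True if two connection atoms may be joined.

    Each connection atom is modelled (hypothetically) as a tuple (n, bond):
    n is the numeric atom type 0-9, bond is a label such as "single".
    """
    n_a, bond_a = a
    n_b, bond_b = b
    if bond_a != bond_b:
        # Both connection atoms must have the same type of bond.
        return False
    if n_a == 0 or n_b == 0:
        # A generic connection (0) can be joined to any value.
        return True
    # Explicit connections can only be joined to the same value.
    return n_a == n_b


print(can_join((1, "single"), (1, "single")))  # True
print(can_join((1, "single"), (2, "single")))  # False
print(can_join((0, "double"), (7, "double")))  # True
print(can_join((0, "single"), (7, "double")))  # False
```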

R-group searches are analogous to the reagent searches within Chem-X. A maximum of 10 sets of R-groups, or 9 sets of R-groups and a set of core molecules, may be used in any library.

R-groups may be stored using the SMILES or SD format. Within the SMILES string each connection atom is shown as a single digit within "[]" brackets, "[n]" where n is the numeric atom type. In an SD file, each connection atom has an atom symbol of n, stored in columns 32-34 of the atom record.
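As a rough illustration of the SMILES convention described above, the following Python sketch lists the connection atoms found in a string. The helper name `connection_atoms` is hypothetical, and the pattern assumes that a connection atom is always exactly one digit alone inside square brackets, so ordinary bracket atoms such as "[13C]" are not matched.

```python
import re

# One digit alone inside square brackets, e.g. "[1]" or "[0]".
CONNECTION = re.compile(r"\[(\d)\]")


def connection_atoms(smiles):
    """Return the numeric atom types of all connection atoms, in order."""
    return [int(n) for n in CONNECTION.findall(smiles)]


print(connection_atoms("[1]C(=O)N(H)[2]"))  # [1, 2]
print(connection_atoms("[0]C(Cl)=O"))       # [0]
print(connection_atoms("CC(=O)Cl"))         # []
```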

When a library is enumerated, THINK permutes all combinations of the R-groups within the sets and saves the resulting molecules to a file. There are three main types of library that could be constructed, which differ in the types of connection atoms used in the R-groups; they are described in more detail in the THINK Theory Manual.

12.1   Read Query Molecule

The molecule to be used as the query during the R-group search is read from a SMILES file. It need not be the only molecule in the file - the user may either read the entire file and then select the desired molecule, or may selectively read just the query molecule from the file (see section 2.1). It is also possible to read the query from an SD file, but this is less common, and the format of the query is not described here (see the THINK Theory Manual for details).

The most convenient query often consists of a generic reaction with reactants including substitution positions and product(s) which form the core group for enumeration. A SMILES for creating an amide library is shown below together with a graphical representation.

[1]C(=O)Cl+[2]N(H)H>[1]C(=O)N(H)[2]

For most libraries there are several different generic reaction schemes which can have subtle but important differences in the reagents which are selected. THINK v1.25 does not include the capability to draw such queries.
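To make the layout of the amide reaction string shown above concrete, the sketch below splits it into reactant R-group queries and product. It follows only the notation visible in that one example ("+" between reactants, a single ">" before the product). Treating "+" as the separator between multiple products is an assumption, the function name `split_reaction` is hypothetical, and a charged bracket atom containing "+" would defeat this simple split.

```python
def split_reaction(rxn):
    """Split 'A+B>C' into (reactants, products) lists of SMILES strings."""
    reactant_part, product_part = rxn.split(">", 1)
    reactants = reactant_part.split("+")
    products = product_part.split("+")  # assumed separator for products
    return reactants, products


reactants, products = split_reaction("[1]C(=O)Cl+[2]N(H)H>[1]C(=O)N(H)[2]")
print(reactants)  # ['[1]C(=O)Cl', '[2]N(H)H']
print(products)   # ['[1]C(=O)N(H)[2]']
```

Here the first reactant is the acid chloride query, the second is the amine query, and the product is the core group used for enumeration.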

The SMILES string for each reagent is known as an R-group query and consists of a substructure and the connection points. The substructure is the portion that is common to all reagents accepted by the search; this portion will be discarded when the reagent is converted into an R-group. The connection points are indicated by numeric atom types or wildcard atom types.

The [0] and [n] connections in the query will be matched to any atom type except hydrogen in the reagent. The [wildcard] connection will only be matched to selected atom types; see the THINK Theory Manual for a full list of supported wildcards. The [0] connection is not usually used in reactions but is a useful means of generating R-groups which can be re-used in a variety of libraries.

It is also possible to store the R-group queries in separate molecules. For example, to build a coreless amide library of the form [R1]C(=O)N[R2] from acid chlorides and amines would require two R-group queries: one to look for acid chlorides and convert them into suitable R-groups, and the other to perform the same operation on amines. The reagents to be used, and the R-group query required to select them are listed in the table below:

Search           Reagent        Query
acid chloride    [R1]C(Cl)=O    [0]C(Cl)=O
amine            N[R2]          [0]N

If the query contains multiple connection points, THINK will use explicit connection atoms in the resulting R-groups, even though the query may have contained generic connection points. Since the order in which explicit connections are allocated to generic connection points is undefined, it is recommended that explicit connections ([n]) are used in these queries.

Commands                  Dialogs
OPEN FILE=rx-amide.smi    File > Open

Note: Reactions and R-group queries should never have missing hydrogens as these cause serious problems.

12.2   Select Query Molecule

In addition to selecting the molecule that contains the R-group query, for reaction queries it is necessary to specify which R-group to use. The SITE keyword is used to specify any atom in the R-group (usually the connection atom, given by atom name, i.e. without the square brackets).

Commands                                  Dialogs
SEARCH ... QUERY=rx-amide#1 SITE=2 ...    Search > 2D > R-group; Site=2

12.3   Identify Reagent File to be Searched

Unlike many other software packages that require molecules to be converted to a proprietary format before they can be searched, THINK searches molecules directly from a SMILES or SD file. This eliminates the need to perform time-consuming data conversions.

Commands                           Dialogs
SEARCH ... FILE=aldrich.smi ...    Search > 2D > Browse

12.4   Set Search Options

An R-group search is a specialised form of 2D search (see Chapter 7) that checks connectivity and automatically converts matching molecules into R-groups. The user can choose whether atoms are matched by element symbol or by atom type, and whether bond orders are checked (normally they should be); these settings are controlled through the OPTIONS=TYPE and OPTIONS=ORDER keywords respectively.

Many files of reagents store their molecules in the salt form and this will cause spurious results from R-group searches. The THINK R-group search incorporates the ability to convert the molecules into their parent forms through the OPTIONS=PARENT keyword. This conversion takes place before the reagent is compared with the query.

Commands                                        Dialogs
SEARCH MODE=R-GROUP ... OPTIONS=ORDER,PARENT    Search > 2D > R-group; Order; Parent

If the FILTER option is used, the upper limit of each filter range is applied to the resulting R-group (not to the original molecule). The lower limit is ignored, because an R-group that falls below it may still yield an enumerated molecule that satisfies it once the other R-groups are attached.
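The asymmetry between the two limits can be illustrated with a small Python sketch. It is not THINK code: the function names, the inclusive comparisons, and the molecular-weight range of 200 to 500 are all assumptions chosen for the example.

```python
def rgroup_passes(value, lower, upper):
    """Filter as applied to an R-group: only the upper limit is tested.

    The lower limit is accepted for symmetry but deliberately ignored,
    since other R-groups may raise the value in the enumerated molecule.
    """
    return value <= upper


def molecule_passes(value, lower, upper):
    """Filter as applied to a complete enumerated molecule."""
    return lower <= value <= upper


LOWER, UPPER = 200.0, 500.0  # hypothetical molecular-weight range

print(rgroup_passes(57.0, LOWER, UPPER))    # True: small fragments are kept
print(rgroup_passes(612.0, LOWER, UPPER))   # False: already too heavy
print(molecule_passes(57.0, LOWER, UPPER))  # False: below the lower limit
```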

12.5   Supply R-Group File

The R-groups resulting from an R-group search are stored in an output file using the SMILES or SD format, depending upon the file extension supplied for the output file. Normally the SMILES format would be used because this is more compact than the SD format.

Commands                        Dialogs
SEARCH ... OUTPUT=gp2.smi ...   Search > 2D > Search

12.6   Perform Search

Once the query molecule, reagent and R-group files have been supplied, the user can initiate the search. Each reagent molecule is read from the file in turn and compared with the query; if it matches then it is converted into an R-group and saved to the R-group file. THINK automatically eliminates duplicate R-groups - this situation would occur if the reagent file contained several copies of the same molecule, for instance in different salt forms. THINK will also automatically ignore all molecules with multiple reaction sites since it would be impossible to predict which reaction site is used when physically making the library.

Commands                                                                                  Dialogs
SEARCH MODE=R-GROUP QUERY=RQUERY#1 FILE=aldrich.smi OUTPUT=gp2.smi OPTIONS=ORDER,PARENT   Search > 2D > R-group; ... Search
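The search loop described in this section (skip non-matching reagents, ignore reagents with multiple reaction sites, drop duplicate R-groups) might be sketched as follows. This is a minimal Python illustration, not THINK's algorithm: `collect_rgroups` is a hypothetical name, the substructure matcher and the R-group conversion are passed in as callables, and the lookup tables below are toy stand-ins with made-up R-group strings.

```python
def collect_rgroups(reagents, count_sites, to_rgroup):
    """Return unique R-groups from reagents that have exactly one site."""
    seen = set()
    rgroups = []
    for reagent in reagents:
        if count_sites(reagent) != 1:
            # No match, or several reaction sites: the reagent is ignored.
            continue
        rgroup = to_rgroup(reagent)
        if rgroup in seen:
            # Duplicate R-group, e.g. the same molecule listed twice.
            continue
        seen.add(rgroup)
        rgroups.append(rgroup)
    return rgroups


# Toy stand-ins for a real substructure search (values are illustrative).
SITES = {"CC(=O)Cl": 1, "CCC(=O)Cl": 1, "ClC(=O)CCC(=O)Cl": 2, "CCO": 0}
RGROUPS = {"CC(=O)Cl": "C[1]", "CCC(=O)Cl": "CC[1]"}

reagents = ["CC(=O)Cl", "CCO", "CC(=O)Cl", "ClC(=O)CCC(=O)Cl", "CCC(=O)Cl"]
print(collect_rgroups(reagents, SITES.get, RGROUPS.get))  # ['C[1]', 'CC[1]']
```

A real implementation would compare R-groups in a canonical form rather than by raw string equality.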

12.7   Read R-Groups

Once the R-groups for a desired library have been created, they can be used to create the complete molecules. To achieve this, the R-groups must first be read into THINK. The core group (if any) is treated as an R-group, and therefore must also be read into THINK. The generic reaction (which contains the core group as a product) may be used for this purpose, since the reactants will be ignored. Since there may be a large number of R-groups, it is recommended that all other molecules are first deleted to reduce the amount of memory required.

The R-group files may be read in any order. THINK will use all the R-groups loaded from each file to enumerate the library. If a subset of R-groups is required from a file, then either the required R-groups should be read selectively (see section 2.1), or the entire file should be read and the undesired R-groups deleted.

Note that all the R-groups to be used at any single position in the library must be read from the same file. This is because THINK uses the file name to identify the R-groups for each position in the enumerated molecules. The R-groups from a file may be used at more than one position in the library (eg when building peptides), providing exactly the same set of R-groups is used in each position.

Commands                        Dialogs
OPEN FILE=rx-amide.smi          File > Open
OPEN FILE=g1.smi
DELETE MOLECULE=PRO@G1          Edit > Delete
OPEN FILE=g2.smi MOLECULE=A*    Selective read not supported

12.8   Define R-Group Files

Before a library can be enumerated, THINK needs to know which R-groups are to be attached to each position. The R-groups are identified by the name of the file from which they were read (when any molecule is read, THINK maintains a record of the file from which it came). Only the file names are required (not the file extensions).

If the R-groups contain generic connection points ([0]) then it is important that the file names are supplied in the correct order, with the core file (if any) first. If all the R-groups contain explicit connection points ([1], [2], etc.) then the files can be supplied in any order. If the same set of R-groups is to be used at several positions, for instance when building a peptide, then the file name must be supplied once for each position (as in the second example below).

Commands                                              Dialogs
SAVE ... R-GROUPS=amide-rx,gp1,gp2                    File > Enumerate
SAVE ... R-GROUPS=ncap,amino,amino,amino,amino,cap
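When the command is generated by a script, the R-GROUPS value can be assembled from the file paths by dropping each extension and repeating a name once per position. The Python sketch below shows one way to do this; the helper `rgroups_keyword` is hypothetical and only builds the keyword text as it appears in the examples above.

```python
from pathlib import Path


def rgroups_keyword(files):
    """Build 'R-GROUPS=a,b,c' from file paths, one entry per position."""
    return "R-GROUPS=" + ",".join(Path(f).stem for f in files)


print(rgroups_keyword(["amide-rx.smi", "gp1.smi", "gp2.smi"]))
# R-GROUPS=amide-rx,gp1,gp2

# The same file repeated for four positions, as in the peptide example.
peptide = ["ncap.smi"] + ["amino.smi"] * 4 + ["cap.smi"]
print(rgroups_keyword(peptide))
# R-GROUPS=ncap,amino,amino,amino,amino,cap
```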

12.9   Apply Optional Filters

An enumerated library may contain many thousands (or millions) of molecules, many of which are of no further interest because they have undesirable properties. These molecules may be filtered out from the library by applying the criteria from a learn file. This file, created during an earlier data analysis calculation (see Chapter 9), contains a list of undesirable substructures and of desirable property values. When applied during library enumeration, molecules which contain any of the undesirable substructures or whose properties lie outside any of the desirable ranges are automatically discarded instead of being saved to the file. Note that these filters are applied to the enumerated molecules, not the constituent R-groups. The learn file is assumed to have the same name as the field that contains activity data (set through the ACTIVITY keyword in the CUSTOMISE command). If this field is not set then the learn file "default.lrn" in the THINK_EXEC directory will be used.

Commands                   Dialogs
CUSTOMISE ACTIVITY=LOGK    Not supported
SAVE ... OPTIONS=FILTER    File > Enumerate > Filter

12.10   Save Enumerated Molecules

The molecules within a library are enumerated by permuting the R-groups within the sets. Thus the total number of molecules generated is the product of the sizes of the sets, although this may be reduced by applying filters (see above). The enumerated molecules are saved to a file for subsequent use within THINK or other software packages. Because of the sheer number of molecules that may be created, it is recommended that the SMILES format is used since this is the most compact representation of the molecules.
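The size calculation above is easy to check before enumerating a large library. The following Python sketch computes the product of the set sizes and iterates over the combinations. The set names and R-group strings are made up for illustration, and the sketch only pairs R-groups up; it does not join them into molecules.

```python
from itertools import product
from math import prod


def library_size(positions, sets):
    """Product of the set sizes, one factor per position in the library."""
    return prod(len(sets[name]) for name in positions)


def enumerate_combinations(positions, sets):
    """Yield one tuple of R-groups per enumerated molecule."""
    return product(*(sets[name] for name in positions))


# Hypothetical R-group sets, keyed by the file name they were read from.
sets = {"gp1": ["C[1]", "CC[1]", "CCC[1]"], "gp2": ["[2]C", "[2]CC"]}
positions = ["gp1", "gp2"]

print(library_size(positions, sets))  # 6
for combo in enumerate_combinations(positions, sets):
    print(combo)

# A set used at several positions contributes one factor per position.
peptide = ["ncap"] + ["amino"] * 4 + ["cap"]
counts = {"ncap": range(3), "amino": range(20), "cap": range(2)}
print(library_size(peptide, counts))  # 960000
```

The second calculation shows how quickly the count grows: 3 N-caps, 20 amino acids at each of four positions and 2 caps already give 960000 molecules before any filter is applied.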

THINK automatically constructs a temporary copy of each enumerated molecule, compares it with the filter criteria (if set) and saves it to the file before discarding it. The molecule is constructed by plugging together the R-groups according to a set of rules.

See the THINK Theory Manual for more information on the rules governing the conversion of generic connection atoms into explicit connection atoms.

Commands                                               Dialogs
SAVE FILE=lib1.smi R-GROUPS=L1,G1,G2 OPTIONS=FILTER    File > Enumerate > ... Save
