
The data analysis examines the properties and/or keys of a set of molecules and attempts to identify features that may be responsible for undesirable characteristics. The results of the analysis are stored in a learn file that can be used in subsequent THINK calculations to highlight or eliminate molecules whose properties lie outside the acceptable ranges or which contain unacceptable functional groups.
THINK will analyse all molecules that have been loaded into the program. They may be read from a SMILES or SD file. Any external property data required for the analysis must be included in the file in data fields; these fields are loaded automatically as the molecules are read. The file must also contain a field of activity data.
| Commands | Dialogs |
| OPEN FILE=capsaicin.smi | File > Open |
The field containing the activity data must be identified.
| Commands | Dialogs |
| LEARN ... ACTIVITY=EC50 ... | Calculate > Analysis > Field:EC50 |
By default, the activity values will be taken directly from the activity field, with the highest values indicating the most active molecules. Alternatively, the interpretation of activity may be reversed, so that molecules with low values are considered the most active. There is also an option to convert the activities into their logarithmic values, for instance if binding constants are used as the activity data.
The molecules need to be divided into active and inactive molecules for the analysis. This is done through a user-supplied significance value that must lie within the range 0-0.5. Molecules that lie within this fraction of the top of the activity range are considered active, and molecules that lie within this fraction of the bottom are considered inactive; molecules that lie in the middle are ignored during the analysis. If a significance value of 0.5 is specified then all molecules will be included in the calculation.
| Commands | Dialogs |
| LEARN ... SIGNIFICANCE=0.3 ... | Calculate > Analysis > Significance=0.3 |
| LEARN ... OPTIONS=LOW,LOG ... | Calculate > Analysis |
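The active/inactive split described above can be sketched in Python. This is only an illustration, not THINK's implementation: the function name is hypothetical, the fraction is assumed to apply to the span of activity values (rather than to a ranking of the molecules), and the logarithm is assumed to be base 10.

```python
import math

def classify_by_significance(activities, significance, low=False, log=False):
    """Label each molecule 'active', 'inactive' or None (ignored).

    activities: mapping of molecule name -> activity value.
    low / log mirror the idea of OPTIONS=LOW,LOG; their exact
    behaviour inside THINK is an assumption here.
    """
    if not 0.0 <= significance <= 0.5:
        raise ValueError("significance must lie within the range 0-0.5")

    values = dict(activities)
    if log:
        # Assumed base-10 logarithm of the raw activities.
        values = {name: math.log10(v) for name, v in values.items()}
    if low:
        # Reverse the interpretation: low values become the most active.
        values = {name: -v for name, v in values.items()}

    lowest, highest = min(values.values()), max(values.values())
    span = highest - lowest

    labels = {}
    for name, v in values.items():
        if v >= highest - significance * span:
            labels[name] = "active"      # within the top fraction
        elif v <= lowest + significance * span:
            labels[name] = "inactive"    # within the bottom fraction
        else:
            labels[name] = None          # middle of the range: ignored
    return labels
```

With a significance of 0.5 the two bands meet in the middle of the range, so every molecule receives a label, which matches the behaviour described in the text.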
9.4 Select Data to be Analysed
THINK can include some or all of the functional group keys, 2D properties and/or external data fields in the data analysis. The fields, 2D properties and keys are identified by keywords:
| Keyword | Interpretation |
| FIELDS | All 2D properties and external data fields |
| field_name | Specified external data field |
| property_name | Specified 2D property. Valid names are: ATOMS, BONDS, HETATOMS, HALOGENS, DONORS, ACCEPTORS, POSITIVES, NEGATIVES, BRANCHES, RINGS, AROMATICS, HETAROMATIC, CENTRES, MASS, FLEXIBILITY, LIPOPHILICITY, VOLUME, AREA, PSA, NPSA, PFA, NPFA, XSA, XFA, CPK-CONTACTS, VDW-CONTACTS, ROT-BONDS, CONFORMERS, E-TORSION |
| KEYS | All functional group keys |
| KEY#n | Specified functional group key |
| * | All functional group keys, 2D properties and external data fields |
| <no keywords supplied> | All functional group keys, 2D properties and external data fields |
See section 4.5 for a full list of the 2D properties.
| Commands | Dialogs |
| LEARN ... PROPERTIES=KEYS ... | Calculate > Analysis > Keys |
| LEARN ... PROPERTIES=KEY#5,KEY#12,LOGP,DIPOLE,... | Full control is not supported |
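As a rough illustration of how the keyword table above might be read, the following Python sketch classifies each entry of a PROPERTIES list. It is not part of THINK: the function name and the returned labels are hypothetical, keyword matching is assumed to be case-sensitive, and any name that is not a recognised keyword or 2D property is assumed to be an external data field.

```python
import re

# 2D property names listed in the keyword table.
PROPERTY_NAMES = {
    "ATOMS", "BONDS", "HETATOMS", "HALOGENS", "DONORS", "ACCEPTORS",
    "POSITIVES", "NEGATIVES", "BRANCHES", "RINGS", "AROMATICS",
    "HETAROMATIC", "CENTRES", "MASS", "FLEXIBILITY", "LIPOPHILICITY",
    "VOLUME", "AREA", "PSA", "NPSA", "PFA", "NPFA", "XSA", "XFA",
    "CPK-CONTACTS", "VDW-CONTACTS", "ROT-BONDS", "CONFORMERS", "E-TORSION",
}
KEY_PATTERN = re.compile(r"KEY#(\d+)")

def interpret_properties(spec):
    """Return a list of (kind, value) pairs for a comma-separated keyword list."""
    keywords = [k.strip() for k in spec.split(",") if k.strip()]
    if not keywords:
        # No keywords supplied: same meaning as "*".
        return [("all", "*")]

    result = []
    for kw in keywords:
        match = KEY_PATTERN.fullmatch(kw)
        if kw == "*":
            result.append(("all", "*"))
        elif kw == "FIELDS":
            result.append(("all_fields_and_properties", kw))
        elif kw == "KEYS":
            result.append(("all_keys", kw))
        elif match:
            result.append(("key", int(match.group(1))))
        elif kw in PROPERTY_NAMES:
            result.append(("property", kw))
        else:
            # Assumption: anything else names an external data field.
            result.append(("field", kw))
    return result
```

For example, `interpret_properties("KEY#5,KEY#12,LOGP,DIPOLE")` treats the first two entries as functional group keys and, under the stated assumption, the last two as external data fields.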
During the analysis calculation, THINK first calculates any 2D properties and functional group keys that are required before attempting to extract discriminating features that could identify the inactive molecules. The results of an analysis calculation are saved in a learn file whose filename is taken from the name of the activity field and uses the file extension ".lrn". Thus, if the activities are taken from the field called EC50, the results will be stored in a file called "ec50.lrn". Amongst other information, the learn file contains the acceptable range of values for each discriminating property (2D property or external data field) and the most significant unacceptable functional groups.
| Commands | Dialogs |
| LEARN ACTIVITY=EC50 SIGNIFICANCE=0.5 | Calculate > Analysis > .. Analyse |
| LEARN ACTIVITY=EC50 SIGNIFICANCE=0.3 PROPERTIES=FIELDS | Calculate > Analysis > .. Analyse |
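When LEARN commands are generated from a script, a small helper can assemble the command line and predict the name of the resulting learn file. The Python sketch below is an illustration only: both function names are hypothetical, the parameter order simply follows the examples in the table, and lower-casing the field name is inferred from the single EC50 / "ec50.lrn" example.

```python
def learn_filename(activity_field):
    """Name of the learn file for a given activity field (lower-casing is assumed)."""
    return activity_field.lower() + ".lrn"

def build_learn_command(activity_field, significance, properties=None):
    """Assemble a LEARN command string from its parts."""
    if not 0.0 <= significance <= 0.5:
        raise ValueError("significance must lie within the range 0-0.5")

    parts = [
        "LEARN",
        f"ACTIVITY={activity_field}",
        f"SIGNIFICANCE={significance:g}",
    ]
    if properties:
        # Keywords are joined with commas, as in PROPERTIES=KEY#5,KEY#12.
        parts.append("PROPERTIES=" + ",".join(properties))
    return " ".join(parts)
```

Calling `build_learn_command("EC50", 0.3, ["FIELDS"])` reproduces the second command in the table, and `learn_filename("EC50")` gives the file name quoted in the text.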
The learn file generated by data analysis may be used to provide additional rejection criteria for de novo structure generation. It may also be used in the property spreadsheet to highlight values that lie outside the acceptable ranges.
| Commands |
| SUGGEST MOLECULE=M2 DISPLAY=PANEL ACTIVITY=EC50 |
| LIST INFO=PROPERTIES OUTPUT=WINDOW ACTIVITY=EC50 |
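The idea of highlighting values that lie outside the acceptable ranges can be sketched as follows. The learn file format is not described here, so this Python example assumes the ranges have already been read into a plain dictionary; the function name, the dictionary layout and the numbers in the usage example are all hypothetical.

```python
def out_of_range(properties, acceptable_ranges):
    """Return the properties of one molecule that fall outside their acceptable range.

    properties:        mapping of property name -> value for the molecule.
    acceptable_ranges: mapping of property name -> (minimum, maximum),
                       a stand-in for what a learn file might record.
    """
    flagged = {}
    for name, (minimum, maximum) in acceptable_ranges.items():
        if name not in properties:
            continue  # nothing to check for this molecule
        value = properties[name]
        if not (minimum <= value <= maximum):
            flagged[name] = value
    return flagged

# Illustrative values only; they are not taken from any real learn file.
ranges = {"MASS": (150.0, 500.0), "DONORS": (0, 5)}
molecule = {"MASS": 612.4, "DONORS": 3, "RINGS": 4}
highlighted = out_of_range(molecule, ranges)
```

Here only MASS would be highlighted, since DONORS lies inside its range and RINGS has no recorded range.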