2 Reading and Saving Molecules
Molecules are normally read from files with the following formats:
They may also be read from:
Molecules and their associated field data may be saved to SMILES, PDB or SD files, or to comma-separated values (CSV) files for importing into other packages. Simple lists of molecules may be saved in listing files.
THINK will automatically deduce the file format required from the extension supplied as part of the file specification:
File specifications are taken exactly as supplied by the user, so that they are processed correctly on case-sensitive operating systems (e.g. Linux). However, in THINK v1.25 the file explorer (see below) converts all file specifications to lowercase. File names that include spaces may be used, provided they are enclosed in double quotes ("").
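Extension-based format deduction, together with the quoting rule for names that contain spaces, can be sketched in Python. This is a minimal illustration, not THINK's own parser. The `parse_keywords` and `deduce_format` helpers are hypothetical, and the extension table is an assumption: `.smi` and `.sdf` come from the examples in this chapter, while `.pdb` and `.csv` are guesses for the other formats. The extension is compared case-insensitively because both `dopamine.smi` and `LIB1.SMI` appear in the examples. The file specification itself is returned unchanged.

```python
import os
import re
from typing import Dict

# ASSUMPTION: illustrative extension table, not THINK's actual list.
EXTENSION_FORMATS = {
    ".smi": "SMILES",
    ".pdb": "PDB",
    ".sdf": "SD",
    ".csv": "CSV",
}

# KEYWORD=value, where value is either "double-quoted text" or a run of non-blanks.
KEYWORD_RE = re.compile(r'([A-Za-z]+)=(?:"([^"]*)"|(\S+))')


def parse_keywords(command: str) -> Dict[str, str]:
    """Split a command line into KEYWORD -> value, honouring double quotes."""
    return {key: quoted or bare for key, quoted, bare in KEYWORD_RE.findall(command)}


def deduce_format(file_spec: str) -> str:
    """Map a file specification to a format name using its extension."""
    ext = os.path.splitext(file_spec)[1].lower()  # only the extension is case-folded
    if ext not in EXTENSION_FORMATS:
        raise ValueError(f"cannot deduce a format from {file_spec!r}")
    return EXTENSION_FORMATS[ext]
```

For example, `parse_keywords('OPEN FILE="my mols.SMI" MOLECULE=#5')` yields `{'FILE': 'my mols.SMI', 'MOLECULE': '#5'}`, and `deduce_format` maps that file to `SMILES` while leaving the original mixed-case name intact.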
Molecules are usually read through the OPEN command, the File > Open dialog, or the file explorer (see below). Under Windows, files may also be dragged from Windows Explorer and dropped in the Console window. The OPEN command provides the most flexibility; the other routes read all the molecules in the file in a single operation.
Atoms are assigned atom types, serial and group numbers, names, etc. when the file is processed. If the file does not contain 2D or 3D coordinates, THINK will attempt to generate them when required. Only protein atoms are organised into residues.
PDB files normally contain a single protein or peptide, possibly with a docked ligand. However, THINK is capable of reading several molecules from a PDB file, using the NAME record to indicate the start of each molecule. This is an extension to the normal PDB format, and is described in the THINK Theory manual.
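The way NAME records partition a multi-molecule PDB file can be pictured with the Python sketch below. Purely for illustration, it assumes that a NAME record is a line beginning with `NAME` followed by the molecule name. The exact record layout is defined in the THINK Theory manual, and `split_on_name_records` is a hypothetical helper. Lines that precede the first NAME record, or a file with no NAME records at all, are kept as one unnamed molecule, which matches an ordinary single-molecule PDB file.

```python
from typing import List, Optional, Tuple

Molecule = Tuple[Optional[str], List[str]]


def split_on_name_records(lines: List[str]) -> List[Molecule]:
    """Group PDB lines into (name, records) pairs, starting a new group at each NAME line."""
    molecules: List[Molecule] = []
    name: Optional[str] = None
    records: List[str] = []
    for line in lines:
        if line.startswith("NAME"):  # ASSUMPTION: NAME record layout is illustrative
            if name is not None or records:
                molecules.append((name, records))  # close the previous molecule
            name, records = line[4:].strip(), []
        else:
            records.append(line)
    if name is not None or records:
        molecules.append((name, records))  # close the last molecule
    return molecules
```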
When using the OPEN command to read SMILES, PDB or SD files, the user has the option to read selected molecules from the file through the MOLECULE keyword, using the molecule names or positions within the file to identify the desired molecules. The file is deemed to contain molecule names if it has one of the following formats:
| c1ccccc1 ... NAME=BENZENE ... |
| c1ccccc1 BENZENE ... |
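For SMILES lines, the two layouts above can be recognised with a few lines of Python. This sketches the rule as described rather than reproducing THINK's code: `molecule_name` is a hypothetical helper, and treating a second token that contains `=` as a data field rather than a name is an assumption of the sketch.

```python
from typing import Optional


def molecule_name(smiles_line: str) -> Optional[str]:
    """Return the molecule name from one SMILES-file line, or None if it has none."""
    tokens = smiles_line.split()
    # Layout 1: an explicit NAME=... field anywhere after the SMILES string.
    for token in tokens[1:]:
        if token.startswith("NAME="):
            return token[len("NAME="):]
    # Layout 2: a bare name immediately after the SMILES string.
    # ASSUMPTION: a second token containing "=" is a data field, not a name.
    if len(tokens) > 1 and "=" not in tokens[1]:
        return tokens[1]
    return None
```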
If the file does not contain molecule names, the molecule's position within the file must be used to identify each desired molecule, and only one molecule may be read in each operation. The position is specified through the keyword construct "MOLECULE=#n", where n is 1 for the first molecule, 2 for the second, and so on. If the file contains molecule names, either the name ("MOLECULE=name") or the position may be used to identify the molecule. Several molecules may be read in a single operation if the name includes wildcard characters.
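These selection rules can be sketched in Python: a `#n` position picks exactly one molecule, while a name, possibly with wildcards, may pick several but only when the file carries names. `select_molecules` is hypothetical, and its wildcard handling is an assumption. This chapter uses both `*` (`S7*`) and `%` (`DOPAMINE(1%)`) without defining them, so the sketch simply lets each one match any run of characters, including none.

```python
import re
from typing import List, Optional


def select_molecules(selector: str, count: int,
                     names: Optional[List[str]] = None) -> List[int]:
    """Return zero-based indices of the molecules picked by a MOLECULE= value."""
    if selector.startswith("#"):
        position = int(selector[1:])  # "#1" is the first molecule in the file
        if not 1 <= position <= count:
            raise ValueError(f"no molecule at position {position}")
        return [position - 1]
    if names is None:
        raise ValueError("file has no molecule names; use MOLECULE=#n")
    # ASSUMPTION: "*" and "%" both match any run of characters.
    pattern = "".join(".*" if ch in "*%" else re.escape(ch) for ch in selector)
    return [i for i, name in enumerate(names) if re.fullmatch(pattern, name)]
```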
| Commands | Dialogs |
|---|---|
| OPEN FILE=dopamine.smi | File > Open |
| OPEN FILE=capsaicin.smi MOLECULE=#5 | Selective read not supported |
| OPEN FILE=dopamine.sdf MOLECULE=DOPAMINE(1) | Selective read not supported |
| OPEN FILE=dopamine.sdf MOLECULE=DOPAMINE(1%) | Selective read not supported |
By default, THINK will automatically add hydrogens to all molecules read from SMILES and SD files to complete their valencies, and will not add hydrogens to molecules read from PDB files. The OPEN command allows the user to override the default setting through the "OPTIONS=NOHYDROGENS" and "OPTIONS=HYDROGENS" keywords respectively. When automatic hydrogen addition is suppressed, only hydrogen atoms that are explicitly included in the file will appear in the molecule. Any hydrogens that would normally be added as a result of THINK interpreting elements of the form [CH] or [CHn] in SMILES files or interpreting the hydrogen-count field in SD files are omitted.
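To show which hydrogens are affected, the sketch below counts the hydrogens written inside SMILES bracket atoms such as `[CH]`, `[CH2]` or `[nH]`. A reader would normally add these, but they are omitted under `OPTIONS=NOHYDROGENS`. This is a simplified Python illustration, not THINK's SMILES parser. `bracket_hydrogens` is hypothetical, it ignores implicit hydrogens on unbracketed atoms, and it only understands an isotope, an element symbol and an `@`/`@@` chirality mark before the hydrogen count.

```python
import re

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# isotope digits, element symbol (e.g. C, Cl, or aromatic c/n), optional @/@@, then H and count
H_COUNT = re.compile(r"\d*(?:[A-Z][a-z]?|[a-z]{1,2})@*H(\d*)")


def bracket_hydrogens(smiles: str) -> int:
    """Total hydrogens declared by H / Hn inside the bracket atoms of a SMILES string."""
    total = 0
    for atom in BRACKET_ATOM.findall(smiles):
        match = H_COUNT.match(atom)  # anchored at the start of the bracket contents
        if match:
            total += int(match.group(1)) if match.group(1) else 1  # a bare "H" means one
    return total
```

Under this reading, `bracket_hydrogens("[CH2]=[CH2]")` is 4, and `[Hg]` contributes nothing because its `H` is part of the element symbol.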
| Commands | Dialogs |
|---|---|
| OPEN FILE=capsaicin.smi MOLECULE=#1 OPTIONS=NOHYDROGENS | Suppressing H addition not supported |
The capability to suppress hydrogen addition is important when reading queries for substructure searching.
Molecules may be saved in SMILES, PDB or SD files, or in CSV files which can be thought of as a form of SMILES file since they use a SMILES string to represent the molecule. If the exact 2D or 3D coordinates of the molecule are important then a PDB or SD file must be used, since SMILES files only contain information about the elements (or atom types) and bonding within the molecule. Normally SMILES files would be used for 2D molecules, SD files for 3D molecules and PDB files for peptides or proteins.
Molecules may be saved via the SAVE command or the File > Save As dialog. The SAVE command allows specified molecules to be saved using wildcards and/or a comma-separated list of molecule names, while the dialog saves all the molecules that are currently loaded in THINK. To generate conformers and save them to a file, the SAVE command must be used, since this option is not available through the dialog or the file explorer.
The SAVE command takes an optional FORMAT keyword, which can be used to override the file format deduced from the file extension. To generate and save conformers, the "FORMAT=CONFORMERS" keyword must be specified. Molecules may be saved selectively by using the "MOLECULE=name" keyword, where name may include wildcards. By default, any field information associated with the molecules is written to the SMILES or SD file; this may be suppressed through the "OPTIONS=NOFIELDS" keyword.
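Because a CSV file pairs a SMILES string with the molecule's field data, saving one amounts to writing one table row per molecule. The Python sketch below illustrates that idea and the effect of `OPTIONS=NOFIELDS`. The column order (name, SMILES, then fields in first-seen order) and the `write_csv` helper are assumptions for illustration, not THINK's actual CSV layout.

```python
import csv
from typing import Dict, List, TextIO


def write_csv(molecules: List[Dict], out: TextIO, nofields: bool = False) -> None:
    """Write one row per molecule; each dict has 'name', 'smiles' and 'fields' keys."""
    # Collect field names in first-seen order, or none at all when NOFIELDS is set.
    field_names = [] if nofields else list(
        dict.fromkeys(key for mol in molecules for key in mol["fields"]))
    writer = csv.writer(out)
    writer.writerow(["NAME", "SMILES", *field_names])  # ASSUMPTION: illustrative header
    for mol in molecules:
        # A molecule lacking a field gets an empty cell, so columns stay aligned.
        writer.writerow([mol["name"], mol["smiles"],
                         *(mol["fields"].get(key, "") for key in field_names)])
```

Writing to an `io.StringIO` with `nofields=True` leaves only the NAME and SMILES columns.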
THINK can reduce the number of molecules saved to a file by omitting those that contain undesirable substructures or property values (the "OPTIONS=FILTER" keyword). This option would normally be used when saving the enumerated molecules from a combinatorial chemistry library (see Chapter 12), but may be used when saving any set of molecules. The substructure and property value filters are taken from a learn file created by an earlier data analysis calculation (see Chapter 9). The name of the learn file is automatically taken from the name of the field that contains activity data (set through the CUSTOMISE command using the ACTIVITY keyword). If the activity field is not set, the file "default.lrn" in the THINK_EXEC directory is used.
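The filtering step can be pictured as a predicate applied to each molecule before it is written. The Python sketch below is hypothetical throughout. The learn file's real contents come from the data analysis calculation in Chapter 9 and are not described here, so the property ranges and undesirable-substructure labels are stand-ins. Deriving the learn file name as the activity field name plus `.lrn` is an assumption based on the description above, and passing the THINK_EXEC directory in as an argument is likewise a choice of the sketch.

```python
import os
from typing import Dict, Iterable, Optional, Set, Tuple


def learn_file_path(activity_field: Optional[str], think_exec: str) -> str:
    """Pick the learn file: one named after the activity field, else default.lrn."""
    if activity_field:
        return activity_field + ".lrn"  # ASSUMPTION: naming convention is illustrative
    return os.path.join(think_exec, "default.lrn")


def passes_filters(properties: Dict[str, float],
                   substructures: Iterable[str],
                   ranges: Dict[str, Tuple[float, float]],
                   undesirable: Set[str]) -> bool:
    """True if every filtered property is in range and no unwanted substructure occurs."""
    for prop, (low, high) in ranges.items():
        value = properties.get(prop)
        if value is None or not (low <= value <= high):  # missing values fail (sketch choice)
            return False
    return undesirable.isdisjoint(substructures)
```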
| Commands | Dialogs |
|---|---|
| SAVE FILE=dopamine.sdf | File > Save |
| SAVE FORMAT=CONFORMERS FILE=dopamine.sdf | Conformer save not supported |
| SAVE FILE=dopamines.sdf MOLECULES=S1,S2,S6,S7* | Specific save not supported |
| SAVE FILE=dopamines.sdf MOLECULES=@SELECTED | Use the pop-up menu in the spreadsheet or tile display |
| CUSTOMISE ACTIVITY=LOGK, then SAVE FILE=LIB1.SMI OPTIONS=FILTER,NOFIELDS | Options not supported |
The file explorer contains a list of all SMILES, PDB and SD files that have been opened by THINK. It maintains a hierarchy of these files, so that the output from a search is placed under the file that was searched, with a default name derived from the query. To read a new file using the explorer, the user must first add it to the list of recognised files with the Add button. Once the file is visible in the list, it can be opened by double-picking it, or by picking Open from the pop-up menu displayed by the right mouse button.
A file may be "closed" by picking Close from the right mouse button pop-up menu; all the molecules that were read from that file are then automatically deleted from THINK (but not from the file on disk).
| Commands | Dialogs |
|---|---|
| OPEN FILE=dopamine.smi | Select filename, then Open |
| CLOSE FILE=dopamine | Select filename, then Close |
Note: In THINK v1.25 the file path and extension are omitted from the CLOSE command.
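The open/close bookkeeping described above can be sketched as a small registry that remembers which molecules came from which file. In this hypothetical Python `Session` class, files are keyed by their base name without path or extension, mirroring `CLOSE FILE=dopamine` in THINK v1.25. Closing removes the molecules from the session only; nothing on disk is touched. One consequence of keying on the bare name, in this sketch, is that two files with the same name in different directories would share an entry.

```python
import os
from typing import Dict, List


class Session:
    """Track which loaded molecules were read from which file."""

    def __init__(self) -> None:
        self.molecules: Dict[str, str] = {}      # molecule name -> source path
        self.by_file: Dict[str, List[str]] = {}  # file key -> molecule names

    @staticmethod
    def file_key(path: str) -> str:
        # Drop directory and extension, as the v1.25 CLOSE command expects.
        return os.path.splitext(os.path.basename(path))[0]

    def open(self, path: str, names: List[str]) -> None:
        self.by_file.setdefault(self.file_key(path), []).extend(names)
        for name in names:
            self.molecules[name] = path

    def close(self, key: str) -> None:
        # Forget the file's molecules; the file itself is left untouched.
        for name in self.by_file.pop(key, []):
            self.molecules.pop(name, None)
```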