Appendix B Pharmacophore File Format
The format used in pharmacophore files has been changed to a comma-separated values
(CSV) format. Pharmacophore data can therefore be loaded directly from the
file into a relational database (such as MySQL or ORACLE). Previously, the file
written by THINK had to be converted into a format that the relational database
system could handle.
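As a rough illustration of the direct load described above, the sketch below reads pharmacophore records into a database table. It makes several assumptions that are not part of this appendix: it uses Python's built-in sqlite3 module as a stand-in for MySQL or ORACLE, it assumes the file was written with molecule names in every data record, and the table name, column names and sample values are all hypothetical. Header records, which the format marks with a leading "#", are skipped.

```python
import csv
import io
import sqlite3

# Hypothetical file content: names, keys and counts are invented for illustration.
SAMPLE = """\
# (header records omitted in this sketch)
mol_1,DAR012,1.00
mol_1,AAL345,2.50
mol_2,DAR012,4.00
"""


def load_pharmacophores(text, conn):
    """Insert every non-header record into a table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pharmacophore "
        "(cmole TEXT, cpharm TEXT, rcount REAL)"
    )
    # Header records start with "#", so they are filtered out before CSV parsing.
    data_lines = (ln for ln in io.StringIO(text) if not ln.startswith("#"))
    rows = [
        (r[0].strip(), r[1].strip(), float(r[2]))
        for r in csv.reader(data_lines)
        if r  # csv.reader yields an empty list for a blank line
    ]
    conn.executemany("INSERT INTO pharmacophore VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_pharmacophores(SAMPLE, conn)
```

Once the rows are in a table, questions such as "which molecules share a given pharmacophore" become ordinary SQL queries on the cpharm column.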
A pharmacophore file begins with two header records (records 1 and 2 below),
which are followed by the molecule data blocks. Each block starts with a molecule
header record, and then contains the pharmacophore data. Each molecule in the
file has a separate data block. All header records start with a "#"
character.
In the following record descriptions, the type of each variable is indicated
by the initial letter of its name:
- C for a character variable
- I for an integer variable
- D for a real (floating point) variable; the real-valued fields RBINn and RCOUNT begin with R instead
| Record 1 (file header) |
| Data: | #IREV, IBINS, IMOLS, CCOUNT, D3DTOL, CNAME |
| Format: | #I2, I3, I6, 1X, A1, F7.3, 1X, A1 |
| Description: |
| | IREV | I2 | File revision level (currently revision 3) |
| | IBINS | I3 | Number of distance bins |
| | IMOLS | I6 | Number of molecules in file |
| | CCOUNT | A1 | "Y" if data includes pharmacophore counts, otherwise "N" |
| | D3DTOL | F7.3 | 3D tolerance. The pharmacophore tolerance ±x is calculated from 2x = D3DTOL |
| | CNAME | A1 | "Y" if pharmacophore records include molecule names, otherwise "N" |
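The following sketch shows one way record 1 might be read. It assumes that the six fields are separated by literal commas, as the CSV description suggests, and that surrounding blanks can be ignored; the class name, function name and sample line are hypothetical and not part of THINK.

```python
from dataclasses import dataclass


@dataclass
class FileHeader:
    irev: int       # file revision level
    ibins: int      # number of distance bins
    imols: int      # number of molecules in file
    ccount: bool    # True when the data includes pharmacophore counts
    d3dtol: float   # 3D tolerance
    cname: bool     # True when pharmacophore records include molecule names

    @property
    def tolerance(self) -> float:
        """The tolerance x in "±x", from 2x = D3DTOL."""
        return self.d3dtol / 2.0


def parse_file_header(line: str) -> FileHeader:
    """Parse record 1, assuming comma-separated fields after the leading '#'."""
    if not line.startswith("#"):
        raise ValueError("record 1 must start with '#'")
    fields = [f.strip() for f in line[1:].split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields in record 1, found {len(fields)}")
    irev, ibins, imols, ccount, d3dtol, cname = fields
    return FileHeader(
        irev=int(irev),
        ibins=int(ibins),
        imols=int(imols),
        ccount=ccount.upper() == "Y",
        d3dtol=float(d3dtol),
        cname=cname.upper() == "Y",
    )


# Hypothetical header line, invented for illustration.
header = parse_file_header("# 3, 15,  1200,Y,  0.500,Y")
```

With this sample line, header.tolerance evaluates to 0.25, that is, half of the D3DTOL value 0.500.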
| Record 2 (distance bins) |
| Data: | #RBIN1, RBIN2, RBIN3, … |
| Format: | # free format, values separated by spaces |
| Description: |
| | RBINn | | Upper limit of distance bin n |
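Because record 2 lists only upper limits, finding the bin for a given distance amounts to locating the first limit that is not smaller than the distance. The sketch below illustrates this under stated assumptions: that each upper limit is inclusive, that a distance beyond the last limit is an error, and that either spaces or commas may separate the values. None of these points is specified above, and the function names and sample values are hypothetical.

```python
from bisect import bisect_left


def parse_distance_bins(line):
    """Parse record 2 into a list of bin upper limits."""
    if not line.startswith("#"):
        raise ValueError("record 2 must start with '#'")
    # Accept spaces or commas as separators (an assumption of this sketch).
    return [float(tok) for tok in line[1:].replace(",", " ").split()]


def bin_index(distance, upper_limits):
    """Return the 0-based index of the first bin whose upper limit >= distance."""
    i = bisect_left(upper_limits, distance)
    if i == len(upper_limits):
        raise ValueError("distance exceeds the upper limit of the last bin")
    return i


# Hypothetical bin limits, invented for illustration.
limits = parse_distance_bins("# 2.0 3.5 5.0 7.5")
```

With these limits, a distance of 3.0 falls in the second bin (index 1), and a distance of exactly 3.5 also falls in the second bin because the limit is treated as inclusive here.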
| Record 3 (molecule header) |
| Data: | #ICENS, IPHARM, DCOUNT, CMOLE |
| Format: | #I3, I11, F11.2, 1X, A |
| Description: |
| | ICENS | I3 | Number of centres in pharmacophores |
| | IPHARM | I11 | Number of unique pharmacophores in molecule |
| | DCOUNT | F11.2 | Total number of pharmacophores for molecule if conformer-based counting is enabled, otherwise 0 |
| | CMOLE | A | Molecule name |
- Record 3 occurs at the beginning of each molecule block in the file
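A molecule header might be read as sketched below, following the four fields given in the format line and description (ICENS, IPHARM, DCOUNT, CMOLE). The sketch assumes comma-separated fields and splits at most three times, so that a molecule name containing commas is kept whole; whether names can contain commas is not stated here. The type name, function name and sample line are hypothetical.

```python
from typing import NamedTuple


class MoleculeHeader(NamedTuple):
    icens: int      # number of centres in pharmacophores
    ipharm: int     # number of unique pharmacophores in molecule
    dcount: float   # total pharmacophore count, or 0 if counting is disabled
    cmole: str      # molecule name


def parse_molecule_header(line: str) -> MoleculeHeader:
    """Parse record 3, assuming comma-separated fields after the leading '#'."""
    if not line.startswith("#"):
        raise ValueError("record 3 must start with '#'")
    # Split only three times: everything after the third comma is the name.
    parts = line[1:].rstrip("\r\n").split(",", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields in record 3, found {len(parts)}")
    return MoleculeHeader(
        icens=int(parts[0]),
        ipharm=int(parts[1]),
        dcount=float(parts[2]),
        cmole=parts[3].strip(),
    )


# Hypothetical molecule header, invented for illustration.
mol = parse_molecule_header("#  4,         12,      57.00, mol_1")
```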
| Record 4 (pharmacophore data) |
| Data: | If CNAME="N": CPHARM, RCOUNT |
| | If CNAME="Y": CMOLE, CPHARM, RCOUNT |
| Format: | free format, values separated by commas |
| Description: |
| | CMOLE | | Molecule name |
| | CPHARM | | Data for pharmacophore n, in the format: |
| | | | xxxxdddddd (4 centres) |
| | | | xxxddd (3 centres) |
| | | | xxd (2 centres) |
| | | | where x is a centre, represented by a 1-letter code, and d is a distance bin, represented by a single digit or letter |
| | RCOUNT | | Pharmacophore count |
- The following 1-letter codes are used for centres:
| D | H-bond donor | R | aromatic ring centroid |
| A | H-bond acceptor | L | lipophile |
| P | positive charge | W | user-defined centre type 1 |
| N | negative charge | X | Lewis base |
| C | acid | M | metal |
| B | base | Z | user-defined centre type 4 |
- The distance bins are encoded using the digits 0-9 and letters a-u, with "0"
being used to represent the first bin, "a" for the 11th bin
and "u" for the 31st bin
- For a pharmacophore listed as x1x2x3x4d1d2d3d4d5d6, the distances relate
to the centres as follows:
| d1 | distance between centres x1 and x2 |
| d2 | distance between centres x1 and x3 |
| d3 | distance between centres x1 and x4 |
| d4 | distance between centres x2 and x3 |
| d5 | distance between centres x2 and x4 |
| d6 | distance between centres x3 and x4 |
- Record 4 occurs once for each pharmacophore
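The notes above can be combined into a small decoder for a CPHARM string. This is a minimal sketch, not THINK code: the function name is hypothetical, bin numbers are returned 1-based so that "0" is bin 1 and "u" is bin 31, and the pair ordering for 2- and 3-centre pharmacophores is assumed to follow the same pattern as the 4-centre ordering given above.

```python
import string
from itertools import combinations

# "0".."9" then "a".."u": 31 symbols, one per distance bin.
BIN_CHARS = string.digits + string.ascii_lowercase[:21]

CENTRE_NAMES = {
    "D": "H-bond donor", "R": "aromatic ring centroid",
    "A": "H-bond acceptor", "L": "lipophile",
    "P": "positive charge", "W": "user-defined centre type 1",
    "N": "negative charge", "X": "Lewis base",
    "C": "acid", "M": "metal",
    "B": "base", "Z": "user-defined centre type 4",
}

# String length -> number of centres (xxd, xxxddd, xxxxdddddd).
CENTRES_BY_LENGTH = {3: 2, 6: 3, 10: 4}


def decode_pharmacophore(cpharm):
    """Return (centres, {(i, j): bin number}) with 1-based centre indices."""
    n = CENTRES_BY_LENGTH.get(len(cpharm))
    if n is None:
        raise ValueError(f"unexpected CPHARM length: {len(cpharm)}")
    centres, bins = cpharm[:n], cpharm[n:]
    unknown = [c for c in centres if c not in CENTRE_NAMES]
    if unknown:
        raise ValueError(f"unknown centre code(s): {unknown}")
    # combinations() yields (0,1), (0,2), (0,3), (1,2), (1,3), (2,3),
    # which matches the d1..d6 ordering for four centres.
    pairs = combinations(range(n), 2)
    distances = {
        (i + 1, j + 1): BIN_CHARS.index(ch) + 1  # index() raises on bad symbols
        for (i, j), ch in zip(pairs, bins)
    }
    return centres, distances


centres, distances = decode_pharmacophore("DARL0123au")
```

For the invented key "DARL0123au", the centres are D, A, R and L, and the distance between x3 and x4 (R and L) is encoded as "u", the 31st bin.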
Because precision is lost when a floating point number is converted into a
character string, pharmacophores whose count is less than 0.01 are not written
to the file. Consequently, the sum of the counts in the pharmacophore records
may not equal the value of DCOUNT in the molecule header record.
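To see how large this discrepancy is for each molecule, the counts in a block can be summed and compared with DCOUNT. The sketch below assumes comma-separated header fields, counts present in every data record (CCOUNT="Y"), and RCOUNT as the last field of each data record; molecules whose DCOUNT is 0 are skipped, since that value indicates conformer-based counting is disabled. The function name and sample lines are hypothetical.

```python
def count_shortfalls(lines):
    """Return {molecule name: DCOUNT minus the sum of RCOUNT in its block}."""
    blocks = []          # one [name, dcount, running RCOUNT sum] per molecule
    headers_seen = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            headers_seen += 1
            if headers_seen > 2:  # the first two headers are records 1 and 2
                parts = line[1:].split(",", 3)
                blocks.append([parts[3].strip(), float(parts[2]), 0.0])
        elif blocks:
            # RCOUNT is taken as the text after the last comma.
            blocks[-1][2] += float(line.rsplit(",", 1)[1])
    return {name: dcount - total for name, dcount, total in blocks if dcount > 0}


# Hypothetical file content, invented for illustration.
SAMPLE_LINES = [
    "# 3, 4, 2,Y, 0.500,Y",
    "# 2.0 3.5 5.0 7.5",
    "#  3,  3, 10.00, mol_1",
    "mol_1,DAR012,6.25",
    "mol_1,DAL013,3.74",
    "#  3,  1, 0.00, mol_2",
    "mol_2,AAL123,1.00",
]
shortfalls = count_shortfalls(SAMPLE_LINES)
```

A small positive shortfall is expected whenever records with counts below 0.01 were dropped; a large one would suggest that the file was truncated or misread.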