Algorithm
The Algorithm function allows you to design and train a
model to predict solubility values.
Creating
To create a new algorithm select the Algorithm -> new
function from the menu at the top. Enter a unique name
for the new algorithm, select the type of algorithm, and
select a file of molecules with which to train the algorithm.
If experimental solubility values are embedded in the training
file, they will be read if recognised. Otherwise you
will be asked to provide the values in a file later, when
performing the training. A new algorithm tab will then be
displayed.
Saving/Deleting
If you have created an algorithm or have made any modifications
to an existing one, such as training or changing parameters, you
can save it by clicking on the 'SAVE' button in the
top left corner of the algorithm tab. This will store all of
the algorithm's properties which can be loaded later. An
algorithm cannot be used to estimate solubilities until it
has been saved.
Clicking on the 'DELETE' button will delete the displayed
algorithm.
Training
An algorithm uses MLR or PLSR to form a model using the data
from the training set. To begin this calculation click on
the 'START TRAIN' button. To cancel the training calculation
while it is proceeding, click on the 'STOP TRAIN' button.
When the training has completed, whether
successfully or unsuccessfully, a message will be displayed
and the 'STOP TRAIN' button will revert to 'START TRAIN'.
Once training has completed, the parameter coefficients (MLR) or
score percentage (PLSR) will be displayed for each parameter.
Additionally, the occurrence of each parameter in the whole
training set will be displayed.
If the training was unsuccessful, the parameter coefficients
will be 0.0. When using MLR, the matrix X'X will be singular if
there are too many descriptors and not enough training molecules.
If the occurrence of any parameter in the training set is 0.0,
the X'X matrix will definitely be singular. After an initial
run with a given set of parameters, if the matrix is singular,
look at the occurrence of each parameter. If any occurrence
is 0.0, remove that parameter from the model, then perform the
training process again.
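The singularity check described above can be sketched in a few lines. This is only an illustration, not the program's own code: it assumes the training data is laid out as one row per molecule and one column per parameter, with no intercept column, and the function names (normal_matrix, rank, is_singular, zero_occurrence_columns) are hypothetical.

```python
def normal_matrix(X):
    """Return X'X for a matrix with one row per molecule, one column per parameter."""
    n_cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
            for i in range(n_cols)]


def rank(matrix, tol=1e-9):
    """Rank of a matrix by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


def is_singular(X):
    """True if X'X cannot be inverted, so the MLR fit would fail."""
    xtx = normal_matrix(X)
    return rank(xtx) < len(xtx)


def zero_occurrence_columns(X):
    """Indices of parameters that never occur in the training set."""
    return [j for j in range(len(X[0])) if all(row[j] == 0 for row in X)]
```

For example, a parameter whose column is all zeros makes X'X singular, as does a training set with fewer molecules than parameters; removing the columns reported by zero_occurrence_columns corresponds to the remedy suggested above.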
Atomic Typing and Group Contribution Parameters
In the SMARTS Patterns section of the algorithm tab there is a function
that allows you to browse and select a file to load SMARTS patterns from.
This adds the SMARTS patterns as parameters in the algorithm. After clicking
the 'ADD' button you will see them appear in the 'Current Parameters'
table.
File Format
Each line in the file should start with a number, indicating the parameter
number of the SMARTS pattern. Several SMARTS patterns can have the same
parameter number, in which case a match for any one of the patterns
with that number is counted as a match for that parameter.
After the number there should be one or more spaces and then a single SMARTS
pattern until the end of the line. For example:
1 [CX4;H4]
1 [CX4;H3]
2 [CX4;H3][#6]
The SMARTS pattern cannot contain any spaces. Lines must be in ascending
order of parameter number, and no number can be skipped.
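The format rules above can be checked with a small parser. The following is a sketch only, not the program's own loader: the function name parse_smarts_file is hypothetical, and it assumes that numbering starts at 1 (as in the example) and that blank lines are ignored.

```python
def parse_smarts_file(text):
    """Parse '<number> <SMARTS>' lines into {parameter_number: [patterns]}."""
    params = {}
    last = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split()
        if len(parts) != 2:
            # More than two fields means the pattern contained a space.
            raise ValueError(f"line {lineno}: expected '<number> <SMARTS>'")
        number_text, pattern = parts
        if not number_text.isdigit():
            raise ValueError(f"line {lineno}: '{number_text}' is not a number")
        number = int(number_text)
        if last is None:
            if number != 1:  # assumption: numbering starts at 1
                raise ValueError(f"line {lineno}: first parameter must be 1")
        elif number not in (last, last + 1):
            raise ValueError(f"line {lineno}: parameter {number} is out of order")
        params.setdefault(number, []).append(pattern)
        last = number
    return params
```

Applied to the three example lines above, this groups the two patterns numbered 1 under one parameter and the remaining pattern under parameter 2.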
Atom Typing vs. Group Contribution
Under the file input line there are two radio buttons allowing you to choose
whether you wish to add the SMARTS patterns as Atomic Typing Parameters or
Group Contribution Parameters.
If they are added as Atomic Typing Parameters, an atom in a molecule can
match at most one of the parameters. If an atom matches more than one of the
parameters, the one closest to the end of the list in the provided file
is chosen.
With Group Contribution Parameters, one atom in a molecule can match zero,
one, or many of the parameters.
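The difference between the two modes can be illustrated with a small sketch. It does not perform SMARTS matching itself: it assumes that the matches have already been found and are given as a hypothetical dictionary mapping each parameter number to the set of atom indices it matches, and it assumes that occurrences are counted once per matched atom. The function names are illustrative only.

```python
def atom_typing_counts(matches):
    """Each atom counts for at most one parameter: the last matching one in the file."""
    atom_to_param = {}
    for number in sorted(matches):  # ascending number = order in the file
        for atom in matches[number]:
            atom_to_param[atom] = number  # later parameters overwrite earlier ones
    counts = {number: 0 for number in matches}
    for number in atom_to_param.values():
        counts[number] += 1
    return counts


def group_contribution_counts(matches):
    """Each atom counts for every parameter it matches."""
    return {number: len(atoms) for number, atoms in matches.items()}
```

With matches = {1: {0, 1}, 2: {1, 2}}, atom 1 matches both parameters. Under atomic typing it is assigned to parameter 2 only, giving counts {1: 1, 2: 2}; under group contribution it is counted for both, giving {1: 2, 2: 2}.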
Molecular Descriptors
Molecular Descriptors are parameters that take a molecule as input and produce
a floating point value. To add a descriptor to the algorithm, just select
one or more of the descriptors in the 'Molecular Descriptors' table and
click on the 'Add Descriptors' button. You will see the descriptors being added
to the 'Current Parameters' table.
You can implement your own descriptors by writing a Java class. See the
developers manual.
Current Parameters
In this section all the algorithm's parameters are displayed, showing their
number, type, regression coefficient (MLR) or score (PLSR), and frequency in
the training set. Select one or more parameters and click on
'Remove Selected Parameters' to remove them from the algorithm.
Select one parameter and click on 'Plot Selected Parameter' to plot
(for each molecule in the training set) the occurrence of the parameter
in the molecule against the error in the logS calculation (error =
|calculated logS - experimental logS|).
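The points shown in such a plot can be computed as follows. This is a minimal sketch of the error formula above, not the program's plotting code; the function name and the three parallel input lists (one entry per training molecule) are assumptions made for illustration.

```python
def occurrence_vs_error(occurrences, calculated_logs, experimental_logs):
    """Return one (occurrence, absolute logS error) pair per training molecule."""
    return [
        (occ, abs(calc - exp))
        for occ, calc, exp in zip(occurrences, calculated_logs, experimental_logs)
    ]
```

A parameter whose higher occurrences line up with larger errors in this plot may be poorly described by the current model.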