Algorithm

The Algorithm function allows you to design and train a model to predict solubility values.

Creating

To create a new algorithm, select Algorithm -> new from the menu at the top. Enter a unique name for the new algorithm, select the type of algorithm, and select a file of molecules with which to train the algorithm. If experimental solubility values are embedded in the training file, they will be read if recognised. Otherwise you will be asked to provide the values in a file later on, when performing the training. A new algorithm tab will then be displayed.

Saving/Deleting

If you have created an algorithm or have made any modifications to an existing one, such as training it or changing its parameters, you can save it by clicking on the 'SAVE' button in the top left corner of the algorithm tab. This stores all of the algorithm's properties so that they can be loaded later. An algorithm cannot be used to estimate solubilities until it has been saved.

Clicking on the 'DELETE' button will delete the displayed algorithm.

Training

An algorithm uses multiple linear regression (MLR) or partial least squares regression (PLSR) to form a model from the data in the training set. To begin this calculation, click on the 'START TRAIN' button. To cancel the training calculation while it is in progress, click on the 'STOP TRAIN' button. When the training has finished, whether successfully or unsuccessfully, a message will be displayed and the 'STOP TRAIN' button will revert to 'START TRAIN'.

Once training has completed, the coefficient (MLR) or score percentage (PLSR) will be displayed for each parameter. Additionally, the occurrence of each parameter in the whole training set will be displayed.

If the training was unsuccessful, the parameter coefficients will be 0.0. When using MLR, the matrix X'X will be singular if there are too many descriptors and not enough training molecules. If the occurrence of any parameter in the training set is 0.0, the X'X matrix will definitely be singular. If an initial run with a certain set of parameters produces a singular matrix, look at the occurrence of each parameter. If any of the occurrences are 0.0, remove those parameters from the model and perform the training process again.
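
The singularity conditions described above can be checked on a small example. The following is only an illustrative sketch in Python, not the program's own code. It assumes the training data is a plain table with one row per molecule and one column per parameter, holding occurrence counts, and that no intercept column is used. All function names are hypothetical.

```python
def xtx(rows):
    """Return X'X for a table with one row per molecule, one column per parameter."""
    n_params = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n_params)]
            for i in range(n_params)]


def is_singular(matrix, tol=1e-9):
    """Gaussian elimination with partial pivoting; True if a pivot is (nearly) zero."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            return True
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return False


def zero_occurrence_params(rows):
    """Return the column indices of parameters that never occur in the training set."""
    n_params = len(rows[0])
    return [j for j in range(n_params) if all(r[j] == 0 for r in rows)]


def drop_params(rows, indices):
    """Return a copy of the table without the given parameter columns."""
    skip = set(indices)
    return [[v for j, v in enumerate(r) if j not in skip] for r in rows]
```

In this sketch a column of zeros makes X'X singular, as does a table with fewer molecules than parameters. Passing the result of zero_occurrence_params to drop_params mirrors the manual step of removing unused parameters before training again.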

Atomic Typing and Group Contribution Parameters

In the SMARTS Patterns section of the algorithm tab there is a function that allows you to browse for and select a file from which to load SMARTS patterns. The patterns are added as parameters of the algorithm. After clicking the 'ADD' button you will see them appear in the 'Current Parameters' table.

File Format

Each line in the file should start with a number, indicating the parameter number of the SMARTS pattern. Several SMARTS patterns can have the same parameter number. In that case, a match for any one of the patterns sharing that number is counted as a match for that parameter.
After the number there should be one or more spaces, followed by a single SMARTS pattern that runs to the end of the line. For example:

1 [CX4;H4]
1 [CX4;H3]
2 [CX4;H3][#6]

The SMARTS pattern cannot contain any spaces at all. The lines must be in order of parameter number, and no parameter number can be skipped.
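
The file format rules above can be expressed as a small reader. This is a hedged sketch in Python, not the program's own parser. The function name parse_smarts_file is hypothetical, and two details are assumptions not stated above: numbering starts at 1 (as in the example) and blank lines are ignored.

```python
def parse_smarts_file(text):
    """Parse 'number SMARTS' lines into {parameter_number: [patterns]}."""
    params = {}
    last = None  # parameter number of the previous non-blank line
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        parts = line.split()
        # Exactly two fields: the number and a pattern without spaces.
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'number SMARTS'")
        number_text, smarts = parts
        if not number_text.isdigit():
            raise ValueError(f"line {lineno}: '{number_text}' is not a number")
        number = int(number_text)
        # Numbers must be in order with no gaps; assumed to start at 1.
        allowed = (1,) if last is None else (last, last + 1)
        if number not in allowed:
            raise ValueError(f"line {lineno}: parameter number {number} out of order")
        params.setdefault(number, []).append(smarts)
        last = number
    return params
```

Applied to the three example lines above, this sketch groups the two patterns numbered 1 under one parameter and the third pattern under parameter 2. A file that jumps from 1 to 3, or that has a space inside a pattern, is rejected with a ValueError.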

Atom Typing vs. Group Contribution

Under the file input line there are two radio buttons that let you choose whether to add the SMARTS patterns as Atomic Typing Parameters or as Group Contribution Parameters. If they are added as Atomic Typing Parameters, an atom in a molecule can match at most one of the parameters. If an atom matches more than one of the parameters, the one closest to the end of the list in the provided file is chosen. By contrast, one atom in a molecule can match zero, one, or many Group Contribution Parameters.
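
The difference between the two modes can be illustrated with a short Python sketch. It is not the program's implementation and performs no SMARTS matching itself. It assumes that some matcher (not shown) has already produced, for each parameter in file order, the set of atom indices it matches. Both function names are hypothetical.

```python
def atom_typing_counts(matches, n_atoms):
    """Count occurrences when each atom may belong to at most one parameter.

    matches[i] is the set of atom indices matched by parameter i, with
    parameters listed in file order. The last matching parameter wins.
    """
    counts = [0] * len(matches)
    for atom in range(n_atoms):
        winner = None
        for index, atoms in enumerate(matches):
            if atom in atoms:
                winner = index  # a later parameter overrides an earlier one
        if winner is not None:
            counts[winner] += 1
    return counts


def group_contribution_counts(matches):
    """Count occurrences when an atom may contribute to any number of parameters."""
    return [len(atoms) for atoms in matches]


# Hypothetical three-atom molecule: parameter 0 matches every atom,
# parameter 1 matches only the two terminal atoms.
example_matches = [{0, 1, 2}, {0, 2}]
```

With example_matches, atom typing assigns the two terminal atoms to the later parameter and leaves only the middle atom for the first one, whereas group contribution counts every match for both parameters.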

Molecular Descriptors

Molecular Descriptors are parameters that take a molecule as input and produce a floating point value. To add a descriptor to the algorithm, just select one or more of the descriptors in the 'Molecular Descriptors' table and click on the 'Add Descriptors' button. You will see the descriptors being added to the 'Current Parameters' table.

You can implement your own descriptors by writing a Java class. See the developer's manual.

Current Parameters

This section displays all of the algorithm's parameters, showing their number, type, regression coefficient (MLR) or score (PLSR), and frequency in the training set. Select one or more parameters and click on 'Remove Selected Parameters' to remove them from the algorithm. Select one parameter and click on 'Plot Selected Parameter' to plot, for each molecule in the training set, the occurrence of the parameter in the molecule against the error in the logS calculation (error = |calculated logS - experimental logS|).
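
The points shown in such a plot can be sketched as follows. This is an illustrative Python fragment only, assuming three equal-length lists with one entry per training molecule. The function name error_points is hypothetical, and the sketch does no plotting.

```python
def error_points(occurrences, calculated_logs, experimental_logs):
    """Return (occurrence, error) pairs, one per training molecule.

    error = |calculated logS - experimental logS|
    """
    return [
        (occ, abs(calc - exp))
        for occ, calc, exp in zip(occurrences, calculated_logs, experimental_logs)
    ]
```

A parameter whose larger occurrences coincide with larger errors in these points may be a candidate for closer inspection.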