Summary#
Input data#
Currently, moldrug only accepts valid RDKit molecules and a valid pdbqt file that will be processed in case AutoDock-Vina is used.
The idea#
The idea is simple: we want the best drug for some applications. The problem is that it is not as simple as it sounds. The chemical space is enormous and the estimation of the fitness is also a term of concern.
Here we address this problem using a Genetic Algorithm. The inputs are
Biological target (pdbqt format).
RDKit molecule of some known (or guessing) drug. (it could get easily get from the SMILES string)
Definition of the binding pocket.
Definition of the
moldrug.utils.GA()running variables.
With the molecule, a random population of popsize
individuals will be generated through the CReM
operation mutate_mol. Every individual will be evaluated through the selected fitness function.
The best individual will be used for crossover and mutation, generating pc*popsize``offspring.
The offspring will be merged with the current population, sorted with respect to the fitness function
and the first ``popsize individuals will be kept for the next generation.
This cycle will run for maxiter generations.
In a general way moldrug will try to minimize:
Where \(\vec{X}\) is some description of the chemical structure of the molecule which is mapped to the \(\mathbb{R}\) numbers through the function \(f\). This function will be called cost function.
Alternatively, there is also the class moldrug.utils.Local().
In this case, only an initial population is created and exported.
Fitness functions#
The default fitness functions could be accessed through the module moldrug.fitness.
There are currently four implemented:
moldrug.fitness.Cost()(standard). It uses all the response variables.moldrug.fitness.CostOnlyVina(). Only use the information of the Vina score.moldrug.fitness.CostMultiReceptors(). It uses all the response variables.moldrug.fitness.CostMultiReceptorsOnlyVina(). Only use the information of the Vina scores
from moldrug import fitness
cost = fitness.Cost
A molecule must present several properties to be considered a drug. Some of the most important are:
be potent, reach the biological target (good ADME profile) and be real. The last obvious property could
be a bottleneck for computer-assisted drug design. Because we want to optimize several response variables
at the same time. These cost functions (with the exception of moldrug.fitness.CostOnlyVina()) use the concept of desirability functions
which optimizes several response variables on the hub.
The following are the response variables:
For each of this response variable we create their corresponding Derringer-Suich desirability functions (see here some examples of how it is implemented in moldrug). And then we combine them as a geometric mean:
where \(w_i\) are the weights of each variable; and \(d_i\) the desirability functions. Each individual \(d_i\) ranges from 0 to 1 and therefore also \(D\). Because we are looking for the minimum, the function cost return \(1 - D\).
Multi Receptor#
Could be that our receptor presents high flexibility or that we are interested in generating specific
small molecules. In this case could be convenient to add more than one receptor to the cost function.
In moldrug.fitness module the cost functions
moldrug.fitness.CostMultiReceptors() and moldrug.fitness.CostMultiReceptorsOnlyVina()
try to reach this goal. For the case of flexibility, we could perform docking in an ensemble
of protein structures and just keep the lower scoring rather than include all of them in the final desirability function.