Utils#
- class moldrug.utils.Atom(line)[source]#
This is a simple class to wrap a pdbqt Atom. It is based on https://userguide.mdanalysis.org/stable/formats/reference/pdbqt.html#writing-out.
- class moldrug.utils.CHUNK_VINA_OUT(chunk)[source]#
This class is used by VINA_OUT to read the pdbqt output of a Vina docking run.
- moldrug.utils.DerringerSuichDesirability()[source]#
A wrapper around the implemented desirability functions.
- Returns:
A dict whose keys are the names of the desirability functions and whose values are the corresponding functions.
- Return type:
- class moldrug.utils.GA(seed_mol: Mol | Iterable[Mol], costfunc: Callable, costfunc_kwargs: Dict, crem_db_path: str, maxiter: int = 10, popsize: int = 20, beta: float = 0.001, pc: float = 1, get_similar: bool = False, mutate_crem_kwargs: None | Dict = None, save_pop_every_gen: int = 0, checkpoint: bool = False, deffnm: str = 'ga', AddHs: bool = False, randomseed: int | None = None)[source]#
An implementation of a genetic algorithm to search in the chemical space.
- get_similar#
Bias the search toward similar molecules. If True,
moldrug.utils.get_similar_mols() is used after the mutation with CReM instead of a random choice.
- Type:
- save_pop_every_gen#
Frequency at which the pickle file of the population is saved during the optimization.
- Type:
- NumGens#
The number of generations performed by the class. Subsequent
__call__ executions update this number accordingly.
- Type:
- SawIndividuals#
All the Individuals seen during the optimization.
- Type:
set[moldrug.utils.Individual]
- acceptance#
A dictionary whose keys are the generation ids and whose values are dictionaries with the keys
accepted and generated, holding the number of accepted and generated individuals in that generation, respectively.
- Type:
- InitIndividual#
The initial individual based on _seed_mol.
- Type:
moldrug.utils.Individual
- pop#
The final population sorted by cost.
- Type:
list[moldrug.utils.Individual]
- TODO#
Time the simulation: add tracking variables for the time spent on the evaluation and generation of molecules, and print them at the end of each call.
- Extend to other generators:
mutate_crem_kwargs = None plus another keyword that receives the generator function. In this case the mutate method would be overwritten
by the user-provided function, which takes an Individual and returns a new offspring. To stay compatible and avoid issues, a good idea would be for this function to accept self as its first argument; internally it would use the self of the GA class.
- __call__(njobs: int = 1)[source]#
Call definition
- Parameters:
njobs (int, optional) – The number of jobs for parallelization; the multiprocessing module is used, by default 1.
- Raises:
RuntimeError – Error during the initialization of the population.
- __init__(seed_mol: Mol | Iterable[Mol], costfunc: Callable, costfunc_kwargs: Dict, crem_db_path: str, maxiter: int = 10, popsize: int = 20, beta: float = 0.001, pc: float = 1, get_similar: bool = False, mutate_crem_kwargs: None | Dict = None, save_pop_every_gen: int = 0, checkpoint: bool = False, deffnm: str = 'ga', AddHs: bool = False, randomseed: int | None = None) None[source]#
Constructor
- Parameters:
seed_mol (Union[Chem.rdchem.Mol, Iterable[Chem.rdchem.Mol]]) – The seed molecule submitted to genetic algorithm optimization on the chemical space. Could be only one RDKit molecule or more than one specified in an Iterable object.
costfunc (Callable) – The cost function to work with (any from
moldrug.fitness or a valid user-defined one).
costfunc_kwargs (Dict) – The keyword arguments of the selected cost function.
crem_db_path (str) – Path to the CReM database.
maxiter (int, optional) – Maximum number of iteration (or generation), by default 10.
popsize (int, optional) – Population size, by default 20.
beta (float, optional) – Selection pressure. Higher values mean that the best individuals are submitted for mutation more frequently, by default 0.001.
pc (float, optional) – Proportion of children, by default 1
get_similar (bool, optional) – If True the search will be biased toward similar molecules, by default False
mutate_crem_kwargs (Union[None, Dict], optional) – Parameters for mutate_mol of CReM, by default None
save_pop_every_gen (int, optional) – Frequency to save the population, by default 0
checkpoint (bool, optional) – If True the whole class will be saved as cpt with the frequency of save_pop_every_gen. This means that if save_pop_every_gen = 0 and checkpoint = True, no checkpoint will be output, by default False
deffnm (str, optional) – Default prefix name for all generated files, by default ‘ga’
AddHs (bool, optional) – If True the explicit hydrogens will be added, by default False
randomseed (Union[None, int], optional) – Set a random seed for reproducibility, by default None
- Raises:
TypeError – In case that seed_mol is a wrong input.
ValueError – In case of incorrect definition of mutate_crem_kwargs. It must be None or a dict instance.
ValueError – In case crem_db_path does not exist.
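The effect of beta on parent selection can be illustrated with a small sketch. It assumes Boltzmann-style weights, exp(-beta * cost / mean_cost), a common choice for cost-minimizing genetic algorithms. This page does not state the exact formula moldrug uses, and `selection_probabilities` is a hypothetical helper, not part of the library.

```python
import math
from typing import List


def selection_probabilities(costs: List[float], beta: float) -> List[float]:
    """Turn costs into selection probabilities (lower cost -> higher probability).

    Assumption: weights follow exp(-beta * cost / mean_cost) and the costs are
    non-negative. This is an illustrative formula, not necessarily moldrug's.
    """
    mean_cost = sum(costs) / len(costs)
    # Guard against a zero mean so the scaling never divides by zero.
    scale = mean_cost if mean_cost != 0 else 1.0
    weights = [math.exp(-beta * cost / scale) for cost in costs]
    total = sum(weights)
    return [weight / total for weight in weights]


costs = [0.2, 0.5, 0.9]
flat = selection_probabilities(costs, beta=0.001)  # nearly uniform
sharp = selection_probabilities(costs, beta=10.0)  # strongly favors the lowest cost
print(flat)
print(sharp)
```

With a small beta every individual is selected with almost the same probability; a large beta concentrates the selection on the individuals with the lowest cost.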
- mutate(individual: Individual)[source]#
Genetic operators
- Parameters:
individual (Individual) – The individual to mutate.
- Returns:
A new Individual.
- Return type:
- pickle(title: str, compress: bool = False)[source]#
Method to pickle the whole GA class
- Parameters:
title (str) – Name of the object which will be completed with the corresponding extension depending if compress is set to True or False.
compress (bool, optional) – Use compression, by default False. If True,
moldrug.utils.compressed_pickle() will be used; if not, moldrug.utils.full_pickle() will be used instead.
- class moldrug.utils.Individual(mol: Mol, idx: int | str = 0, pdbqt: str | None = None, cost: float = inf, randomseed: int | None = None)[source]#
Base class to work with GA, Local and all the fitness functions. Individual is a mutable object; only the smiles attribute is immutable, and it is used for hashing, so the class is hashable based on smiles. The same attribute is used for the ‘==’ comparison: if two Individuals have the same smiles, they are considered equal no matter whether the rest of their attributes differ. The cost attribute is used for arithmetic operations. The class also supports copy and deepcopy operations. Known issue: when working with a numpy array of Individuals, the dtype of the generated arrays must be changed.
- mol#
The molecule object
- Type:
Chem.rdchem.Mol
- pdbqt#
A pdbqt string representation of the molecule, used for docking with Vina. It is generated during the initialization of the class
- Type:
- smiles#
The SMILES representation of the mol attribute without explicit hydrogens, this attribute (property) is immutable.
- cost#
This attribute is used to interact with the fitness functions of
moldrug.fitness.
- Type:
Example
In [1]: from moldrug import utils, fitness
In [2]: import numpy as np
In [3]: from copy import copy, deepcopy
In [4]: from rdkit import Chem
In [5]: i1 = utils.Individual(mol = Chem.MolFromSmiles('CC'), idx = 1, cost = 5)
In [6]: i2 = utils.Individual(mol = Chem.MolFromSmiles('CC'), idx = 2, cost = 4)
In [7]: i3 = utils.Individual(mol = Chem.MolFromSmiles('CCC'), idx = 3, cost = 4)
# Show the '==' operation
In [8]: print(i1 == i2, i1 == i3)
True False
# Show that Individual is a hashable object based on the smiles
In [9]: print(set([i1,i2,i3]))
{Individual(idx = 3, smiles = CCC, cost = 4), Individual(idx = 1, smiles = CC, cost = 5)}
# Show arithmetic operations
In [10]: print(i1+i2)
9
# How to work with numpy
In [11]: array = np.array([i1,i2, i3])
In [12]: array_2 = (array*2).astype('float64')
In [13]: print(array_2)
[10. 8. 8.]
# Show copy
In [14]: print(copy(i3), deepcopy(i3))
Individual(idx = 3, smiles = CCC, cost = 4) Individual(idx = 3, smiles = CCC, cost = 4)
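The behavior described for Individual (hashing and equality by smiles, arithmetic on cost) can be mimicked with a small stand-in class. This is only a sketch of how such semantics could be obtained in plain Python; `ToyIndividual` is hypothetical, carries no molecule or pdbqt, and is not the library's implementation.

```python
class ToyIndividual:
    """Hypothetical stand-in: identity comes from smiles, arithmetic from cost."""

    def __init__(self, smiles: str, idx=0, cost: float = float("inf")):
        self._smiles = smiles  # stored privately and exposed read-only below
        self.idx = idx
        self.cost = cost

    @property
    def smiles(self) -> str:
        # No setter is defined, so the attribute cannot be reassigned.
        return self._smiles

    def __hash__(self) -> int:
        return hash(self._smiles)

    def __eq__(self, other) -> bool:
        return isinstance(other, ToyIndividual) and self._smiles == other._smiles

    def __add__(self, other):
        # Works with another individual (uses its cost) or with a plain number.
        return self.cost + getattr(other, "cost", other)

    __radd__ = __add__


a = ToyIndividual("CC", idx=1, cost=5)
b = ToyIndividual("CC", idx=2, cost=4)
c = ToyIndividual("CCC", idx=3, cost=4)

print(a == b, a == c)  # True False
print(len({a, b, c}))  # 2, because a and b share the same smiles
print(a + b)           # 9
print(sum([a, b, c]))  # 13
```

Defining `__radd__` is what lets the built-in `sum` work, since it starts from the integer 0 and then adds each element in turn.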
- __init__(mol: Mol, idx: int | str = 0, pdbqt: str | None = None, cost: float = inf, randomseed: int | None = None) None[source]#
This is the constructor of the class.
- Parameters:
mol (Chem.rdchem.Mol, optional) – A valid RDKit molecule.
idx (Union[int, str], optional) – An identifier, by default 0
pdbqt (str, optional) – A valid pdbqt string. If it is not provided, it will be generated from mol through utils.confgen and the mol attribute will be updated with the 3D model, by default None
cost (float, optional) – This attribute is used to perform operations between Individuals and should be used for the cost functions, by default np.inf
randomseed (Union[None, int], optional) – Provide a seed for the random number generator so that the “same” coordinates can be obtained for the attribute pdbqt on multiple runs. If None, the RNG will not be seeded, by default None
- moldrug.utils.LargerTheBest(Value: float, LowerLimit: float, Target: float, r: float = 1) float[source]#
Desirability function used when larger values are the targets. If Value is greater than or equal to Target it returns 1; if it is lower than LowerLimit it returns 0; otherwise it returns a number between 0 and 1. See also doi:10.1016/j.chemolab.2011.04.004 and https://www.youtube.com/watch?v=quz4NW0uIYw&list=PL6ebkIZFT4xXiVdpOeKR4o_sKLSY0aQf_&index=3
- Parameters:
Value (float) – Value to test.
LowerLimit (float) – Lowest accepted value. Values lower than this one return 0.
Target (float) – The target value. At this value (or higher) the function returns 1.
r (float, optional) – The exponent of the interpolation. It can be used to control the shape of the interpolation, by default 1
- Returns:
A number between 0 and 1, with 1 being the most desirable value.
- Return type:
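The piecewise behavior described for LargerTheBest can be sketched as follows. The power-law interpolation between LowerLimit and Target is the usual Derringer-Suich form and is an assumption here; `larger_the_best` is an illustrative re-implementation, not the library's code.

```python
def larger_the_best(value: float, lower_limit: float, target: float, r: float = 1) -> float:
    """Sketch of a larger-the-best desirability (assumed power-law interpolation)."""
    if value < lower_limit:
        return 0.0
    if value >= target:
        return 1.0
    # Between the limits: r = 1 is linear, other values bend the curve.
    return ((value - lower_limit) / (target - lower_limit)) ** r


for value in (-1, 2, 5, 10, 12):
    print(value, larger_the_best(value, lower_limit=0, target=10))
```

The exponent only changes the shape of the curve between the two limits; the values returned at or beyond the limits stay at 0 and 1.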
- class moldrug.utils.Local(seed_mol: Mol, crem_db_path: str, costfunc: object, grow_crem_kwargs: Dict | None = None, costfunc_kwargs: Dict | None = None, AddHs: bool = False, randomseed: int | None = None, deffnm: str = 'local')[source]#
This class is used to generate close solutions to the seed molecule. It uses
crem.crem.grow_mol().
- pop#
The final population sorted by cost.
- Type:
list[moldrug.utils.Individual]
- __init__(seed_mol: Mol, crem_db_path: str, costfunc: object, grow_crem_kwargs: Dict | None = None, costfunc_kwargs: Dict | None = None, AddHs: bool = False, randomseed: int | None = None, deffnm: str = 'local') None[source]#
Constructor
- Parameters:
seed_mol (Chem.rdchem.Mol) – The seed molecule from which the population will be generated.
crem_db_path (str) – The path to the CReM database.
costfunc (object) – The cost function to work with (any from
moldrug.fitness or a valid user-defined one).
grow_crem_kwargs (Dict, optional) – The keywords of the grow_mol function of CReM, by default None
costfunc_kwargs (Dict, optional) – The keyword arguments of the selected cost function, by default None
AddHs (bool, optional) – If True the explicit hydrogens will be added, by default False
randomseed (Union[None, int], optional) – Set a random seed for reproducibility, by default None
deffnm (str) – Just a place holder for compatibility with the CLI.
- Raises:
Exception – In case some problem occurred during the creation of the Individual from the seed_mol.
ValueError – In case of incorrect definition of grow_crem_kwargs and/or costfunc_kwargs. They must be None or a dict instance.
- pickle(title: str, compress: bool = False)[source]#
Method to pickle the whole Local class
- Parameters:
title (str) – Name of the object, which will be completed with the corresponding extension depending on whether compress is set to True or False.
compress (bool, optional) – Use compression, by default False. If True,
moldrug.utils.compressed_pickle() will be used; if not, moldrug.utils.full_pickle() will be used instead.
- moldrug.utils.NominalTheBest(Value: float, LowerLimit: float, Target: float, UpperLimit: float, r1: float = 1, r2: float = 1) float[source]#
Desirability function used when a target value is desired. If Value is lower than or equal to LowerLimit, or greater than or equal to UpperLimit, it returns 0; otherwise it returns a number between 0 and 1.
- Parameters:
Value (float) – Value to test.
LowerLimit (float) – Lower value accepted. Lower than this one will return 0.
Target (float) – The target value. On this value the function takes 1 as value.
UpperLimit (float) – Upper value accepted. Higher than this one will return 0.
r1 (float, optional) – This is the exponent of the interpolation from LowerLimit to Target. Could be used to control the interpolation, by default 1
r2 (float, optional) – This is the exponent of the interpolation from Target to UpperLimit. Could be used to control the interpolation, by default 1
- Returns:
A number between 0 and 1, with 1 being the most desirable value.
- Return type:
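A two-sided desirability of this kind can be sketched with one interpolation on each side of the target. As above, the power-law form with exponents r1 and r2 is an assumption taken from the standard Derringer-Suich definition, and `nominal_the_best` is a hypothetical name.

```python
def nominal_the_best(
    value: float,
    lower_limit: float,
    target: float,
    upper_limit: float,
    r1: float = 1,
    r2: float = 1,
) -> float:
    """Sketch of a nominal-the-best desirability (assumed power-law interpolation)."""
    if value <= lower_limit or value >= upper_limit:
        return 0.0
    if value <= target:
        # Rising branch, from lower_limit up to target.
        return ((value - lower_limit) / (target - lower_limit)) ** r1
    # Falling branch, from target down to upper_limit.
    return ((upper_limit - value) / (upper_limit - target)) ** r2


# Example: a property with an ideal value of 400 inside the window (200, 500).
for value in (150, 300, 400, 450, 500):
    print(value, nominal_the_best(value, 200, 400, 500))
```

Using different exponents for the two branches makes the penalty asymmetric, for instance harsher above the target than below it.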
- moldrug.utils.SmallerTheBest(Value: float, Target: float, UpperLimit: float, r: float = 1) float[source]#
Desirability function used when lower values are the targets. If Value is lower than or equal to Target it returns 1; if it is higher than UpperLimit it returns 0; otherwise it returns a number between 0 and 1.
- Parameters:
Value (float) – Value to test.
Target (float) – The target value. On this value (or lower) the function takes 1 as value.
UpperLimit (float) – Upper value accepted. Higher than this one will return 0.
r (float, optional) – This is the exponent of the interpolation. Could be used to control the interpolation, by default 1
- Returns:
A number between 0 and 1, with 1 being the most desirable value.
- Return type:
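The mirror case, where lower values are better (a docking score, for example), might look like the sketch below. The interpolation formula is again an assumption, and the numbers are invented for illustration only.

```python
def smaller_the_best(value: float, target: float, upper_limit: float, r: float = 1) -> float:
    """Sketch of a smaller-the-best desirability (assumed power-law interpolation)."""
    if value <= target:
        return 1.0
    if value > upper_limit:
        return 0.0
    # Desirability decreases from 1 at target to 0 at upper_limit.
    return ((upper_limit - value) / (upper_limit - target)) ** r


# Invented example: scores of -12 or lower are ideal, scores above -6 are rejected.
for score in (-13.0, -9.0, -6.0, -5.0):
    print(score, smaller_the_best(score, target=-12.0, upper_limit=-6.0))
```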
- class moldrug.utils.VINA_OUT(file)[source]#
Vina class to handle Vina output. Consider using meeko in the future!
- moldrug.utils.compressed_pickle(title: str, data: object)[source]#
Compress a Python object: pickle it first and then compress it with bz2.BZ2File.
- moldrug.utils.confgen(mol: Mol, return_mol: bool = False, randomseed: int | None = None)[source]#
Create a 3D model from an RDKit molecule and return a pdbqt string and, if
return_mol = True, the molecule as well.
- Parameters:
mol (Chem.rdchem.Mol) – A valid RDKit molecule.
return_mol (bool, optional) – If True the function will also return the
rdkit.Chem.rdchem.Mol, by default False
randomseed (Union[None, int], optional) – Provide a seed for the random number generator so that the same coordinates can be obtained for a molecule on multiple runs. If None, the RNG will not be seeded, by default None
- Returns:
If
return_mol = True it will return a tuple (str[pdbqt], Chem.rdchem.Mol); if not, only a str that represents the pdbqt.
- Return type:
- moldrug.utils.decompress_pickle(file: str)[source]#
Decompress pickled objects that were compressed with bz2.
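The round trip behind compressed_pickle and decompress_pickle can be sketched with the standard library alone. The helper names and the file name below are hypothetical; in particular, this sketch takes a full path, whereas the library functions take a title and a file argument whose exact naming rules are not described on this page.

```python
import bz2
import os
import pickle
import tempfile


def compress_to_file(path: str, data: object) -> None:
    """Pickle `data` and write it through a bz2-compressed file handle."""
    with bz2.BZ2File(path, "wb") as handle:
        pickle.dump(data, handle)


def decompress_from_file(path: str) -> object:
    """Read a bz2-compressed pickle back into a Python object."""
    with bz2.BZ2File(path, "rb") as handle:
        return pickle.load(handle)


payload = {"idx": 1, "smiles": "CCO", "cost": 0.42}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "population.pbz2")  # hypothetical file name
    compress_to_file(path, payload)
    restored = decompress_from_file(path)

print(restored == payload)  # True
```

As with any pickle, such files should only be loaded when they come from a trusted source.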
- moldrug.utils.deep_update(target_dict: dict, update_dict: dict) dict[source]#
Recursively update a dictionary with the key-value pairs from another dictionary. Inspired by https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth
- Parameters:
Example
In [1]: from moldrug.utils import deep_update
In [2]: target = {'a': 1, 'b': {'c': 2, 'd': 3}}
In [3]: updates = {'b': {'c': 4, 'e': 5}, 'f': 6}
In [4]: result = deep_update(target, updates)
In [5]: print(result)
{'a': 1, 'b': {'c': 4, 'd': 3, 'e': 5}, 'f': 6}
# Output: {'a': 1, 'b': {'c': 4, 'd': 3, 'e': 5}, 'f': 6}
- Returns:
The updated dictionary
- Return type:
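A recursive update of this kind can be written in a few lines. The sketch below is an independent illustration of the technique, not the library's source, and it assumes that the target dictionary is modified in place and then returned.

```python
from typing import Any, Dict


def deep_update_sketch(target: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `updates` into `target`, descending into nested dictionaries."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries: merge them instead of replacing.
            deep_update_sketch(target[key], value)
        else:
            # Leaf value or type mismatch: the update wins.
            target[key] = value
    return target


target = {"a": 1, "b": {"c": 2, "d": 3}}
updates = {"b": {"c": 4, "e": 5}, "f": 6}
result = deep_update_sketch(target, updates)
print(result)  # {'a': 1, 'b': {'c': 4, 'd': 3, 'e': 5}, 'f': 6}
```

Unlike `dict.update`, the nested key `d` survives because the inner dictionaries are merged rather than replaced.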
- moldrug.utils.get_sim(ms: List[Mol], ref_fps: List)[source]#
Get the molecules with higher similarity to each member of ref_fps.
- moldrug.utils.get_similar_mols(mols: List, ref_mol: Mol, pick: int, beta: float = 0.01)[source]#
Pick similar molecules from mols with respect to ref_mol using a roulette wheel selection strategy.
- moldrug.utils.import_sascorer()[source]#
Function to import sascorer from RDConfig.RDContribDir of RDKit
- Returns:
The sascorer module ready to use.
- Return type:
module
- moldrug.utils.is_iter(obj)[source]#
Check if obj is iterable
- Parameters:
obj (Any) – Any python object
- Returns:
True if obj is iterable, False if not.
- Return type:
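One common way to implement such a check is to ask Python for an iterator and catch the failure. The function below is a sketch of that idiom under a hypothetical name; whether moldrug uses this exact test is an assumption.

```python
def is_iterable(obj) -> bool:
    """Return True if `iter(obj)` succeeds, False otherwise."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable([1, 2, 3]))  # True
print(is_iterable("CCO"))      # True, strings are iterable too
print(is_iterable(42))         # False
```

Note that strings pass the check, so code that must treat a single string as one item needs an additional test.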
- moldrug.utils.lipinski_filter(mol: Mol, maxviolation: int = 2)[source]#
Implementation of Lipinski filter.
- moldrug.utils.lipinski_profile(mol: Mol)[source]#
-
- Parameters:
mol (Chem.rdchem.Mol) – An RDKit molecule.
- Returns:
A dictionary with molecular properties.
- Return type:
- moldrug.utils.make_sdf(individuals: List[Individual], sdf_name: str = 'out')[source]#
This function creates an sdf file from a list of Individuals based on their pdbqt attribute. It assumes that the cost function updates the pdbqt attribute after docking with the conformations obtained. In the case of multiple receptors, the attribute should be a list of valid pdbqt strings; the function then exports several sdf files, one for each pdbqt string in the pdbqt attribute.
- Parameters:
individuals (list[Individual]) – A list of individuals
sdf_name (str, optional) – The name for the output file. Could be a
path + sdf_name. The sdf extension will be added by the function, by default ‘out’
Example
In [1]: import tempfile, os
In [2]: from moldrug import utils
In [3]: from rdkit import Chem
# Create a temporary dir
In [4]: tmp_path = tempfile.TemporaryDirectory()
# Creating two individuals
In [5]: I1 = utils.Individual(Chem.MolFromSmiles('CCCCl'))
In [6]: I2 = utils.Individual(Chem.MolFromSmiles('CCOCCCF'))
# Creating the pdbqt attribute as a list with the pdbqt attribute (this is just a silly example)
In [7]: I1.pdbqt = [I1.pdbqt, I1.pdbqt]
In [8]: I2.pdbqt = [I2.pdbqt, I2.pdbqt]
In [9]: utils.make_sdf([I1, I2], sdf_name = os.path.join(tmp_path.name, 'out'))
File /tmp/tmprj8ilrsr/out_1.sdf was created!
File /tmp/tmprj8ilrsr/out_2.sdf was created!
# Two files were created
# On the other hand, if the attribute pdbqt is not a list, only one file is going to be created
# Set pdbqt to the original value
In [10]: I1.pdbqt = I1.pdbqt[0]
In [11]: I2.pdbqt = I2.pdbqt[0]
In [12]: utils.make_sdf([I1, I2], sdf_name = os.path.join(tmp_path.name, 'out'))
File /tmp/tmprj8ilrsr/out.sdf was createad!
# Only one file will be created if the pdbqt has no len in some of
# the individuals or if they present different lens as well.
# In this case the pdbqts will be completely ignored and pdbqt attribute
# will be used for the construction of the sdf file
In [13]: I1.pdbqt = [I1.pdbqt, I1.pdbqt, I1.pdbqt]
In [14]: I2.pdbqt = [I2.pdbqt, I2.pdbqt]
In [15]: utils.make_sdf([I1, I2], sdf_name = os.path.join(tmp_path.name, 'out'))
File /tmp/tmprj8ilrsr/out.sdf was createad!
- moldrug.utils.roulette_wheel_selection(p: List[float])[source]#
Function to select the offspring based on their fitness.
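Roulette wheel selection in general draws an index with a probability proportional to its weight. The sketch below shows the textbook version using a cumulative sum; the function name, the optional `rng` argument and the integer return value are assumptions made for illustration and may differ from the library function.

```python
import random
from itertools import accumulate
from typing import List, Optional


def roulette_wheel(p: List[float], rng: Optional[random.Random] = None) -> int:
    """Return an index drawn with probability proportional to p[index]."""
    rng = rng or random.Random()
    cumulative = list(accumulate(p))
    # Spin the wheel: a point in [0, total) falls into exactly one slice.
    threshold = rng.random() * cumulative[-1]
    for index, edge in enumerate(cumulative):
        if threshold < edge:
            return index
    return len(p) - 1  # fallback for floating-point round-off


rng = random.Random(0)  # seeded only to make the demonstration repeatable
draws = [roulette_wheel([0.1, 0.9], rng) for _ in range(1000)]
print(draws.count(0), draws.count(1))
```

Entries with zero weight are never selected, and the weights do not need to sum to 1 because the threshold is scaled by their total.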
- moldrug.utils.run(command: str, shell: bool = True, executable: str = '/bin/bash')[source]#
This function is just a useful wrapper around subprocess.run
- Parameters:
- Returns:
The process object returned by subprocess.run.
- Return type:
- Raises:
RuntimeError – In case of non-zero exit status on the provided command.
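A wrapper with the behavior described (run a shell command, raise RuntimeError on a non-zero exit status) could look like the sketch below. It is not the library's implementation: `run_sketch` is a hypothetical name, and it relies on the platform's default shell rather than the '/bin/bash' default shown in the signature so that the example stays portable.

```python
import subprocess


def run_sketch(command: str, shell: bool = True) -> subprocess.CompletedProcess:
    """Run `command` and raise RuntimeError if it exits with a non-zero status."""
    process = subprocess.run(command, shell=shell, capture_output=True, text=True)
    if process.returncode != 0:
        raise RuntimeError(
            f"Command {command!r} failed with exit status {process.returncode}:\n"
            f"{process.stderr}"
        )
    return process


result = run_sketch("echo moldrug")
print(result.stdout.strip())  # moldrug

try:
    run_sketch("exit 3")
except RuntimeError as error:
    print("Caught:", str(error).splitlines()[0])
```

Capturing the output keeps stderr available for the error message, which is usually the most useful part when an external program such as a docking engine fails.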
- moldrug.utils.tar_errors(error_path: str = 'error')[source]#
Clean errors in the working directory: compress error_path into error.tar.gz and delete the directory.
- Parameters:
error_path (str) – Where the errors are stored.
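Archiving a directory and then removing it takes only the standard library. The following sketch shows the idea with a hypothetical helper, `archive_and_remove`; the archive naming and the behavior for a missing directory are assumptions, not documented properties of tar_errors.

```python
import os
import shutil
import tarfile
import tempfile


def archive_and_remove(directory: str) -> str:
    """Pack `directory` into `<directory>.tar.gz`, delete it, return the archive path."""
    archive = f"{directory}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the directory name inside the archive, not the full path.
        tar.add(directory, arcname=os.path.basename(directory))
    shutil.rmtree(directory)
    return archive


with tempfile.TemporaryDirectory() as workdir:
    error_dir = os.path.join(workdir, "error")
    os.makedirs(error_dir)
    # Hypothetical log file standing in for whatever a failed evaluation leaves behind.
    with open(os.path.join(error_dir, "idx_1.log"), "w") as handle:
        handle.write("docking failed\n")
    archive_path = archive_and_remove(error_dir)
    archive_exists = os.path.isfile(archive_path)
    directory_removed = not os.path.exists(error_dir)

print(archive_exists, directory_removed)  # True True
```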
- moldrug.utils.to_dataframe(individuals: List[Individual], return_mol: bool = False) DataFrame[source]#
Convert a list of individuals to a DataFrame
- Parameters:
individuals (List[Individual]) – The list of individuals
return_mol (bool, optional) – If True the attribute mol will be returned, by default False
- Returns:
The DataFrame
- Return type:
pd.DataFrame
- moldrug.utils.update_reactant_zone(parent: Mol, offspring: Mol, parent_replace_ids: List[int] | None = None, parent_protected_ids: List[int] | None = None)[source]#
This function finds the difference between offspring and parent based on the Maximum Common Substructure (MCS). This difference is considered the offspring_replace_ids. Because the indexes of the product can change with respect to the reactant after a reaction, the parent_replace_ids can change as well. The function maps the indexes of the parent to the offspring based on the MCS. If some of the parent_replace_ids are still present among those indexes, they are updated based on the offspring and also added to offspring_replace_ids. The same is done for the parent_protected_ids.
- Parameters:
parent (Chem.rdchem.Mol) – The original molecule from where offspring was generated
offspring (Chem.rdchem.Mol) – A derivative of parent
parent_replace_ids (List[int], optional) – A list of replaceable indexes in the parent, by default None
parent_protected_ids (List[int], optional) – A list of protected indexes in the parent, by default None
- Returns:
The function returns a tuple composed of two lists of integers. The first list is offspring_replace_ids and the second one is offspring_protected_ids.
- Return type: