Fitness

moldrug.fitness.Cost(Individual: Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', vina_seed: int | None = None, receptor_pdbqt_path: str | None = None, boxcenter: List[float] | None = None, boxsize: List[float] | None = None, exhaustiveness: int = 8, ad4map: str | None = None, ncores: int = 1, num_modes: int = 1, constraint: bool = False, constraint_type: str = 'score_only', constraint_ref: Mol | None = None, constraint_receptor_pdb_path: str | None = None, constraint_num_conf: int = 100, constraint_minimum_conf_rms: int = 0.01, desirability: Dict | None = None)

This is the main Cost function of the module. It uses the concept of desirability functions. The response variables are:

  1. Vina score.

  2. Quantitative Estimation of Drug-likeness (QED).

  3. Synthetic accessibility score.

If ad4map is set, the latest version of vina (releases) must be installed. To see how to use AutoDock4 force fields in the new version of vina, follow this tutorial: https://autodock-vina.readthedocs.io/en/latest/docking_zinc.html

Parameters:
  • Individual (utils.Individual) – An Individual with the pdbqt attribute

  • wd (str, optional) – The working directory to execute the docking jobs, by default ‘.vina_jobs’

  • vina_executable (str, optional) – The name of the vina executable. It could also be a path to the binary (an absolute path is recommended), which must have execute permission (chmod a+x <your binary file>), by default ‘vina’

  • vina_seed (Union[int, None], optional) – Explicit random seed used by vina, by default None

  • receptor_pdbqt_path (str, optional) – Where the receptor pdbqt file is located, by default None

  • boxcenter (list[float], optional) – A list of three floats with the definition of the center of the box in angstrom for docking (x, y, z), by default None

  • boxsize (list[float], optional) – A list of three floats with the definition of the box size in angstrom of the docking box (x, y, z), by default None

  • exhaustiveness (int, optional) – Parameter of vina that controls the accuracy of the docking search, by default 8

  • ad4map (str, optional) – Affinity maps for the autodock4.2 (ad4) or vina scoring function, by default None

  • ncores (int, optional) – Number of cpus to use in Vina, by default 1

  • num_modes (int, optional) – How many modes should Vina export, by default 1

  • constraint (bool, optional) – Controls if constraint docking will be performed, by default False

  • constraint_type (str, optional) – The type of constraint docking. It could be local_only (vina will perform a local optimization and score the resulting pose) or score_only (the pose provided by the internal conformer generator will only be scored), by default ‘score_only’

  • constraint_ref (Chem.rdchem.Mol, optional) – The part of the molecule that we would like to constrain, by default None

  • constraint_receptor_pdb_path (str, optional) – The same receptor as receptor_pdbqt_path but in pdb format, by default None

  • constraint_num_conf (int, optional) – Maximum number of conformers to be generated internally by moldrug, by default 100

  • constraint_minimum_conf_rms (int, optional) – RMS to filter duplicate conformers, by default 0.01

  • desirability (dict, optional) –

    Desirability definition to update the internal default values. The update uses moldrug.utils.deep_update(). Each variable will only accept the key w and the name of a desirability function of moldrug.utils.DerringerSuichDesirability(). By default None, which means that it will use:

    In [1]: from moldrug.fitness import __get_default_desirability
    
    In [2]: import json
    
    In [3]: print(json.dumps(__get_default_desirability(multireceptor=False), indent = 4))
    {
        "qed": {
            "w": 1,
            "LargerTheBest": {
                "LowerLimit": 0.1,
                "Target": 0.75,
                "r": 1
            }
        },
        "sa_score": {
            "w": 1,
            "SmallerTheBest": {
                "Target": 3,
                "UpperLimit": 7,
                "r": 1
            }
        },
        "vina_score": {
            "w": 1,
            "SmallerTheBest": {
                "Target": -12,
                "UpperLimit": -6,
                "r": 1
            }
        }
    }
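Overriding only some of these defaults can be pictured as a recursive dictionary merge. The sketch below uses a hypothetical helper, `merge_nested`, written only for illustration; it is not moldrug.utils.deep_update(), and its behavior is merely assumed to be similar. The override values are made up.

```python
import copy


def merge_nested(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` merged in recursively.

    Hypothetical stand-in for a deep-update helper, not moldrug's own code.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: descend one level.
            merged[key] = merge_nested(merged[key], value)
        else:
            # Leaf value (or new key): the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Default values copied from the printout above.
defaults = {
    "qed": {"w": 1, "LargerTheBest": {"LowerLimit": 0.1, "Target": 0.75, "r": 1}},
    "sa_score": {"w": 1, "SmallerTheBest": {"Target": 3, "UpperLimit": 7, "r": 1}},
    "vina_score": {"w": 1, "SmallerTheBest": {"Target": -12, "UpperLimit": -6, "r": 1}},
}

# Illustrative override: weight the vina score twice and relax its target.
overrides = {"vina_score": {"w": 2, "SmallerTheBest": {"Target": -9}}}

custom = merge_nested(defaults, overrides)
print(custom["vina_score"])
```

Under this assumption, a dictionary such as `overrides` is what would be passed through the desirability argument; keys that are not mentioned keep their default values.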
    

Returns:

A new instance of the original Individual with the new attributes: pdbqt, qed, vina_score, sa_score and cost. The cost attribute will be a number between 0 and 1, with 0 being the optimal value.

Return type:

utils.Individual

Example

In [4]: from moldrug import utils, fitness

In [5]: from rdkit import Chem

In [6]: import tempfile, os

In [7]: from moldrug.data import get_data

In [8]: tmp_path = tempfile.TemporaryDirectory()

In [9]: data_x0161 = get_data('x0161')

In [10]: ligand_mol = Chem.MolFromSmiles(data_x0161['smiles'])

In [11]: I = utils.Individual(ligand_mol)

In [12]: box = data_x0161['box']

# Using the default desirability
In [13]: NewI = fitness.Cost(
   ....:     Individual=I, wd=tmp_path.name,
   ....:     receptor_pdbqt_path=data_x0161['protein']['pdbqt'],
   ....:     boxcenter=box['boxcenter'], boxsize=box['boxsize'],
   ....:     exhaustiveness=4, ncores=4)

In [14]: print(NewI.cost, NewI.vina_score, NewI.qed, NewI.sa_score)
1.0 -5.153 0.7097425790327138 1.5359575087690605
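How a cost between 0 and 1 can arise from the three response variables is sketched below. The sketch assumes the standard one-sided Derringer-Suich forms and assumes that the cost is one minus the weighted geometric mean of the individual desirabilities; neither assumption is taken from moldrug's source, and the function names are hypothetical.

```python
from typing import Dict


def smaller_the_best(y: float, Target: float, UpperLimit: float, r: float = 1) -> float:
    """1 at or below Target, 0 at or above UpperLimit, power-law ramp in between."""
    if y <= Target:
        return 1.0
    if y >= UpperLimit:
        return 0.0
    return ((UpperLimit - y) / (UpperLimit - Target)) ** r


def larger_the_best(y: float, LowerLimit: float, Target: float, r: float = 1) -> float:
    """0 at or below LowerLimit, 1 at or above Target, power-law ramp in between."""
    if y <= LowerLimit:
        return 0.0
    if y >= Target:
        return 1.0
    return ((y - LowerLimit) / (Target - LowerLimit)) ** r


FUNCS = {"SmallerTheBest": smaller_the_best, "LargerTheBest": larger_the_best}


def sketch_cost(responses: Dict[str, float], desirability: Dict[str, dict]) -> float:
    """Assumed aggregation: 1 - weighted geometric mean of the desirabilities."""
    product, total_weight = 1.0, 0.0
    for name, spec in desirability.items():
        weight = spec["w"]
        func_name = next(key for key in spec if key != "w")
        d = FUNCS[func_name](responses[name], **spec[func_name])
        product *= d ** weight
        total_weight += weight
    return 1.0 - product ** (1.0 / total_weight)


# Default desirability as documented for this function.
default = {
    "qed": {"w": 1, "LargerTheBest": {"LowerLimit": 0.1, "Target": 0.75, "r": 1}},
    "sa_score": {"w": 1, "SmallerTheBest": {"Target": 3, "UpperLimit": 7, "r": 1}},
    "vina_score": {"w": 1, "SmallerTheBest": {"Target": -12, "UpperLimit": -6, "r": 1}},
}

# Values printed in the example above.
example = {"qed": 0.7097425790327138, "sa_score": 1.5359575087690605, "vina_score": -5.153}
print(sketch_cost(example, default))  # 1.0
```

Under these assumptions the vina score of -5.153 lies above the default UpperLimit of -6, so its desirability is zero and the geometric mean collapses to zero, which is consistent with the cost of 1.0 printed in the example.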

moldrug.fitness.CostMultiReceptors(Individual: Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', vina_seed: int | None = None, receptor_pdbqt_path: List[str] | None = None, vina_score_type: None | str | List[str] = None, boxcenter: List[List[float]] | None = None, boxsize: List[List[float]] | None = None, exhaustiveness: int = 8, ad4map: List[str] | None = None, ncores: int = 1, num_modes: int = 1, constraint: bool = False, constraint_type: str = 'score_only', constraint_ref: Mol | None = None, constraint_receptor_pdb_path: List[str] | None = None, constraint_num_conf: int = 100, constraint_minimum_conf_rms: int = 0.01, desirability: Dict | None = None)

This function is similar to moldrug.fitness.Cost(), but it adds the possibility to work with more than one receptor. It also uses the concept of desirability, and the response variables are:

  1. Vina scores.

  2. Quantitative Estimation of Drug-likeness (QED).

  3. Synthetic accessibility score.

In this case every vina score (for all the provided receptors) will be used for the construction of the desirability.

If ad4map is set, the latest version of vina (releases) must be installed. To see how to use AutoDock4 force fields in the new version of vina, follow this tutorial: https://autodock-vina.readthedocs.io/en/latest/docking_zinc.html

Parameters:
  • Individual (utils.Individual) – An Individual with the pdbqt attribute

  • wd (str, optional) – The working directory to execute the docking jobs, by default ‘.vina_jobs’

  • vina_executable (str, optional) – The name of the vina executable. It could also be a path to the binary (an absolute path is recommended), which must have execute permission (chmod a+x <your binary file>), by default ‘vina’

  • vina_seed (Union[int, None], optional) – Explicit random seed used by vina, by default None

  • receptor_pdbqt_path (list[str], optional) – A list with the locations of the receptor pdbqt files, by default None

  • vina_score_type (Union[None, str, List[str]], optional) – Either a list with the keywords ‘min’ and/or ‘max’, or the keyword ‘ensemble’. E.g., if two receptors are provided and we would like to find a minimum of the vina scoring function for the first one and a maximum for the other one (selectivity for the first receptor), we must provide the list [‘min’, ‘max’]. On the other hand, if we have several conformations of the same receptor (flexible receptor), we could use ‘ensemble’. In this case the vina value used for the optimization will be the lowest (if SmallerTheBest is selected) or the highest (if LargerTheBest is selected) of the vina scores from all conformations, by default None

  • boxcenter (list[list[float]], optional) – A list with one entry per receptor; each entry is a list of three floats defining the center of the docking box in angstrom (x, y, z), by default None

  • boxsize (list[list[float]], optional) – A list with one entry per receptor; each entry is a list of three floats defining the size of the docking box in angstrom (x, y, z), by default None

  • exhaustiveness (int, optional) – Parameter of vina that controls the accuracy of the docking search, by default 8

  • ad4map (list[str], optional) – A list of affinity maps for the autodock4.2 (ad4) or vina scoring function. For every receptor you should have a separate directory with all the maps, by default None

  • ncores (int, optional) – Number of cpus to use in Vina, by default 1

  • num_modes (int, optional) – How many modes should Vina export, by default 1

  • constraint (bool, optional) – Controls if constraint docking will be performed, by default False

  • constraint_type (str, optional) – The type of constraint docking. It could be local_only (vina will perform a local optimization and score the resulting pose) or score_only (the pose provided by the internal conformer generator will only be scored), by default ‘score_only’

  • constraint_ref (Chem.rdchem.Mol, optional) – The part of the molecule that we would like to constrain, by default None

  • constraint_receptor_pdb_path (list[str], optional) – The same receptors as receptor_pdbqt_path but in pdb format, by default None

  • constraint_num_conf (int, optional) – Maximum number of conformers to be generated internally by moldrug, by default 100

  • constraint_minimum_conf_rms (int, optional) – RMS to filter duplicate conformers, by default 0.01

  • desirability (dict, optional) –

    Desirability definition to update the internal default values. The update uses moldrug.utils.deep_update(). Each variable will only accept the key w and the name of a desirability function of moldrug.utils.DerringerSuichDesirability(). For vina_scores there is an additional layer keyed by the vina_score_type (ensemble, min or max). By default None, which means that it will use:

    In [1]: from moldrug.fitness import __get_default_desirability
    
    In [2]: import json
    
    In [3]: print(json.dumps(__get_default_desirability(multireceptor=True), indent = 4))
    {
        "qed": {
            "w": 1,
            "LargerTheBest": {
                "LowerLimit": 0.1,
                "Target": 0.75,
                "r": 1
            }
        },
        "sa_score": {
            "w": 1,
            "SmallerTheBest": {
                "Target": 3,
                "UpperLimit": 7,
                "r": 1
            }
        },
        "vina_scores": {
            "ensemble": {
                "w": 1,
                "SmallerTheBest": {
                    "Target": -12,
                    "UpperLimit": -6,
                    "r": 1
                }
            },
            "min": {
                "w": 1,
                "SmallerTheBest": {
                    "Target": -12,
                    "UpperLimit": -6,
                    "r": 1
                }
            },
            "max": {
                "w": 1,
                "LargerTheBest": {
                    "LowerLimit": -4,
                    "Target": 0,
                    "r": 1
                }
            }
        }
    }
    

Returns:

A new instance of the original Individual with the new attributes: pdbqts [a list of pdbqt], qed, vina_scores [a list of vina_score], sa_score and cost. The cost attribute will be a number between 0 and 1, with 0 being the optimal value.

Return type:

utils.Individual

Example

In [4]: from moldrug import utils, fitness

In [5]: from rdkit import Chem

In [6]: import tempfile, os

In [7]: from moldrug.data import get_data

In [8]: data_x0161 = get_data('x0161')

In [9]: data_6lu7 = get_data('6lu7')

In [10]: tmp_path = tempfile.TemporaryDirectory()

In [11]: ligand_mol = Chem.MolFromSmiles(data_x0161['smiles'])

In [12]: I = utils.Individual(ligand_mol)

In [13]: receptor_paths = [data_x0161['protein']['pdbqt'], data_6lu7['protein']['pdbqt']]

In [14]: boxcenter = [data_x0161['box']['boxcenter'], data_6lu7['box']['boxcenter']]

In [15]: boxsize = [data_x0161['box']['boxsize'], data_6lu7['box']['boxsize']]

In [16]: vina_score_type = ['min', 'max']

# Using the default desirability
In [17]: NewI = fitness.CostMultiReceptors(
   ....:     Individual=I, wd=tmp_path.name,
   ....:     receptor_pdbqt_path=receptor_paths, vina_score_type=vina_score_type,
   ....:     boxcenter=boxcenter, boxsize=boxsize,
   ....:     exhaustiveness=4, ncores=4)
   ....: 

In [18]: print(NewI.cost, NewI.vina_score, NewI.qed, NewI.sa_score)
1.0 [-5.093, -5.296] 0.7097425790327138 1.5359575087690605
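The per-receptor treatment selected by vina_score_type can be sketched as follows. This is a minimal illustration, assuming that each receptor's vina score is scored with the default ‘min’ or ‘max’ entry of vina_scores and that the one-sided Derringer-Suich form applies; the names `one_sided`, `VINA_TYPES` and `vina_desirabilities` are hypothetical and not part of moldrug.

```python
from typing import List


def one_sided(y: float, worst: float, target: float, r: float = 1) -> float:
    """One-sided desirability: 0 at `worst`, 1 at `target`, clipped outside that range."""
    fraction = (y - worst) / (target - worst)
    return min(1.0, max(0.0, fraction)) ** r


# Limits copied from the default "vina_scores" entries documented above.
VINA_TYPES = {
    "min": {"worst": -6, "target": -12, "r": 1},  # SmallerTheBest: UpperLimit -6, Target -12
    "max": {"worst": -4, "target": 0, "r": 1},    # LargerTheBest: LowerLimit -4, Target 0
}


def vina_desirabilities(vina_scores: List[float], vina_score_type: List[str]) -> List[float]:
    """Pair each receptor's score with the desirability chosen by its type."""
    return [
        one_sided(score, **VINA_TYPES[kind])
        for score, kind in zip(vina_scores, vina_score_type)
    ]


# Scores printed in the example above, with vina_score_type = ['min', 'max'].
print(vina_desirabilities([-5.093, -5.296], ["min", "max"]))  # [0.0, 0.0]
```

With the default limits, -5.093 is above the ‘min’ UpperLimit of -6 and -5.296 is below the ‘max’ LowerLimit of -4, so both desirabilities are zero in this sketch, which is consistent with the cost of 1.0 reported above.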

moldrug.fitness.CostMultiReceptorsOnlyVina(Individual: Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', vina_seed: int | None = None, receptor_pdbqt_path: List[str] | None = None, vina_score_type: None | str | List[str] = None, boxcenter: List[List[float]] | None = None, boxsize: List[List[float]] | None = None, exhaustiveness: int = 8, ad4map: List[str] | None = None, ncores: int = 1, num_modes: int = 1, constraint: bool = False, constraint_type: str = 'score_only', constraint_ref: Mol | None = None, constraint_receptor_pdb_path: List[str] | None = None, constraint_num_conf: int = 100, constraint_minimum_conf_rms: int = 0.01, desirability: Dict | None = None, wt_cutoff: None | float = None)

This function is similar to moldrug.fitness.CostOnlyVina(), but it adds the possibility to work with more than one receptor. It also uses the concept of desirability. The response variables are the vina scores on each receptor.

If ad4map is set, the latest version of vina (releases) must be installed. To see how to use AutoDock4 force fields in the new version of vina, follow this tutorial: https://autodock-vina.readthedocs.io/en/latest/docking_zinc.html

Parameters:
  • Individual (utils.Individual) – An Individual with the pdbqt attribute

  • wd (str, optional) – The working directory to execute the docking jobs, by default ‘.vina_jobs’

  • vina_executable (str, optional) – The name of the vina executable. It could also be a path to the binary (an absolute path is recommended), which must have execute permission (chmod a+x <your binary file>), by default ‘vina’

  • vina_seed (Union[int, None], optional) – Explicit random seed used by vina, by default None

  • receptor_pdbqt_path (list[str], optional) – A list with the locations of the receptor pdbqt files, by default None

  • vina_score_type (Union[None, str, List[str]], optional) – Either a list with the keywords ‘min’ and/or ‘max’, or the keyword ‘ensemble’. E.g., if two receptors are provided and we would like to find a minimum of the vina scoring function for the first one and a maximum for the other one (selectivity for the first receptor), we must provide the list [‘min’, ‘max’]. On the other hand, if we have several conformations of the same receptor (flexible receptor), we could use ‘ensemble’. In this case the vina value used for the optimization will be the lowest (if SmallerTheBest is selected) or the highest (if LargerTheBest is selected) of the vina scores from all conformations, by default None

  • boxcenter (list[list[float]], optional) – A list with one entry per receptor; each entry is a list of three floats defining the center of the docking box in angstrom (x, y, z), by default None

  • boxsize (list[list[float]], optional) – A list with one entry per receptor; each entry is a list of three floats defining the size of the docking box in angstrom (x, y, z), by default None

  • exhaustiveness (int, optional) – Parameter of vina that controls the accuracy of the docking search, by default 8

  • ad4map (list[str], optional) – A list of affinity maps for the autodock4.2 (ad4) or vina scoring function. For every receptor you should have a separate directory with all the maps, by default None

  • ncores (int, optional) – Number of cpus to use in Vina, by default 1

  • num_modes (int, optional) – How many modes should Vina export, by default 1

  • constraint (bool, optional) – Controls if constraint docking will be performed, by default False

  • constraint_type (str, optional) – The type of constraint docking. It could be local_only (vina will perform a local optimization and score the resulting pose) or score_only (the pose provided by the internal conformer generator will only be scored), by default ‘score_only’

  • constraint_ref (Chem.rdchem.Mol, optional) – The part of the molecule that we would like to constrain, by default None

  • constraint_receptor_pdb_path (list[str], optional) – The same receptors as receptor_pdbqt_path but in pdb format, by default None

  • constraint_num_conf (int, optional) – Maximum number of conformers to be generated internally by moldrug, by default 100

  • constraint_minimum_conf_rms (int, optional) – RMS to filter duplicate conformers, by default 0.01

  • desirability (dict, optional) –

    Desirability definition to update the internal default values. The update uses moldrug.utils.deep_update(). Each variable will only accept the key w and the name of a desirability function of moldrug.utils.DerringerSuichDesirability(). For vina_scores there is an additional layer keyed by the vina_score_type (ensemble, min or max). By default None, which means that it will use:

    In [1]: from moldrug.fitness import __get_default_desirability
    
    In [2]: import json
    
    In [3]: print(json.dumps(__get_default_desirability(multireceptor=True)['vina_scores'], indent = 4))
    {
        "ensemble": {
            "w": 1,
            "SmallerTheBest": {
                "Target": -12,
                "UpperLimit": -6,
                "r": 1
            }
        },
        "min": {
            "w": 1,
            "SmallerTheBest": {
                "Target": -12,
                "UpperLimit": -6,
                "r": 1
            }
        },
        "max": {
            "w": 1,
            "LargerTheBest": {
                "LowerLimit": -4,
                "Target": 0,
                "r": 1
            }
        }
    }
    

    If vina_score_type = ensemble, the only parameter that will be used is the name of the desirability, which selects whether to look for a minimum (SmallerTheBest) or a maximum (LargerTheBest). The following desirability is enough if a minimum is desired:

    In [4]: desirability = {
       ...:     'ensemble': 'SmallerTheBest'  # (LargerTheBest if a maximum is desired)
       ...:     }
       ...: 
    
    In [5]: print(json.dumps(desirability, indent = 4))
    {
        "ensemble": "SmallerTheBest"
    }
    

  • wt_cutoff (Union[None, float], optional) – If a number is provided, molecules with a molecular weight higher than wt_cutoff will get vina_score = cost = np.inf and Vina will not be invoked, by default None
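The ‘ensemble’ option described for vina_score_type and desirability reduces the scores from all receptor conformations to a single value. A minimal sketch of that reduction is given below, assuming it is a plain minimum or maximum over the list; `ensemble_vina_score` is a hypothetical name and the scores are made-up values.

```python
from typing import Sequence


def ensemble_vina_score(vina_scores: Sequence[float], desirability_name: str) -> float:
    """Pick the single score that represents an ensemble of receptor conformations."""
    if desirability_name == "SmallerTheBest":
        return min(vina_scores)  # a minimum is being searched
    if desirability_name == "LargerTheBest":
        return max(vina_scores)  # a maximum is being searched
    raise ValueError(f"Unsupported desirability name: {desirability_name!r}")


# Made-up scores for three conformations of the same receptor.
scores = [-7.2, -8.9, -6.4]
print(ensemble_vina_score(scores, "SmallerTheBest"))  # -8.9
print(ensemble_vina_score(scores, "LargerTheBest"))   # -6.4
```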

Returns:

A new instance of the original Individual with the new attributes: pdbqts [a list of pdbqt], vina_scores [a list of vina_score], and cost. The cost attribute will be a number between 0 and 1, with 0 being the optimal value.

Return type:

utils.Individual

Example

In [6]: from moldrug import utils, fitness

In [7]: from rdkit import Chem

In [8]: import tempfile, os

In [9]: from moldrug.data import get_data

In [10]: data_x0161 = get_data('x0161')

In [11]: data_6lu7 = get_data('6lu7')

In [12]: tmp_path = tempfile.TemporaryDirectory()

In [13]: ligand_mol = Chem.MolFromSmiles(data_x0161['smiles'])

In [14]: I = utils.Individual(ligand_mol)

In [15]: receptor_paths = [data_x0161['protein']['pdbqt'], data_6lu7['protein']['pdbqt']]

In [16]: boxcenter = [data_x0161['box']['boxcenter'], data_6lu7['box']['boxcenter']]

In [17]: boxsize = [data_x0161['box']['boxsize'], data_6lu7['box']['boxsize']]

In [18]: vina_score_type = ['min', 'max']

# Using the default desirability
In [19]: NewI = fitness.CostMultiReceptorsOnlyVina(
   ....:     Individual=I, wd=tmp_path.name,
   ....:     receptor_pdbqt_path=receptor_paths, vina_score_type=vina_score_type,
   ....:     boxcenter=boxcenter, boxsize=boxsize,
   ....:     exhaustiveness=4, ncores=4)
   ....: 

In [20]: print(NewI.cost, NewI.vina_score)
1.0 [-5.095, -5.267]

moldrug.fitness.CostOnlyVina(Individual: Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', vina_seed: int | None = None, receptor_pdbqt_path: str | None = None, boxcenter: List[float] | None = None, boxsize: List[float] | None = None, exhaustiveness: int = 8, ad4map: str | None = None, ncores: int = 1, num_modes: int = 1, constraint: bool = False, constraint_type: str = 'score_only', constraint_ref: Mol | None = None, constraint_receptor_pdb_path: str | None = None, constraint_num_conf: int = 100, constraint_minimum_conf_rms: int = 0.01, wt_cutoff: None | float = None)

This Cost function performs docking and returns the vina_score as the cost.

Parameters:
  • Individual (utils.Individual) – An Individual with the pdbqt attribute

  • wd (str, optional) – The working directory to execute the docking jobs, by default ‘.vina_jobs’

  • vina_executable (str, optional) – The name of the vina executable. It could also be a path to the binary (an absolute path is recommended), which must have execute permission (chmod a+x <your binary file>), by default ‘vina’

  • vina_seed (Union[int, None], optional) – Explicit random seed used by vina, by default None

  • receptor_pdbqt_path (str, optional) – Where the receptor pdbqt file is located, by default None

  • boxcenter (list[float], optional) – A list of three floats with the definition of the center of the box in angstrom for docking (x, y, z), by default None

  • boxsize (list[float], optional) – A list of three floats with the definition of the box size in angstrom of the docking box (x, y, z), by default None

  • exhaustiveness (int, optional) – Parameter of vina that controls the accuracy of the docking search, by default 8

  • ad4map (str, optional) – Affinity maps for the autodock4.2 (ad4) or vina scoring function, by default None

  • ncores (int, optional) – Number of cpus to use in Vina, by default 1

  • num_modes (int, optional) – How many modes should Vina export, by default 1

  • constraint (bool, optional) – Controls if constraint docking will be performed, by default False

  • constraint_type (str, optional) – The type of constraint docking. It could be local_only (vina will perform a local optimization and score the resulting pose) or score_only (the pose provided by the internal conformer generator will only be scored), by default ‘score_only’

  • constraint_ref (Chem.rdchem.Mol, optional) – The part of the molecule that we would like to constrain, by default None

  • constraint_receptor_pdb_path (str, optional) – The same receptor as receptor_pdbqt_path but in pdb format, by default None

  • constraint_num_conf (int, optional) – Maximum number of conformers to be generated internally by moldrug, by default 100

  • constraint_minimum_conf_rms (int, optional) – RMS to filter duplicate conformers, by default 0.01

  • wt_cutoff (Union[None, float], optional) – If a number is provided, molecules with a molecular weight higher than wt_cutoff will get vina_score = cost = np.inf and Vina will not be invoked, by default None
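The effect of wt_cutoff can be pictured as a gate placed in front of the docking call. The sketch below is an assumption about the control flow only: `gated_score` is a hypothetical helper, the molecular weight is passed in as a plain number, and `dock` stands for whatever callable would run Vina.

```python
import math
from typing import Callable, Optional


def gated_score(
    mol_weight: float,
    dock: Callable[[], float],
    wt_cutoff: Optional[float] = None,
) -> float:
    """Return infinity without docking when the molecule is heavier than the cutoff."""
    if wt_cutoff is not None and mol_weight > wt_cutoff:
        return math.inf  # docking is skipped entirely
    return dock()


# Illustrative use with a stand-in docking function and made-up numbers.
print(gated_score(612.3, lambda: -7.5, wt_cutoff=500.0))  # inf
print(gated_score(312.3, lambda: -7.5, wt_cutoff=500.0))  # -7.5
```

Because the gate returns before `dock` is called, an over-weight molecule costs no docking time, which matches the statement that Vina will not be invoked.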

Returns:

A new instance of the original Individual with the new attributes: pdbqt, vina_score and cost. In this case cost = vina_score; the lower the value, the better the individual.

Return type:

utils.Individual

Example

In [1]: from moldrug import utils, fitness

In [2]: from rdkit import Chem

In [3]: import tempfile, os

In [4]: from moldrug.data import get_data

In [5]: tmp_path = tempfile.TemporaryDirectory()

In [6]: data_x0161 = get_data('x0161')

In [7]: ligand_mol = Chem.MolFromSmiles(data_x0161['smiles'])

In [8]: I = utils.Individual(ligand_mol)

In [9]: box = data_x0161['box']

In [10]: NewI = fitness.CostOnlyVina(
   ....:     Individual=I, wd=tmp_path.name,
   ....:     receptor_pdbqt_path=data_x0161['protein']['pdbqt'],
   ....:     boxcenter=box['boxcenter'], boxsize=box['boxsize'],
   ....:     exhaustiveness=4, ncores=4)

In [11]: print(NewI.cost, NewI.vina_score)
-5.128 -5.128
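Several of the parameters documented on this page have the same names or meanings as AutoDock Vina command-line options. As a rough sketch of that correspondence, the hypothetical helper below assembles an argument list from them; it is an assumption made for illustration, not the command that moldrug builds internally, and the ligand and output file names are placeholders.

```python
from typing import List, Optional, Sequence


def sketch_vina_command(
    ligand_pdbqt: str,
    out_pdbqt: str,
    receptor_pdbqt_path: str,
    boxcenter: Sequence[float],
    boxsize: Sequence[float],
    vina_executable: str = "vina",
    vina_seed: Optional[int] = None,
    exhaustiveness: int = 8,
    ncores: int = 1,
    num_modes: int = 1,
) -> List[str]:
    """Build a Vina argument list from the documented parameters (illustrative only)."""
    cmd = [vina_executable, "--receptor", receptor_pdbqt_path, "--ligand", ligand_pdbqt]
    # Box definition: one center and one size option per axis.
    for axis, center, size in zip("xyz", boxcenter, boxsize):
        cmd += [f"--center_{axis}", str(center), f"--size_{axis}", str(size)]
    cmd += [
        "--exhaustiveness", str(exhaustiveness),
        "--cpu", str(ncores),
        "--num_modes", str(num_modes),
        "--out", out_pdbqt,
    ]
    if vina_seed is not None:
        cmd += ["--seed", str(vina_seed)]
    return cmd


# Placeholder file names and box values, for illustration only.
command = sketch_vina_command(
    "ligand.pdbqt", "ligand_out.pdbqt", "receptor.pdbqt",
    boxcenter=[12.0, 1.5, 5.0], boxsize=[20.0, 20.0, 20.0],
    exhaustiveness=4, ncores=4, vina_seed=1234,
)
print(" ".join(command))
```

A list of arguments like this could be handed to subprocess.run; keeping it as a list rather than a single string avoids shell quoting issues with paths that contain spaces.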