chemprop.featurizers.molgraph

Submodules

Attributes

Classes

MolGraphCache

A MolGraphCache precomputes the corresponding MolGraphs and caches them in memory.

MolGraphCacheFacade

A MolGraphCacheFacade provides an interface for caching MolGraphs.

MolGraphCacheOnTheFly

A MolGraphCacheOnTheFly computes the corresponding MolGraphs as they are requested.

CuikmolmakerMolGraphFeaturizer

A CuikmolmakerMolGraphFeaturizer featurizes a list of molecules at once instead of one molecule at a time for efficiency.

SimpleMoleculeMolGraphFeaturizer

A SimpleMoleculeMolGraphFeaturizer is the default implementation of a MoleculeMolGraphFeaturizer.

CondensedGraphOfReactionFeaturizer

A CondensedGraphOfReactionFeaturizer featurizes reactions using the condensed reaction graph method utilized in [1]

RxnMode

The mode by which a reaction should be featurized into a MolGraph

Package Contents

class chemprop.featurizers.molgraph.MolGraphCache(inputs, V_fs, E_fs, featurizer, n_workers=0)[source]#

Bases: MolGraphCacheFacade

A MolGraphCache precomputes the corresponding MolGraphs and caches them in memory.

Parameters:
__len__()[source]#
Return type:

int

__getitem__(index)[source]#
Parameters:

index (int)

Return type:

chemprop.data.molgraph.MolGraph

class chemprop.featurizers.molgraph.MolGraphCacheFacade(inputs, V_fs, E_fs, featurizer)[source]#

Bases: collections.abc.Sequence[chemprop.data.molgraph.MolGraph], Generic[chemprop.featurizers.base.S]

A MolGraphCacheFacade provides an interface for caching MolGraphs.

Note

This class only provides a facade for a cached dataset, but it does not guarantee whether the underlying data is truly cached.

Parameters:
  • inputs (Iterable[S]) – The inputs to be featurized.

  • V_fs (Iterable[np.ndarray]) – The node features for each input.

  • E_fs (Iterable[np.ndarray]) – The edge features for each input.

  • featurizer (Featurizer[S, MolGraph]) – The featurizer with which to generate the MolGraphs.

class chemprop.featurizers.molgraph.MolGraphCacheOnTheFly(inputs, V_fs, E_fs, featurizer)[source]#

Bases: MolGraphCacheFacade

A MolGraphCacheOnTheFly computes the corresponding MolGraphs as they are requested.

Parameters:
__len__()[source]#
Return type:

int

__getitem__(index)[source]#
Parameters:

index (int)

Return type:

chemprop.data.molgraph.MolGraph
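
The difference between precomputing graphs and computing them on request can be illustrated with a minimal, library-free sketch. The classes below (CountingFeaturizer, EagerCache, LazyCache) are hypothetical stand-ins written for this illustration, not chemprop's implementation; they only mirror the constructor arguments documented above and return plain tuples instead of MolGraphs.

```python
from collections.abc import Sequence


class CountingFeaturizer:
    """Hypothetical stand-in for a featurizer; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, inp, V_f=None, E_f=None):
        self.calls += 1
        # A real featurizer would return a MolGraph; a tuple is enough here.
        return (inp, V_f, E_f)


class EagerCache(Sequence):
    """Featurizes every input once, up front, and stores the results."""

    def __init__(self, inputs, V_fs, E_fs, featurizer):
        self._graphs = [featurizer(i, v, e) for i, v, e in zip(inputs, V_fs, E_fs)]

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, index):
        return self._graphs[index]


class LazyCache(Sequence):
    """Stores the raw inputs and featurizes again on every access."""

    def __init__(self, inputs, V_fs, E_fs, featurizer):
        self._inputs = list(inputs)
        self._V_fs = list(V_fs)
        self._E_fs = list(E_fs)
        self._featurizer = featurizer

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, index):
        return self._featurizer(self._inputs[index], self._V_fs[index], self._E_fs[index])


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
no_extras = [None] * len(smiles)

eager_featurizer = CountingFeaturizer()
eager = EagerCache(smiles, no_extras, no_extras, eager_featurizer)
calls_after_eager_init = eager_featurizer.calls  # every input featurized already
eager[0], eager[0]
calls_after_eager_reads = eager_featurizer.calls  # reads do not call the featurizer

lazy_featurizer = CountingFeaturizer()
lazy = LazyCache(smiles, no_extras, no_extras, lazy_featurizer)
calls_after_lazy_init = lazy_featurizer.calls  # nothing featurized yet
lazy[0], lazy[0]
calls_after_lazy_reads = lazy_featurizer.calls  # one featurizer call per access
```

The eager variant trades memory and start-up time for cheap repeated access, while the lazy variant keeps memory low at the cost of recomputing on each lookup.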

class chemprop.featurizers.molgraph.CuikmolmakerMolGraphFeaturizer[source]#

Bases: chemprop.featurizers.base.Featurizer[list[str], BatchCuikMolGraph]

A CuikmolmakerMolGraphFeaturizer featurizes a list of molecules at once instead of one molecule at a time for efficiency.

Parameters:
  • atom_featurizer_mode (str, default="V2") – The mode of the atom featurizer (V1, V2, ORGANIC, RIGR) to use.

  • extra_atom_fdim (int, default=0) – the dimension of the additional features that will be concatenated onto the calculated features of each atom

  • extra_bond_fdim (int, default=0) – the dimension of the additional features that will be concatenated onto the calculated features of each bond

  • add_h (bool, default=False) – whether to add hydrogens to the Chem.Mol objects created from the input SMILES strings

atom_featurizer_mode: Literal['V1', 'V2', 'ORGANIC', 'RIGR'] = 'V2'#
extra_atom_fdim: int = 0#
extra_bond_fdim: int = 0#
add_h: bool = False#
atom_fdim: int#
bond_fdim: int#
__post_init__()[source]#
__call__(smiles_list, atom_features_extra=None, bond_features_extra=None)[source]#

Featurize a list of SMILES strings in a single batch.

Parameters:
  • smiles_list (list[str])

  • atom_features_extra (numpy.ndarray | None)

  • bond_features_extra (numpy.ndarray | None)

Return type:

BatchCuikMolGraph

class chemprop.featurizers.molgraph.SimpleMoleculeMolGraphFeaturizer[source]#

Bases: chemprop.featurizers.molgraph.mixins._MolGraphFeaturizerMixin, chemprop.featurizers.base.GraphFeaturizer[rdkit.Chem.Mol]

A SimpleMoleculeMolGraphFeaturizer is the default implementation of a MoleculeMolGraphFeaturizer.

Parameters:
  • atom_featurizer (AtomFeaturizer, default=MultiHotAtomFeaturizer()) – the featurizer with which to calculate feature representations of the atoms in a given molecule

  • bond_featurizer (BondFeaturizer, default=MultiHotBondFeaturizer()) – the featurizer with which to calculate feature representations of the bonds in a given molecule

  • extra_atom_fdim (int, default=0) – the dimension of the additional features that will be concatenated onto the calculated features of each atom

  • extra_bond_fdim (int, default=0) – the dimension of the additional features that will be concatenated onto the calculated features of each bond

extra_atom_fdim: int = 0#
extra_bond_fdim: int = 0#
__post_init__()[source]#
__call__(mol, atom_features_extra=None, bond_features_extra=None)[source]#
Parameters:
  • mol (rdkit.Chem.Mol)

  • atom_features_extra (numpy.ndarray | None)

  • bond_features_extra (numpy.ndarray | None)

Return type:

chemprop.data.molgraph.MolGraph
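
The role of extra_atom_fdim and atom_features_extra can be sketched without the library: the extra features are concatenated row by row onto the calculated atom features, so the final per-atom dimension is the base dimension plus extra_atom_fdim. The helper append_extra_features below is hypothetical and uses plain lists; the length check is an assumption about reasonable behavior, not a statement about what chemprop does internally.

```python
def append_extra_features(base_features, extra_features=None):
    """Concatenate per-atom extra features onto per-atom base features.

    Both arguments are lists with one row (a list of floats) per atom.
    """
    if extra_features is None:
        return [list(row) for row in base_features]
    if len(extra_features) != len(base_features):
        # ASSUMPTION: a mismatch in the number of atoms is treated as an error.
        raise ValueError(
            f"expected {len(base_features)} rows of extra features, got {len(extra_features)}"
        )
    return [list(b) + list(x) for b, x in zip(base_features, extra_features)]


# Toy molecule with 3 atoms and a base atom feature dimension of 2.
base = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
# One extra feature per atom, which corresponds to extra_atom_fdim = 1.
extra = [[0.12], [-0.40], [0.28]]

V = append_extra_features(base, extra)
atom_fdim = len(V[0])  # base dimension (2) + extra dimension (1)
```

The same reasoning applies to bond_features_extra and extra_bond_fdim, with one row per bond instead of one row per atom.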

type chemprop.featurizers.molgraph.CGRFeaturizer = CondensedGraphOfReactionFeaturizer#
class chemprop.featurizers.molgraph.CondensedGraphOfReactionFeaturizer[source]#

Bases: chemprop.featurizers.molgraph.mixins._MolGraphFeaturizerMixin, chemprop.featurizers.base.GraphFeaturizer[chemprop.types.Rxn]

A CondensedGraphOfReactionFeaturizer featurizes reactions using the condensed reaction graph method utilized in [1]

NOTE: This class does not accept an AtomFeaturizer instance. This is because it requires the num_only() method, which is only implemented in the concrete AtomFeaturizer class.

Parameters:
  • atom_featurizer (AtomFeaturizer, default=AtomFeaturizer()) – the featurizer with which to calculate feature representations of the atoms in a given molecule

  • bond_featurizer (BondFeaturizerBase, default=BondFeaturizer()) – the featurizer with which to calculate feature representations of the bonds in a given molecule

  • mode_ (str | RxnMode, default=RxnMode.REAC_DIFF) – the mode by which to featurize the reaction, given as either the string code or the enum value

References

mode_: dataclasses.InitVar[str | RxnMode]#
__post_init__(mode_)[source]#
Parameters:

mode_ (str | RxnMode)

property mode: RxnMode#
Return type:

RxnMode
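
The combination of an init-only mode_ argument and a read-only mode property can be sketched with the standard library alone. ToyRxnMode and ToyReactionFeaturizer below are hypothetical stand-ins, not the chemprop classes, and the case-insensitive name lookup is an assumption about how a string code might be resolved to an enum member.

```python
from dataclasses import InitVar, dataclass
from enum import Enum, auto
from typing import Union


class ToyRxnMode(Enum):
    """Hypothetical stand-in for RxnMode with three of its members."""

    REAC_PROD = auto()
    REAC_DIFF = auto()
    PROD_DIFF = auto()


@dataclass
class ToyReactionFeaturizer:
    # Init-only argument: accepted by __init__ but not stored as a field.
    mode_: InitVar[Union[str, ToyRxnMode]] = ToyRxnMode.REAC_DIFF

    def __post_init__(self, mode_):
        if isinstance(mode_, ToyRxnMode):
            self._mode = mode_
        else:
            # ASSUMPTION: string codes are matched to member names, ignoring case.
            self._mode = ToyRxnMode[mode_.upper()]

    @property
    def mode(self) -> ToyRxnMode:
        """The resolved mode, always an enum member."""
        return self._mode


from_string = ToyReactionFeaturizer("prod_diff")
from_enum = ToyReactionFeaturizer(ToyRxnMode.REAC_PROD)
default = ToyReactionFeaturizer()
```

Whatever form the caller passes in, downstream code can rely on mode being an enum member rather than a raw string.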

__call__(rxn, atom_features_extra=None, bond_features_extra=None)[source]#

Featurize the input reaction into a molecular graph

Parameters:
  • rxn (Rxn) – a 2-tuple of atom-mapped rdkit molecules, where the 0th element is the reactant and the 1st element is the product

  • atom_features_extra (np.ndarray | None, default=None) – UNSUPPORTED; retained only for parity with the method signature of the MoleculeFeaturizer

  • bond_features_extra (np.ndarray | None, default=None) – UNSUPPORTED; retained only for parity with the method signature of the MoleculeFeaturizer

Returns:

the molecular graph of the reaction

Return type:

MolGraph

classmethod map_reac_to_prod(reacs, pdts)[source]#

Map atom indices between corresponding atoms in the reactant and product molecules

Parameters:
  • reacs (Chem.Mol) – An RDKit molecule of the reactants

  • pdts (Chem.Mol) – An RDKit molecule of the products

Returns:

  • ri2pi (dict[int, int]) – A dictionary of corresponding atom indices from reactant atoms to product atoms

  • pdt_idxs (list[int]) – atom indices of product atoms

  • rct_idxs (list[int]) – atom indices of reactant atoms

Return type:

tuple[dict[int, int], list[int], list[int]]
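
The idea of matching reactant and product atoms through their atom-map numbers can be sketched on plain lists instead of RDKit molecules. The function map_reactant_to_product below is a hypothetical illustration, not the library method. It assumes that a map number of 0 means "unmapped" and, as a further assumption not stated in the text above, that pdt_idxs and rct_idxs collect the atoms that have no counterpart on the other side of the reaction.

```python
def map_reactant_to_product(reac_map_nums, prod_map_nums):
    """Toy version operating on atom-map numbers instead of RDKit molecules.

    Each argument lists the atom-map number of every atom, indexed by atom
    index; 0 means the atom is unmapped.
    """
    # Map number -> product atom index, for mapped product atoms only.
    mapno_to_prod_idx = {m: j for j, m in enumerate(prod_map_nums) if m > 0}
    reac_mapped = {m for m in reac_map_nums if m > 0}

    # ASSUMPTION: product atoms with no counterpart in the reactants.
    pdt_idxs = [j for j, m in enumerate(prod_map_nums) if m == 0 or m not in reac_mapped]

    ri2pi = {}
    rct_idxs = []  # ASSUMPTION: reactant atoms with no counterpart in the products
    for i, m in enumerate(reac_map_nums):
        if m > 0 and m in mapno_to_prod_idx:
            ri2pi[i] = mapno_to_prod_idx[m]
        else:
            rct_idxs.append(i)
    return ri2pi, pdt_idxs, rct_idxs


# Reactant atoms 0..3 carry map numbers 1, 2, 3 and one unmapped atom.
# Product atoms 0..2 carry map numbers 2, 1 and 4 (4 is absent from the reactants).
ri2pi, pdt_idxs, rct_idxs = map_reactant_to_product([1, 2, 3, 0], [2, 1, 4])
```

In this toy reaction, reactant atoms 0 and 1 map to product atoms 1 and 0, while the remaining atoms on each side are reported separately.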

class chemprop.featurizers.molgraph.RxnMode[source]#

Bases: chemprop.utils.utils.EnumMapping

The mode by which a reaction should be featurized into a MolGraph

REAC_PROD#

concatenates the reactant features with the product features

REAC_PROD_BALANCE#

concatenates the reactant features with the product features and balances imbalanced reactions

REAC_DIFF#

concatenates the reactant features with the difference in features between reactants and products

REAC_DIFF_BALANCE#

concatenates the reactant features with the difference in features between reactants and products and balances imbalanced reactions

PROD_DIFF#

concatenates the product features with the difference in features between reactants and products

PROD_DIFF_BALANCE#

concatenates the product features with the difference in features between reactants and products and balances imbalanced reactions
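
The three unbalanced modes can be illustrated for a single atom with plain Python lists. The function combine below is a hypothetical sketch, not the library's code. It assumes the difference is taken as product minus reactant, and it leaves out the _BALANCE variants, which additionally handle imbalanced reactions.

```python
def combine(reac, prod, mode):
    """Combine one atom's reactant and product feature vectors.

    ASSUMPTION: the difference is taken as product minus reactant.
    """
    diff = [p - r for r, p in zip(reac, prod)]
    if mode == "REAC_PROD":
        return reac + prod
    if mode == "REAC_DIFF":
        return reac + diff
    if mode == "PROD_DIFF":
        return prod + diff
    raise ValueError(f"unsupported mode: {mode}")


reac_features = [1, 0, 2]
prod_features = [1, 1, 0]

reac_prod = combine(reac_features, prod_features, "REAC_PROD")
reac_diff = combine(reac_features, prod_features, "REAC_DIFF")
prod_diff = combine(reac_features, prod_features, "PROD_DIFF")
```

Each mode doubles the feature length in this sketch; the modes differ only in which pair of vectors is placed side by side.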