chemprop.featurizers
====================

.. py:module:: chemprop.featurizers


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/chemprop/featurizers/atom/index
   /autoapi/chemprop/featurizers/base/index
   /autoapi/chemprop/featurizers/bond/index
   /autoapi/chemprop/featurizers/molecule/index
   /autoapi/chemprop/featurizers/molgraph/index


Attributes
----------

.. autoapisummary::

   chemprop.featurizers.S
   chemprop.featurizers.T
   chemprop.featurizers.MoleculeFeaturizerRegistry
   chemprop.featurizers.CGRFeaturizer


Classes
-------

.. autoapisummary::

   chemprop.featurizers.AtomFeatureMode
   chemprop.featurizers.MultiHotAtomFeaturizer
   chemprop.featurizers.Featurizer
   chemprop.featurizers.GraphFeaturizer
   chemprop.featurizers.VectorFeaturizer
   chemprop.featurizers.MultiHotBondFeaturizer
   chemprop.featurizers.BinaryFeaturizerMixin
   chemprop.featurizers.CountFeaturizerMixin
   chemprop.featurizers.MorganBinaryFeaturizer
   chemprop.featurizers.MorganCountFeaturizer
   chemprop.featurizers.MorganFeaturizerMixin
   chemprop.featurizers.RDKit2DFeaturizer
   chemprop.featurizers.V1RDKit2DFeaturizer
   chemprop.featurizers.V1RDKit2DNormalizedFeaturizer
   chemprop.featurizers.BatchCuikMolGraph
   chemprop.featurizers.CondensedGraphOfReactionFeaturizer
   chemprop.featurizers.CuikmolmakerMolGraphFeaturizer
   chemprop.featurizers.MolGraphCache
   chemprop.featurizers.MolGraphCacheFacade
   chemprop.featurizers.MolGraphCacheOnTheFly
   chemprop.featurizers.RxnMode
   chemprop.featurizers.SimpleMoleculeMolGraphFeaturizer


Functions
---------

.. autoapisummary::

   chemprop.featurizers.get_multi_hot_atom_featurizer


Package Contents
----------------

.. py:class:: AtomFeatureMode

   Bases: :py:obj:`chemprop.utils.utils.EnumMapping`


   The mode by which an atom should be featurized into a `MolGraph`


   .. py:attribute:: V1


   .. py:attribute:: V2


   .. py:attribute:: ORGANIC


   .. py:attribute:: RIGR


.. py:class:: MultiHotAtomFeaturizer(atomic_nums, degrees, formal_charges, chiral_tags, num_Hs, hybridizations)

   Bases: :py:obj:`chemprop.featurizers.base.VectorFeaturizer`\ [\ :py:obj:`rdkit.Chem.rdchem.Atom`\ ]


   A :class:`MultiHotAtomFeaturizer` uses a multi-hot encoding to featurize atoms.

   .. seealso::
       The class provides three default parameterization schemes:

       * :meth:`MultiHotAtomFeaturizer.v1`
       * :meth:`MultiHotAtomFeaturizer.v2`
       * :meth:`MultiHotAtomFeaturizer.organic`

   The generated atom features are ordered as follows:

   * atomic number
   * degree
   * formal charge
   * chiral tag
   * number of hydrogens
   * hybridization
   * aromaticity
   * mass

   .. important::
       Each feature, except for aromaticity and mass, includes a pad for unknown values.
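
   As a rough illustration of the layout described above, the following pure-Python sketch
   builds such a vector from plain values instead of an RDKit atom. The names
   ``one_hot_with_pad`` and ``featurize_atom``, the dictionary keys, the string hybridization
   labels, and the unscaled mass entry are assumptions made for this example; they are not
   part of the Chemprop API.

   .. code-block:: python

      # Hypothetical sketch, not Chemprop's implementation.
      FIELDS = ["atomic_num", "degree", "formal_charge", "chiral_tag", "num_Hs", "hybridization"]


      def one_hot_with_pad(value, choices):
          """One-hot encode ``value`` over ``choices``; the last bit marks unknown values."""
          bits = [0.0] * (len(choices) + 1)
          index = choices.index(value) if value in choices else len(choices)
          bits[index] = 1.0
          return bits


      def featurize_atom(atom, choices):
          """Concatenate the padded one-hot blocks, then aromaticity and mass (no pad)."""
          features = []
          for field in FIELDS:
              features += one_hot_with_pad(atom[field], choices[field])
          features.append(1.0 if atom["is_aromatic"] else 0.0)
          features.append(float(atom["mass"]))  # assumption: mass is left unscaled here
          return features


      choices = {
          "atomic_num": [6, 7, 8],  # C, N and O
          "degree": [0, 1, 2, 3, 4],
          "formal_charge": [-1, 0, 1],
          "chiral_tag": [0, 1, 2, 3],
          "num_Hs": [0, 1, 2, 3, 4],
          "hybridization": ["SP", "SP2", "SP3"],  # placeholder labels for this sketch
      }
      sulfur = {
          "atomic_num": 16, "degree": 2, "formal_charge": 0, "chiral_tag": 0,
          "num_Hs": 0, "hybridization": "SP3", "is_aromatic": False, "mass": 32.06,
      }
      vector = featurize_atom(sulfur, choices)

   Because sulfur is not among the atomic number choices, only the pad bit of the first block
   is set. The total length is the sum of ``len(choices) + 1`` over the six padded blocks,
   plus two entries for aromaticity and mass.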

   :param atomic_nums: the choices for atom type denoted by atomic number. Ex: ``[6, 7, 8]`` for C, N and O.
   :type atomic_nums: Sequence[int]
   :param degrees: the choices for number of bonds an atom is engaged in.
   :type degrees: Sequence[int]
   :param formal_charges: the choices for integer electronic charge assigned to an atom.
   :type formal_charges: Sequence[int]
   :param chiral_tags: the choices for an atom's chiral tag. See :class:`rdkit.Chem.rdchem.ChiralType` for possible integer values.
   :type chiral_tags: Sequence[int]
   :param num_Hs: the choices for number of bonded hydrogen atoms.
   :type num_Hs: Sequence[int]
   :param hybridizations: the choices for an atom’s hybridization type. See :class:`rdkit.Chem.rdchem.HybridizationType` for possible integer values.
   :type hybridizations: Sequence[int]


   .. py:attribute:: atomic_nums


   .. py:attribute:: degrees


   .. py:attribute:: formal_charges


   .. py:attribute:: chiral_tags


   .. py:attribute:: num_Hs


   .. py:attribute:: hybridizations


   .. py:method:: __len__()


   .. py:method:: __call__(a)


   .. py:method:: num_only(a)

      featurize the atom by setting only the atomic number bit



   .. py:method:: v1(max_atomic_num = 100)
      :classmethod:


      The original implementation used in Chemprop V1 [1]_, [2]_.

      :param max_atomic_num: Include a bit for all atomic numbers in the interval :math:`[1, \mathtt{max\_atomic\_num}]`
      :type max_atomic_num: int, default=100

      .. rubric:: References

      .. [1] Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.;
          Kelley, B.; Mathea, M.; Palmer, A. "Analyzing Learned Molecular Representations for Property Prediction."
          J. Chem. Inf. Model. 2019, 59 (8), 3370–3388. https://doi.org/10.1021/acs.jcim.9b00237
      .. [2] Heid, E.; Greenman, K.P.; Chung, Y.; Li, S.C.; Graff, D.E.; Vermeire, F.H.; Wu, H.; Green, W.H.; McGill,
          C.J. "Chemprop: A machine learning package for chemical property prediction." J. Chem. Inf. Model. 2024,
          64 (1), 9–17. https://doi.org/10.1021/acs.jcim.3c01250



   .. py:method:: v2()
      :classmethod:


      An implementation that includes an atom type bit for all elements in the first four rows of the periodic table plus iodine.



   .. py:method:: organic()
      :classmethod:


      A specific parameterization intended for use with organic or drug-like molecules.

      This parameterization features:
          1. an atomic number bit only for H, B, C, N, O, F, Si, P, S, Cl, Br, and I atoms
          2. a hybridization bit for :math:`s, sp, sp^2` and :math:`sp^3` hybridizations.



.. py:function:: get_multi_hot_atom_featurizer(mode)

   Build the corresponding multi-hot atom featurizer.


.. py:class:: Featurizer

   Bases: :py:obj:`Generic`\ [\ :py:obj:`S`\ , :py:obj:`T`\ ]


   A :class:`Featurizer` featurizes inputs of type ``S`` into outputs of
   type ``T``.
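
   The generic contract can be sketched with the standard library alone. The ``Featurizer``
   class below is a stand-in written for this example (it assumes an ``ABC`` base, which the
   listing above does not confirm), and ``LengthFeaturizer`` is a hypothetical subclass that
   exists only to show how ``S`` and ``T`` are bound.

   .. code-block:: python

      from abc import ABC, abstractmethod
      from typing import Generic, TypeVar

      S = TypeVar("S")
      T = TypeVar("T")


      class Featurizer(ABC, Generic[S, T]):
          """Stand-in for the abstract base: maps an input of type S to an output of type T."""

          @abstractmethod
          def __call__(self, input: S, *args, **kwargs) -> T:
              """Featurize an input."""


      class LengthFeaturizer(Featurizer[str, int]):
          """Hypothetical concrete featurizer: a SMILES string becomes its character count."""

          def __call__(self, input: str, *args, **kwargs) -> int:
              return len(input)


      n_chars = LengthFeaturizer()("CCO")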


   .. py:method:: __call__(input, *args, **kwargs)
      :abstractmethod:


      featurize an input



.. py:class:: GraphFeaturizer

   Bases: :py:obj:`Featurizer`\ [\ :py:obj:`S`\ , :py:obj:`chemprop.data.molgraph.MolGraph`\ ]


   A :class:`GraphFeaturizer` is a :class:`Featurizer` that featurizes inputs of type ``S``
   into :class:`~chemprop.data.molgraph.MolGraph` outputs.


   .. py:property:: shape
      :type: tuple[int, int]

      :abstractmethod:



.. py:data:: S

.. py:data:: T

.. py:class:: VectorFeaturizer

   Bases: :py:obj:`Featurizer`\ [\ :py:obj:`S`\ , :py:obj:`numpy.ndarray`\ ], :py:obj:`collections.abc.Sized`


   A :class:`VectorFeaturizer` is a sized :class:`Featurizer` that featurizes inputs of type
   ``S`` into :class:`numpy.ndarray` outputs.


.. py:class:: MultiHotBondFeaturizer(bond_types = None, stereos = None)

   Bases: :py:obj:`chemprop.featurizers.base.VectorFeaturizer`\ [\ :py:obj:`rdkit.Chem.rdchem.Bond`\ ]


   A :class:`MultiHotBondFeaturizer` featurizes bonds based on the following attributes:

   * ``null``-ity (i.e., is the bond ``None``?)
   * bond type
   * conjugated?
   * in ring?
   * stereochemistry

   The feature vectors produced by this featurizer have the following (general) signature:

   +---------------------+-----------------+--------------+
   | slice [start, stop) | subfeature      | unknown pad? |
   +=====================+=================+==============+
   | 0-1                 | null?           | N            |
   +---------------------+-----------------+--------------+
   | 1-5                 | bond type       | N            |
   +---------------------+-----------------+--------------+
   | 5-6                 | conjugated?     | N            |
   +---------------------+-----------------+--------------+
   | 6-7                 | in ring?        | N            |
   +---------------------+-----------------+--------------+
   | 7-14                | stereochemistry | Y            |
   +---------------------+-----------------+--------------+

   **NOTE**: the above signature only applies for the default arguments, as the bond type and
   stereochemistry slices can increase in size depending on the input arguments.
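
   To see how the slices shift with non-default arguments, the layout can be computed from
   the number of known bond types and stereochemistries. This is a minimal sketch that
   assumes each sub-feature occupies one contiguous half-open slice and that only
   stereochemistry carries an unknown pad; ``bond_feature_slices`` is a hypothetical helper,
   not a Chemprop function.

   .. code-block:: python

      def bond_feature_slices(n_bond_types=4, n_stereos=6):
          """Return ``{subfeature: (start, stop)}`` with half-open ``[start, stop)`` slices."""
          sizes = [
              ("null?", 1),
              ("bond type", n_bond_types),  # no unknown pad
              ("conjugated?", 1),
              ("in ring?", 1),
              ("stereochemistry", n_stereos + 1),  # +1 for the unknown pad
          ]
          slices, start = {}, 0
          for name, size in sizes:
              slices[name] = (start, start + size)
              start += size
          return slices


      default_layout = bond_feature_slices()
      wider_layout = bond_feature_slices(n_bond_types=5)

   With the defaults the vector has 14 entries. Adding a fifth bond type moves every later
   slice one position to the right.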

   :param bond_types: the known bond types
   :type bond_types: Sequence[BondType] | None, default=[SINGLE, DOUBLE, TRIPLE, AROMATIC]
   :param stereos: the known bond stereochemistries. See [1]_ for more details
   :type stereos: Sequence[int] | None, default=[0, 1, 2, 3, 4, 5]

   .. rubric:: References

   .. [1] https://www.rdkit.org/docs/source/rdkit.Chem.rdchem.html#rdkit.Chem.rdchem.BondStereo.values


   .. py:attribute:: bond_types


   .. py:attribute:: stereo


   .. py:method:: __len__()


   .. py:method:: __call__(b)


   .. py:method:: one_hot_index(x, xs)
      :classmethod:


      Returns a tuple of the index of ``x`` in ``xs`` and ``len(xs) + 1`` if ``x`` is in ``xs``.
      Otherwise, returns a tuple with ``len(xs)`` and ``len(xs) + 1``.
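
      The behaviour described above can be restated as a standalone function. This is a
      sketch that follows the description, not a copy of the Chemprop source, and the string
      bond labels are placeholders for this example.

      .. code-block:: python

         def one_hot_index(x, xs):
             """Return ``(index, size)``; ``size`` reserves one extra slot for unknowns."""
             n = len(xs)
             return (xs.index(x), n + 1) if x in xs else (n, n + 1)


         known = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]
         hit = one_hot_index("DOUBLE", known)  # a known value keeps its own position
         miss = one_hot_index("QUADRUPLE", known)  # an unknown value falls into the last slot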



.. py:class:: BinaryFeaturizerMixin

   .. py:method:: __call__(mol)


.. py:class:: CountFeaturizerMixin

   .. py:method:: __call__(mol)


.. py:data:: MoleculeFeaturizerRegistry

.. py:class:: MorganBinaryFeaturizer(radius = 2, length = 2048, include_chirality = True)

   Bases: :py:obj:`MorganFeaturizerMixin`, :py:obj:`BinaryFeaturizerMixin`, :py:obj:`chemprop.featurizers.base.VectorFeaturizer`\ [\ :py:obj:`rdkit.Chem.Mol`\ ]


.. py:class:: MorganCountFeaturizer(radius = 2, length = 2048, include_chirality = True)

   Bases: :py:obj:`MorganFeaturizerMixin`, :py:obj:`CountFeaturizerMixin`, :py:obj:`chemprop.featurizers.base.VectorFeaturizer`\ [\ :py:obj:`rdkit.Chem.Mol`\ ]


.. py:class:: MorganFeaturizerMixin(radius = 2, length = 2048, include_chirality = True)

   .. py:attribute:: length
      :value: 2048



   .. py:attribute:: F


   .. py:method:: __len__()


.. py:class:: RDKit2DFeaturizer

   Bases: :py:obj:`chemprop.featurizers.base.VectorFeaturizer`\ [\ :py:obj:`rdkit.Chem.Mol`\ ]


   .. py:method:: __len__()


   .. py:method:: __call__(mol)


.. py:class:: V1RDKit2DFeaturizer

   Bases: :py:obj:`V1RDKit2DFeaturizerMixin`


   .. py:attribute:: generator


.. py:class:: V1RDKit2DNormalizedFeaturizer

   Bases: :py:obj:`V1RDKit2DFeaturizerMixin`


   .. py:attribute:: generator


.. py:class:: BatchCuikMolGraph

   .. py:attribute:: V
      :type:  torch.Tensor

      the atom feature matrix


   .. py:attribute:: E
      :type:  torch.Tensor

      the bond feature matrix


   .. py:attribute:: edge_index
      :type:  torch.Tensor

      a tensor of shape ``2 x E`` containing the edges of the graph in COO format


   .. py:attribute:: rev_edge_index
      :type:  torch.Tensor

      A tensor of shape ``E`` that maps from an edge index to the index of the source of the
      reverse edge in the ``edge_index`` attribute.
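
      One way to picture this mapping is to look up, for every directed edge ``(u, v)``, the
      position of ``(v, u)`` in the edge list. The sketch below uses plain lists instead of
      tensors, and ``reverse_edge_index`` is a hypothetical helper introduced only for this
      example.

      .. code-block:: python

         def reverse_edge_index(edge_index):
             """For each directed edge, return the position of its reverse edge."""
             src, dst = edge_index
             position = {(u, v): i for i, (u, v) in enumerate(zip(src, dst))}
             return [position[(v, u)] for u, v in zip(src, dst)]


         # Two bonds, 0-1 and 1-2, each stored as a pair of directed edges in COO format.
         edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
         rev_edge_index = reverse_edge_index(edge_index)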


   .. py:attribute:: batch
      :type:  torch.Tensor

      the index of the parent :class:`MolGraph` in the batched graph
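
      A small sketch of how such an index can arise when graphs are concatenated, assuming
      one entry per atom and atom indices shifted by a running offset. ``batch_graphs`` and
      the ``(n_atoms, edges)`` input format are hypothetical and use plain lists rather than
      tensors.

      .. code-block:: python

         def batch_graphs(graphs):
             """Merge ``(n_atoms, edges)`` pairs into one graph with a per-atom batch index."""
             batch, src, dst, offset = [], [], [], 0
             for graph_id, (n_atoms, edges) in enumerate(graphs):
                 batch += [graph_id] * n_atoms
                 for u, v in edges:
                     src.append(u + offset)  # shift atom indices into the batched numbering
                     dst.append(v + offset)
                 offset += n_atoms
             return batch, [src, dst]


         graphs = [(2, [(0, 1), (1, 0)]), (3, [(0, 1), (1, 0), (1, 2), (2, 1)])]
         batch, edge_index = batch_graphs(graphs)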


   .. py:method:: __post_init__()


   .. py:method:: __len__()

      the number of individual :class:`MolGraph`\s in this batch



   .. py:method:: to(device)


.. py:type:: CGRFeaturizer
   :canonical: CondensedGraphOfReactionFeaturizer


.. py:class:: CondensedGraphOfReactionFeaturizer

   Bases: :py:obj:`chemprop.featurizers.molgraph.mixins._MolGraphFeaturizerMixin`, :py:obj:`chemprop.featurizers.base.GraphFeaturizer`\ [\ :py:obj:`chemprop.types.Rxn`\ ]


   A :class:`CondensedGraphOfReactionFeaturizer` featurizes reactions using the condensed
   reaction graph method utilized in [1]_.

   **NOTE**: This class *does not* accept a :class:`AtomFeaturizer` instance. This is because
   it requires the :meth:`num_only()` method, which is only implemented in the concrete
   :class:`AtomFeaturizer` class.

   :param atom_featurizer: the featurizer with which to calculate feature representations of the atoms in a given
                           molecule
   :type atom_featurizer: AtomFeaturizer, default=AtomFeaturizer()
   :param bond_featurizer: the featurizer with which to calculate feature representations of the bonds in a given
                           molecule
   :type bond_featurizer: BondFeaturizerBase, default=BondFeaturizer()
   :param mode_: the mode by which to featurize the reaction as either the string code or enum value
   :type mode_: Union[str, ReactionMode], default=ReactionMode.REAC_DIFF

   .. rubric:: References

   .. [1] Heid, E.; Green, W.H. "Machine Learning of Reaction Properties via Learned
       Representations of the Condensed Graph of Reaction." J. Chem. Inf. Model. 2022, 62,
       2101-2110. https://doi.org/10.1021/acs.jcim.1c00975


   .. py:attribute:: mode_
      :type:  dataclasses.InitVar[str | RxnMode]


   .. py:method:: __post_init__(mode_)


   .. py:property:: mode
      :type: RxnMode



   .. py:method:: __call__(rxn, atom_features_extra = None, bond_features_extra = None)

      Featurize the input reaction into a molecular graph

      :param rxn: a 2-tuple of atom-mapped rdkit molecules, where the 0th element is the reactant and the
                  1st element is the product
      :type rxn: Rxn
      :param atom_features_extra: *UNSUPPORTED*; kept only to maintain parity with the method signature of the
                                  `MoleculeFeaturizer`
      :type atom_features_extra: np.ndarray | None, default=None
      :param bond_features_extra: *UNSUPPORTED*; kept only to maintain parity with the method signature of the
                                  `MoleculeFeaturizer`
      :type bond_features_extra: np.ndarray | None, default=None

      :returns: the molecular graph of the reaction
      :rtype: MolGraph



   .. py:method:: map_reac_to_prod(reacs, pdts)
      :classmethod:


      Map atom indices between corresponding atoms in the reactant and product molecules

      :param reacs: An RDKit molecule of the reactants
      :type reacs: Chem.Mol
      :param pdts: An RDKit molecule of the products
      :type pdts: Chem.Mol

      :returns: * **ri2pi** (*dict[int, int]*) -- A dictionary of corresponding atom indices from reactant atoms to product atoms
                * **pdt_idxs** (*list[int]*) -- atom indices of product atoms
                * **rct_idxs** (*list[int]*) -- atom indices of reactant atoms
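
      The idea can be sketched without RDKit by pairing atoms on their atom-map numbers. The
      function below takes lists of map numbers instead of ``Chem.Mol`` objects. It also
      assumes that ``0`` marks an unmapped atom and that ``pdt_idxs`` and ``rct_idxs``
      collect only atoms without a partner on the other side. Both assumptions belong to
      this example and are not confirmed by the description above.

      .. code-block:: python

         def map_reac_to_prod(reac_map_nums, pdt_map_nums):
             """Pair atoms by atom-map number; ``0`` means the atom is unmapped."""
             reac_nums = {n for n in reac_map_nums if n > 0}
             mapno2pj, pdt_idxs = {}, []
             for j, n in enumerate(pdt_map_nums):
                 if n > 0:
                     mapno2pj[n] = j
                 if n not in reac_nums:
                     pdt_idxs.append(j)  # product atom with no reactant partner
             ri2pi, rct_idxs = {}, []
             for i, n in enumerate(reac_map_nums):
                 if n > 0 and n in mapno2pj:
                     ri2pi[i] = mapno2pj[n]
                 else:
                     rct_idxs.append(i)  # reactant atom with no product partner
             return ri2pi, pdt_idxs, rct_idxs


         # Reactant atoms carry map numbers 1, 2, 3; the product reorders 2 and 1 and adds
         # one unmapped atom.
         ri2pi, pdt_idxs, rct_idxs = map_reac_to_prod([1, 2, 3], [2, 1, 0])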



.. py:class:: CuikmolmakerMolGraphFeaturizer

   Bases: :py:obj:`chemprop.featurizers.base.Featurizer`\ [\ :py:obj:`list`\ [\ :py:obj:`str`\ ]\ , :py:obj:`BatchCuikMolGraph`\ ]


   A :class:`CuikmolmakerMolGraphFeaturizer` featurizes a list of molecules at once instead of
   one molecule at a time for efficiency.

   :param atom_featurizer_mode: The mode of the atom featurizer (V1, V2, ORGANIC, RIGR) to use.
   :type atom_featurizer_mode: str, default="V2"
   :param extra_atom_fdim: the dimension of the additional features that will be concatenated onto the calculated
                           features of each atom
   :type extra_atom_fdim: int, default=0
   :param extra_bond_fdim: the dimension of the additional features that will be concatenated onto the calculated
                           features of each bond
   :type extra_bond_fdim: int, default=0
   :param add_h: whether to add hydrogens to the `Chem.Mol` objects created from the input SMILES strings
   :type add_h: bool, default=False


   .. py:attribute:: atom_featurizer_mode
      :type:  Literal['V1', 'V2', 'ORGANIC', 'RIGR']
      :value: 'V2'



   .. py:attribute:: extra_atom_fdim
      :type:  int
      :value: 0



   .. py:attribute:: extra_bond_fdim
      :type:  int
      :value: 0



   .. py:attribute:: add_h
      :type:  bool
      :value: False



   .. py:attribute:: atom_fdim
      :type:  int


   .. py:attribute:: bond_fdim
      :type:  int


   .. py:method:: __post_init__()


   .. py:method:: __call__(smiles_list, atom_features_extra = None, bond_features_extra = None)

      featurize an input



.. py:class:: MolGraphCache(inputs, V_fs, E_fs, featurizer, n_workers = 0)

   Bases: :py:obj:`MolGraphCacheFacade`


   A :class:`MolGraphCache` precomputes the corresponding
   :class:`~chemprop.data.molgraph.MolGraph`\s and caches them in memory.


   .. py:method:: __len__()


   .. py:method:: __getitem__(index)


.. py:class:: MolGraphCacheFacade(inputs, V_fs, E_fs, featurizer)

   Bases: :py:obj:`collections.abc.Sequence`\ [\ :py:obj:`chemprop.data.molgraph.MolGraph`\ ], :py:obj:`Generic`\ [\ :py:obj:`chemprop.featurizers.base.S`\ ]


   A :class:`MolGraphCacheFacade` provides an interface for caching
   :class:`~chemprop.data.molgraph.MolGraph`\s.

   .. note::
       This class only provides a facade for a cached dataset, but it *does not guarantee*
       whether the underlying data is truly cached.


   :param inputs: The inputs to be featurized.
   :type inputs: Iterable[S]
   :param V_fs: The node features for each input.
   :type V_fs: Iterable[np.ndarray]
   :param E_fs: The edge features for each input.
   :type E_fs: Iterable[np.ndarray]
   :param featurizer: The featurizer with which to generate the
                      :class:`~chemprop.data.molgraph.MolGraph`\s.
   :type featurizer: Featurizer[S, MolGraph]


.. py:class:: MolGraphCacheOnTheFly(inputs, V_fs, E_fs, featurizer)

   Bases: :py:obj:`MolGraphCacheFacade`


   A :class:`MolGraphCacheOnTheFly` computes the corresponding
   :class:`~chemprop.data.molgraph.MolGraph`\s as they are requested.
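
   The difference between precomputing and computing on request can be sketched with two
   small sequence classes. ``EagerCache``, ``OnTheFlyCache`` and ``toy_featurizer`` are
   hypothetical stand-ins written for this example. They omit the ``V_fs`` and ``E_fs``
   arguments and return integers rather than real molecular graphs.

   .. code-block:: python

      from collections.abc import Sequence


      class EagerCache(Sequence):
          """Featurize every input once, up front, and keep the results in memory."""

          def __init__(self, inputs, featurizer):
              self._graphs = [featurizer(x) for x in inputs]

          def __len__(self):
              return len(self._graphs)

          def __getitem__(self, index):
              return self._graphs[index]


      class OnTheFlyCache(Sequence):
          """Keep only the inputs and featurize each item when it is requested."""

          def __init__(self, inputs, featurizer):
              self._inputs = list(inputs)
              self._featurizer = featurizer

          def __len__(self):
              return len(self._inputs)

          def __getitem__(self, index):
              return self._featurizer(self._inputs[index])


      calls = []


      def toy_featurizer(smiles):
          calls.append(smiles)  # record each call so the two strategies can be compared
          return len(smiles)


      eager = EagerCache(["C", "CC"], toy_featurizer)
      calls_after_eager = len(calls)
      lazy = OnTheFlyCache(["C", "CC"], toy_featurizer)
      calls_after_lazy_init = len(calls)
      value = lazy[1]

   The eager variant pays the featurization cost at construction time and uses more memory.
   The on-the-fly variant defers the cost to each lookup and repeats it on every access.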


   .. py:method:: __len__()


   .. py:method:: __getitem__(index)


.. py:class:: RxnMode

   Bases: :py:obj:`chemprop.utils.utils.EnumMapping`


   The mode by which a reaction should be featurized into a `MolGraph`


   .. py:attribute:: REAC_PROD

      concatenates the reactant features with the product features


   .. py:attribute:: REAC_PROD_BALANCE

      concatenates the reactant features with the product features and balances imbalanced
      reactions


   .. py:attribute:: REAC_DIFF

      concatenates the reactant features with the difference in features between reactants and
      products


   .. py:attribute:: REAC_DIFF_BALANCE

      concatenates the reactant features with the difference in features between reactants and
      products and balances imbalanced reactions


   .. py:attribute:: PROD_DIFF

      concatenates the product features with the difference in features between reactants and
      products


   .. py:attribute:: PROD_DIFF_BALANCE

      concatenates the product features with the difference in features between reactants and
      products and balances imbalanced reactions
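
   The three non-balancing modes can be illustrated on plain feature lists. This is a sketch
   only: ``combine`` is a hypothetical helper, the difference is assumed to be product minus
   reactant, and the ``*_BALANCE`` variants are left out because their balancing step is not
   described here.

   .. code-block:: python

      def combine(reac, prod, mode):
          """Combine reactant and product feature lists for the non-balancing modes."""
          diff = [p - r for r, p in zip(reac, prod)]  # assumption: product minus reactant
          if mode == "REAC_PROD":
              return reac + prod
          if mode == "REAC_DIFF":
              return reac + diff
          if mode == "PROD_DIFF":
              return prod + diff
          raise ValueError(f"unsupported mode in this sketch: {mode}")


      reac, prod = [1.0, 0.0], [0.0, 1.0]
      reac_prod = combine(reac, prod, "REAC_PROD")
      reac_diff = combine(reac, prod, "REAC_DIFF")
      prod_diff = combine(reac, prod, "PROD_DIFF")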


.. py:class:: SimpleMoleculeMolGraphFeaturizer

   Bases: :py:obj:`chemprop.featurizers.molgraph.mixins._MolGraphFeaturizerMixin`, :py:obj:`chemprop.featurizers.base.GraphFeaturizer`\ [\ :py:obj:`rdkit.Chem.Mol`\ ]


   A :class:`SimpleMoleculeMolGraphFeaturizer` is the default implementation of a
   :class:`MoleculeMolGraphFeaturizer`

   :param atom_featurizer: the featurizer with which to calculate feature representations of the atoms in a given
                           molecule
   :type atom_featurizer: AtomFeaturizer, default=MultiHotAtomFeaturizer()
   :param bond_featurizer: the featurizer with which to calculate feature representations of the bonds in a given
                           molecule
   :type bond_featurizer: BondFeaturizer, default=MultiHotBondFeaturizer()
   :param extra_atom_fdim: the dimension of the additional features that will be concatenated onto the calculated
                           features of each atom
   :type extra_atom_fdim: int, default=0
   :param extra_bond_fdim: the dimension of the additional features that will be concatenated onto the calculated
                           features of each bond
   :type extra_bond_fdim: int, default=0


   .. py:attribute:: extra_atom_fdim
      :type:  int
      :value: 0



   .. py:attribute:: extra_bond_fdim
      :type:  int
      :value: 0



   .. py:method:: __post_init__()


   .. py:method:: __call__(mol, atom_features_extra = None, bond_features_extra = None)


