chemprop.featurizers.atom#

Classes#

MultiHotAtomFeaturizer

A MultiHotAtomFeaturizer uses a multi-hot encoding to featurize atoms.

RIGRAtomFeaturizer

A RIGRAtomFeaturizer uses a multi-hot encoding to featurize atoms using resonance-invariant features.

AtomFeatureMode

The mode of an atom is used for featurization into a MolGraph.

Functions#

get_multi_hot_atom_featurizer(mode)

Build the corresponding multi-hot atom featurizer.

Module Contents#

class chemprop.featurizers.atom.MultiHotAtomFeaturizer(atomic_nums, degrees, formal_charges, chiral_tags, num_Hs, hybridizations)[source]#

Bases: chemprop.featurizers.base.VectorFeaturizer[rdkit.Chem.rdchem.Atom]

A MultiHotAtomFeaturizer uses a multi-hot encoding to featurize atoms.

See also

The class provides three default parameterization schemes: v1(), v2(), and organic().

The generated atom features are ordered as follows:
  • atomic number

  • degree

  • formal charge

  • chiral tag

  • number of hydrogens

  • hybridization

  • aromaticity

  • mass

Important

Each feature, except for aromaticity and mass, includes a pad for unknown values.
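The ordering and padding rules above can be illustrated with a minimal pure-Python sketch. It is not chemprop's implementation: the helper names `one_hot_with_pad` and `featurize`, the plain-dict atom description, the small choice lists, and the decision to place the unknown bit last in each block are all assumptions made for illustration. Mass is passed through unchanged here; any scaling applied by the library is not described on this page.

```python
from typing import Sequence


def one_hot_with_pad(value: int, choices: Sequence[int]) -> list[float]:
    """One-hot encode `value` over `choices`, with one extra trailing bit for unknowns."""
    bits = [0.0] * (len(choices) + 1)
    # Known values set their own bit; anything else sets the final (pad) bit.
    index = choices.index(value) if value in choices else len(choices)
    bits[index] = 1.0
    return bits


def featurize(atom: dict, choices: dict[str, Sequence[int]]) -> list[float]:
    """Concatenate the padded blocks in the documented order, then aromaticity and mass."""
    order = ["atomic_num", "degree", "formal_charge", "chiral_tag", "num_Hs", "hybridization"]
    features: list[float] = []
    for name in order:
        features.extend(one_hot_with_pad(atom[name], choices[name]))
    features.append(1.0 if atom["is_aromatic"] else 0.0)  # single bit, no pad
    features.append(atom["mass"])  # single float, no pad
    return features


# Hypothetical choices, deliberately small so the vector is easy to inspect.
choices = {
    "atomic_num": [6, 7, 8],
    "degree": [0, 1, 2, 3, 4],
    "formal_charge": [-1, 0, 1],
    "chiral_tag": [0, 1, 2],
    "num_Hs": [0, 1, 2, 3, 4],
    "hybridization": [1, 2, 3],
}
carbon = {
    "atomic_num": 6, "degree": 4, "formal_charge": 0, "chiral_tag": 0,
    "num_Hs": 0, "hybridization": 3, "is_aromatic": False, "mass": 12.011,
}

vector = featurize(carbon, choices)
# Each padded block contributes len(choices) + 1 bits; aromaticity and mass add one each.
expected_length = sum(len(c) + 1 for c in choices.values()) + 2
```

Under these assumptions, the vector length depends only on the sizes of the choice lists, which is why a featurizer of this kind can report its length before seeing any atom.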

Parameters:
  • atomic_nums (Sequence[int]) – the choices for atom type, denoted by atomic number. Ex: [6, 7, 8] for C, N and O.

  • degrees (Sequence[int]) – the choices for number of bonds an atom is engaged in.

  • formal_charges (Sequence[int]) – the choices for integer electronic charge assigned to an atom.

  • chiral_tags (Sequence[int]) – the choices for an atom’s chiral tag. See rdkit.Chem.rdchem.ChiralType for possible integer values.

  • num_Hs (Sequence[int]) – the choices for number of bonded hydrogen atoms.

  • hybridizations (Sequence[int]) – the choices for an atom’s hybridization type. See rdkit.Chem.rdchem.HybridizationType for possible integer values.

atomic_nums#
degrees#
formal_charges#
chiral_tags#
num_Hs#
hybridizations#
__len__()[source]#
Return type:

int

__call__(a)[source]#
Parameters:

a (rdkit.Chem.rdchem.Atom | None)

Return type:

numpy.ndarray

num_only(a)[source]#

Featurize the atom by setting only the atomic number bit.

Parameters:

a (rdkit.Chem.rdchem.Atom)

Return type:

numpy.ndarray
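As a rough illustration of what setting only the atomic number bit means, the sketch below fills a zero vector of the full feature length and sets a single bit in the leading atomic-number block. The function `num_only_sketch` and its arguments are hypothetical, plain lists stand in for `numpy.ndarray`, and the sketch assumes an unlisted atomic number maps to a trailing pad bit, which this page does not state for this method.

```python
from typing import Sequence


def num_only_sketch(atomic_num: int, atomic_nums: Sequence[int], total_length: int) -> list[float]:
    """Return a zero vector of `total_length` with only the atomic-number bit set."""
    vector = [0.0] * total_length
    # The atomic-number block comes first; its last slot is assumed to be the pad.
    index = atomic_nums.index(atomic_num) if atomic_num in atomic_nums else len(atomic_nums)
    vector[index] = 1.0
    return vector


# Nitrogen (7) is the second choice, so only position 1 is set.
nitrogen_only = num_only_sketch(7, atomic_nums=[6, 7, 8], total_length=30)
```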

classmethod v1(max_atomic_num=100)[source]#

The original implementation used in Chemprop V1 [1], [2].

Parameters:

max_atomic_num (int, default=100) – Include a bit for all atomic numbers in the interval \([1, \mathtt{max\_atomic\_num}]\)
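The closed interval described above can be written out explicitly. This is a sketch of the atomic-number choices only; the helper name `v1_atomic_nums` is hypothetical, and the remaining feature choices of the V1 scheme are not covered here.

```python
def v1_atomic_nums(max_atomic_num: int = 100) -> list[int]:
    """Atomic numbers in the closed interval [1, max_atomic_num]."""
    return list(range(1, max_atomic_num + 1))


choices = v1_atomic_nums()
# One bit per listed atomic number, plus the pad for unknown values.
num_atom_type_bits = len(choices) + 1
```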

References

classmethod v2()[source]#

An implementation that includes an atom type bit for all elements in the first four rows of the periodic table plus iodine.
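For reference, the first four rows of the periodic table hold 2, 8, 8, and 18 elements, so they span hydrogen (1) through krypton (36); iodine has atomic number 53. A sketch of the resulting atomic-number choices, using illustrative variable names that are not part of the library:

```python
# Row lengths of the first four periods of the periodic table.
ROW_LENGTHS = [2, 8, 8, 18]
last_atomic_num = sum(ROW_LENGTHS)  # krypton

first_four_rows = list(range(1, last_atomic_num + 1))  # H through Kr
IODINE = 53
v2_atomic_nums = first_four_rows + [IODINE]
```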

classmethod organic()[source]#

A specific parameterization intended for use with organic or drug-like molecules.

This parameterization includes:
  1. an atomic number bit only for H, B, C, N, O, F, Si, P, S, Cl, Br, and I atoms

  2. a hybridization bit for \(s, sp, sp^2\) and \(sp^3\) hybridizations
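The element list above corresponds to the atomic numbers in the sketch below. The mapping from symbols to atomic numbers is standard chemistry; the helper `atom_type_index`, the sorted ordering, and the position of the unknown bit at the end of the block are assumptions for illustration.

```python
# Atomic numbers of the elements named in the organic parameterization.
ORGANIC_ELEMENTS = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Si": 14, "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}
organic_atomic_nums = sorted(ORGANIC_ELEMENTS.values())


def atom_type_index(atomic_num: int) -> int:
    """Index of the atom-type bit; unlisted elements share the trailing pad index."""
    if atomic_num in organic_atomic_nums:
        return organic_atomic_nums.index(atomic_num)
    return len(organic_atomic_nums)


carbon_index = atom_type_index(ORGANIC_ELEMENTS["C"])
iron_index = atom_type_index(26)  # Fe is not listed, so it falls on the pad
```

A narrower element list like this keeps the atom-type block short, at the cost of collapsing every other element onto a single unknown bit.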

class chemprop.featurizers.atom.RIGRAtomFeaturizer(atomic_nums=None, degrees=None, num_Hs=None)[source]#

Bases: chemprop.featurizers.base.VectorFeaturizer[rdkit.Chem.rdchem.Atom]

A RIGRAtomFeaturizer uses a multi-hot encoding to featurize atoms using resonance-invariant features [1].

The generated atom features are ordered as follows:
  • atomic number

  • degree

  • number of hydrogens

  • mass

References

Parameters:
  • atomic_nums (Sequence[int] | None)

  • degrees (Sequence[int] | None)

  • num_Hs (Sequence[int] | None)

atomic_nums#
degrees#
num_Hs#
__len__()[source]#
Return type:

int

__call__(a)[source]#
Parameters:

a (rdkit.Chem.rdchem.Atom | None)

Return type:

numpy.ndarray

num_only(a)[source]#

Featurize the atom by setting only the atomic number bit.

Parameters:

a (rdkit.Chem.rdchem.Atom)

Return type:

numpy.ndarray

class chemprop.featurizers.atom.AtomFeatureMode[source]#

Bases: chemprop.utils.utils.EnumMapping

The mode of an atom is used for featurization into a MolGraph.

V1#
V2#
ORGANIC#
RIGR#
chemprop.featurizers.atom.get_multi_hot_atom_featurizer(mode)[source]#

Build the corresponding multi-hot atom featurizer.

Parameters:

mode (str | AtomFeatureMode)

Return type:

MultiHotAtomFeaturizer
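Because `mode` may be either a string or an enum member, a builder of this kind typically normalizes its argument before dispatching. The sketch below shows one way that could look using only the standard library. `ModeSketch`, `resolve_mode`, and `FACTORY_FOR_MODE` are hypothetical stand-ins, not chemprop's `EnumMapping` or its actual dispatch; the case-insensitive name lookup and the mode-to-constructor pairing are assumptions inferred from the names on this page.

```python
from enum import Enum
from typing import Union


class ModeSketch(Enum):
    """Stand-in for AtomFeatureMode with the same four member names."""
    V1 = "V1"
    V2 = "V2"
    ORGANIC = "ORGANIC"
    RIGR = "RIGR"


def resolve_mode(mode: Union[str, ModeSketch]) -> ModeSketch:
    """Accept a member or its name; names are upper-cased first (an assumption)."""
    if isinstance(mode, ModeSketch):
        return mode
    try:
        return ModeSketch[mode.upper()]
    except KeyError:
        valid = ", ".join(m.name for m in ModeSketch)
        raise KeyError(f"Unsupported mode {mode!r}; expected one of: {valid}") from None


# Assumed pairing of modes and constructors, written as strings so the
# sketch runs without chemprop installed.
FACTORY_FOR_MODE = {
    ModeSketch.V1: "MultiHotAtomFeaturizer.v1()",
    ModeSketch.V2: "MultiHotAtomFeaturizer.v2()",
    ModeSketch.ORGANIC: "MultiHotAtomFeaturizer.organic()",
    ModeSketch.RIGR: "RIGRAtomFeaturizer()",
}

selected = FACTORY_FOR_MODE[resolve_mode("organic")]
```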