chemprop.featurizers.atom

Module Contents

Classes

MultiHotAtomFeaturizer

A MultiHotAtomFeaturizer uses a multi-hot encoding to featurize atoms.

AtomFeatureMode

The atom featurization mode used when featurizing a molecule into a MolGraph.

Functions

get_multi_hot_atom_featurizer(mode)

Build the corresponding multi-hot atom featurizer.

class chemprop.featurizers.atom.MultiHotAtomFeaturizer(atomic_nums, degrees, formal_charges, chiral_tags, num_Hs, hybridizations)

Bases: chemprop.featurizers.base.VectorFeaturizer[rdkit.Chem.rdchem.Atom]

A MultiHotAtomFeaturizer uses a multi-hot encoding to featurize atoms.

See also

The class provides three default parameterization schemes:
  • MultiHotAtomFeaturizer.v1()
  • MultiHotAtomFeaturizer.v2()
  • MultiHotAtomFeaturizer.organic()

The generated atom features are ordered as follows:
  • atomic number
  • degree
  • formal charge
  • chiral tag
  • number of hydrogens
  • hybridization
  • aromaticity
  • mass

Important

Each feature, except for aromaticity and mass, includes an extra pad slot for values that are not among its choices.
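The padding rule can be illustrated with a small, library-free sketch. The helper `one_hot_with_pad` below is hypothetical (it is not part of chemprop): it encodes one value against its list of choices and reserves a final slot for any value not in that list. The degree choices are illustrative only.

```python
from collections.abc import Hashable, Sequence


def one_hot_with_pad(value: Hashable, choices: Sequence[Hashable]) -> list[int]:
    """Encode `value` against `choices`, reserving one extra slot for unknowns."""
    vec = [0] * (len(choices) + 1)  # the last slot is the "unknown" pad
    index = {c: i for i, c in enumerate(choices)}
    vec[index.get(value, len(choices))] = 1
    return vec


degrees = [0, 1, 2, 3, 4, 5]  # hypothetical choices for illustration
print(one_hot_with_pad(3, degrees))  # [0, 0, 0, 1, 0, 0, 0]
print(one_hot_with_pad(7, degrees))  # [0, 0, 0, 0, 0, 0, 1]
```

Aromaticity and mass would not use this helper, since they carry no pad.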

Parameters:
  • atomic_nums (Sequence[int]) – the choices for atom type denoted by atomic number. Ex: [6, 7, 8] for C, N and O.

  • degrees (Sequence[int]) – the choices for number of bonds an atom is engaged in.

  • formal_charges (Sequence[int]) – the choices for the integer formal charge assigned to an atom.

  • chiral_tags (Sequence[int]) – the choices for an atom’s chiral tag. See rdkit.Chem.rdchem.ChiralType for possible integer values.

  • num_Hs (Sequence[int]) – the choices for number of bonded hydrogen atoms.

  • hybridizations (Sequence[int]) – the choices for an atom’s hybridization type. See rdkit.Chem.rdchem.HybridizationType for possible integer values.

__len__()
Return type:

int
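Under the feature layout described above, the length follows from the choice lists: each padded feature contributes `len(choices) + 1` slots, and aromaticity and mass add one slot each. The sketch below assumes exactly that layout; `feature_layout` and the choice values are hypothetical illustrations, not chemprop code.

```python
from collections.abc import Sequence


def feature_layout(choice_lists: Sequence[Sequence[int]]) -> tuple[list[int], int]:
    """Return the start offset of each padded block and the total feature length."""
    offsets: list[int] = []
    i = 0
    for choices in choice_lists:
        offsets.append(i)
        i += len(choices) + 1  # one slot per choice, plus the unknown pad
    return offsets, i + 2  # aromaticity bit and mass value carry no pad


# Hypothetical choices, in the documented order: atomic number, degree,
# formal charge, chiral tag, number of Hs, hybridization.
choices = [[6, 7, 8], [0, 1, 2, 3, 4], [-1, 0, 1], [0, 1, 2, 3], [0, 1, 2, 3], [1, 2, 3, 4]]
offsets, length = feature_layout(choices)
print(offsets)  # [0, 4, 10, 14, 19, 24]
print(length)   # 31
```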

__call__(a)
Parameters:

a (rdkit.Chem.rdchem.Atom | None)

Return type:

numpy.ndarray
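A rough sketch of what a call might compute, written without RDKit: a plain dictionary with hypothetical keys stands in for an rdkit.Chem.rdchem.Atom, and the choice lists are illustrative. Two details are assumptions rather than documented behavior: an input of `None` produces an all-zero vector, and mass is scaled by 0.01.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

# Hypothetical dictionary keys standing in for RDKit Atom properties,
# listed in the documented feature order.
PADDED_KEYS = ["atomic_num", "degree", "formal_charge", "chiral_tag", "num_Hs", "hybridization"]


def featurize(atom: Mapping[str, float] | None, choice_lists: Sequence[Sequence[int]]) -> list[float]:
    """Build a multi-hot vector: six padded one-hot blocks, then aromaticity and mass."""
    x = [0.0] * (sum(len(c) + 1 for c in choice_lists) + 2)
    if atom is None:  # assumption: a missing atom yields an all-zero vector
        return x
    i = 0  # start offset of the current block
    for key, choices in zip(PADDED_KEYS, choice_lists):
        index = {c: k for k, c in enumerate(choices)}
        x[i + index.get(atom[key], len(choices))] = 1.0  # unknown values hit the pad
        i += len(choices) + 1
    x[i] = float(atom["is_aromatic"])  # single bit, no pad
    x[i + 1] = 0.01 * atom["mass"]  # scalar; the 0.01 scale is an assumption
    return x


# Illustrative choice lists and an aromatic carbon (the hybridization code is made up).
choices = [[6, 7, 8], [0, 1, 2, 3, 4], [-1, 0, 1], [0, 1, 2, 3], [0, 1, 2, 3], [1, 2, 3, 4]]
carbon = {"atomic_num": 6, "degree": 3, "formal_charge": 0, "chiral_tag": 0,
          "num_Hs": 1, "hybridization": 3, "is_aromatic": True, "mass": 12.011}
vec = featurize(carbon, choices)
print([k for k, v in enumerate(vec) if v == 1.0])  # [0, 7, 11, 14, 20, 26, 29]
```

Swapping the carbon for sulfur (atomic number 16, absent from the illustrative choices) would set the pad slot at index 3 instead of index 0.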

num_only(a)

Featurize the atom by setting only the atomic number bit.

Parameters:

a (rdkit.Chem.rdchem.Atom)

Return type:

numpy.ndarray
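Because atomic number comes first in the feature ordering, its block should start at offset 0. A minimal sketch of the described behavior under that assumption; the function below is a hypothetical stand-in that takes an atomic number rather than an RDKit atom, and the vector length reuses an illustrative value.

```python
from collections.abc import Sequence


def num_only(atomic_num: int, atomic_nums: Sequence[int], length: int) -> list[int]:
    """Set only the atomic-number bit (block at offset 0); all other slots stay 0."""
    x = [0] * length
    index = {z: k for k, z in enumerate(atomic_nums)}
    x[index.get(atomic_num, len(atomic_nums))] = 1  # unknown elements hit the pad
    return x


atomic_nums = [6, 7, 8]  # illustrative choices: C, N, O
print(num_only(7, atomic_nums, length=31).index(1))   # 1
print(num_only(16, atomic_nums, length=31).index(1))  # 3 (pad slot)
```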

classmethod v1(max_atomic_num=100)

The original implementation used in Chemprop V1 [1], [2].

Parameters:

max_atomic_num (int, default=100) – Include a bit for each atomic number in the interval \([1, \mathtt{max\_atomic\_num}]\).
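Read together with the padding rule, the atomic-number block for v1 should hold `max_atomic_num` bits plus one pad. The arithmetic as a hedged sketch; the helper name is hypothetical.

```python
def atomic_number_block(max_atomic_num: int = 100) -> tuple[list[int], int]:
    """Choices for the closed interval [1, max_atomic_num] and the padded block size."""
    choices = list(range(1, max_atomic_num + 1))
    return choices, len(choices) + 1  # +1 for the unknown pad


choices, size = atomic_number_block()
print(choices[0], choices[-1], size)  # 1 100 101
```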

References

[1] Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; Palmer, A.; Settels, V.; Jaakkola, T.; Jensen, K.; Barzilay, R. “Analyzing Learned Molecular Representations for Property Prediction.” J. Chem. Inf. Model. 2019, 59 (8), 3370–3388. https://doi.org/10.1021/acs.jcim.9b00237

[2] Heid, E.; Greenman, K.P.; Chung, Y.; Li, S.C.; Graff, D.E.; Vermeire, F.H.; Wu, H.; Green, W.H.; McGill, C.J. “Chemprop: A machine learning package for chemical property prediction.” J. Chem. Inf. Model. 2024, 64 (1), 9–17. https://doi.org/10.1021/acs.jcim.3c01250

classmethod v2()

An implementation that includes an atom type bit for all elements in the first four rows of the periodic table plus iodine.
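The description translates to atomic numbers 1 through 36 (H through Kr) plus 53 (I). The list below is derived from that sentence, not copied from the chemprop source.

```python
# Rows 1-4 of the periodic table span H (Z = 1) through Kr (Z = 36); iodine is Z = 53.
v2_atomic_nums = list(range(1, 37)) + [53]
print(len(v2_atomic_nums))  # 37
```

With the unknown pad, such an atomic-number block would occupy 38 slots.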

classmethod organic()

A specific parameterization intended for use with organic or drug-like molecules.

This parameterization features:
  1. an atomic number bit only for H, B, C, N, O, F, Si, P, S, Cl, Br, and I atoms

  2. a hybridization bit for \(s, sp, sp^2\) and \(sp^3\) hybridizations.
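For reference, the element symbols in this parameterization correspond to the atomic numbers below. This is a hand-written lookup for illustration, not the library's internal table; the hybridization labels are likewise plain strings, not RDKit enum values.

```python
# Element symbols from the organic parameterization mapped to atomic numbers.
ORGANIC_ELEMENTS = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Si": 14, "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}
organic_atomic_nums = sorted(ORGANIC_ELEMENTS.values())
organic_hybridizations = ["s", "sp", "sp2", "sp3"]

print(organic_atomic_nums)  # [1, 5, 6, 7, 8, 9, 14, 15, 16, 17, 35, 53]
# Padded block sizes: 12 elements + 1 pad, 4 hybridizations + 1 pad.
print(len(organic_atomic_nums) + 1, len(organic_hybridizations) + 1)  # 13 5
```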

class chemprop.featurizers.atom.AtomFeatureMode

Bases: chemprop.utils.utils.EnumMapping

The atom featurization mode used when featurizing a molecule into a MolGraph.

V1
V2
ORGANIC

chemprop.featurizers.atom.get_multi_hot_atom_featurizer(mode)

Build the corresponding multi-hot atom featurizer.

Parameters:

mode (str | AtomFeatureMode)

Return type:

MultiHotAtomFeaturizer
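One way to picture the dispatch: normalize `mode` to an enum member, then pick the matching parameterization. The sketch below uses a hypothetical `Mode` enum in place of AtomFeatureMode. The case-insensitive name lookup and the mode-to-classmethod mapping are assumptions based on the member and method names, not a description of chemprop's implementation.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for AtomFeatureMode."""

    V1 = "v1"
    V2 = "v2"
    ORGANIC = "organic"


# Assumed mapping from each mode to the classmethod of the same name.
BUILDERS = {
    Mode.V1: "MultiHotAtomFeaturizer.v1",
    Mode.V2: "MultiHotAtomFeaturizer.v2",
    Mode.ORGANIC: "MultiHotAtomFeaturizer.organic",
}


def resolve_mode(mode: str | Mode) -> Mode:
    """Accept an enum member or a member name, ignoring case."""
    if isinstance(mode, Mode):
        return mode
    try:
        return Mode[mode.upper()]
    except KeyError:
        raise ValueError(f"unsupported mode: {mode!r}") from None


print(BUILDERS[resolve_mode("organic")])  # MultiHotAtomFeaturizer.organic
```

Per the documented signature, the real function accepts either a string or an AtomFeatureMode member; which string spellings it recognizes is not stated here.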