Atom featurizers


[1]:
from chemprop.featurizers.atom import MultiHotAtomFeaturizer

The examples below featurize the first carbon atom of ethane (SMILES "CC").

[2]:
from rdkit import Chem

atom_to_featurize = Chem.MolFromSmiles("CC").GetAtoms()[0]

Atom features

The following atom features are computed with RDKit and cast to one-hot vectors, except for mass, which is divided by 100 and kept as a single floating-point value. The one-hot vectors are concatenated into a single multi-hot feature vector, with the scaled mass as the final entry. Every feature except aromaticity and mass is padded with one extra bit that is set for any value outside the listed choices.

  • atomic number

  • degree

  • formal charge

  • chiral tag

  • number of hydrogens

  • hybridization

  • aromaticity

  • mass
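The encoding described above can be sketched in plain Python. This is a minimal illustration, not Chemprop's implementation: the helper `one_hot_with_unknown`, the choice lists, their ordering, and the raw values are all assumptions made for the example.

```python
def one_hot_with_unknown(value, choices):
    """One-hot encode `value` over `choices`, with a final slot for unknown values."""
    vec = [0.0] * (len(choices) + 1)
    index = choices.index(value) if value in choices else len(choices)
    vec[index] = 1.0
    return vec


# Illustrative choices and raw values (assumed, not read from Chemprop or RDKit).
formal_charge_choices = [-2, -1, 0, 1, 2]
num_h_choices = [0, 1, 2, 3, 4]

charge_bits = one_hot_with_unknown(0, formal_charge_choices)
h_bits = one_hot_with_unknown(7, num_h_choices)  # 7 is not a listed choice
is_aromatic = False
mass = 12.011

# Concatenate the one-hot groups, then append aromaticity and the scaled mass.
features = charge_bits + h_bits + [float(is_aromatic), mass / 100]
print(len(features))
```

The second group shows the padding bit in action: a value outside the listed choices sets only the last slot of its group.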

v2

The v2 atom featurizer is the default. It provides bits in the feature vector for:

  • atomic number

    • first four rows of the periodic table plus iodine

  • degree

    • 0 to 5 bonds

  • formal charge

    • -2, -1, 0, 1, 2

  • chiral tag

    • 0, 1, 2, 3 (see rdkit.Chem.rdchem.ChiralType for details)

  • number of hydrogens

    • 0 to 4

  • hybridization

    • S, SP, SP2, SP2D, SP3, SP3D, SP3D2

[3]:
featurizer = MultiHotAtomFeaturizer.v2()
featurizer(atom_to_featurize)
[3]:
array([0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       1.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       1.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 0.12011])
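As a sanity check on the layout, the length of the v2 vector can be recomputed from the choice counts listed above. The sketch below assumes one slot per listed choice plus one unknown slot per one-hot feature, followed by one aromaticity bit and one mass entry; the helper `expected_length` is hypothetical and not part of Chemprop.

```python
# Number of listed choices per one-hot feature for the v2 featurizer,
# taken from the bullet list above.
v2_choice_counts = {
    "atomic number": 36 + 1,  # first four rows (H to Kr) plus iodine
    "degree": 6,              # 0 to 5
    "formal charge": 5,       # -2 to 2
    "chiral tag": 4,          # 0 to 3
    "number of hydrogens": 5, # 0 to 4
    "hybridization": 7,       # S, SP, SP2, SP2D, SP3, SP3D, SP3D2
}


def expected_length(choice_counts):
    """Sum the one-hot groups (each with an unknown slot), then add aromaticity and mass."""
    one_hot_bits = sum(n + 1 for n in choice_counts.values())
    return one_hot_bits + 2


v2_length = expected_length(v2_choice_counts)
print(v2_length)  # 72
```

This agrees with the printed array above, which has 72 entries.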

v1

The v1 atom featurizer is the one used in Chemprop v1. It is the same as the v2 atom featurizer except for:

  • atomic number

    • first 100 elements (customizable)

  • hybridization

    • SP, SP2, SP3, SP3D, SP3D2

[4]:
featurizer = MultiHotAtomFeaturizer.v1()
featurizer(atom_to_featurize)
[4]:
array([0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       1.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       1.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 1.     , 0.     , 0.     , 0.     , 0.     , 0.12011])
[5]:
featurizer = MultiHotAtomFeaturizer.v1(max_atomic_num=53)
featurizer(atom_to_featurize)
[5]:
array([0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 1.     , 0.     , 1.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 0.12011])
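The effect of `max_atomic_num` on the vector size can be sketched as follows. This assumes the atomic number choices are the integers 1 through `max_atomic_num` in ascending order, plus one unknown slot; the helper `atomic_number_block` is hypothetical.

```python
def atomic_number_block(max_atomic_num):
    """Return the assumed atomic number choices and the block size including the unknown slot."""
    choices = list(range(1, max_atomic_num + 1))
    return choices, len(choices) + 1


default_choices, default_size = atomic_number_block(100)
iodine_choices, iodine_size = atomic_number_block(53)

# Carbon (atomic number 6) lands in the same slot in both layouts.
carbon_slot_default = default_choices.index(6)
carbon_slot_iodine = iodine_choices.index(6)

# Entries saved by stopping at iodine instead of element 100.
saved_entries = default_size - iodine_size
print(default_size, iodine_size, saved_entries)  # 101 54 47
```

The difference of 47 matches the two outputs above, which have 133 and 86 entries respectively.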

organic

The organic atom featurizer is designed to reduce the feature vector size for organic molecules. It is the same as the v2 atom featurizer except for:

  • atomic number

    • H, B, C, N, O, F, Si, P, S, Cl, Br, and I atoms

  • hybridization

    • S, SP, SP2, SP3

[6]:
featurizer = MultiHotAtomFeaturizer.organic()
featurizer(atom_to_featurize)
[6]:
array([0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 1.     ,
       0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
       0.     , 0.12011])
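To see where an element lands in the organic atomic number block, the element list above can be mapped to slot indices. This sketch assumes the choices are ordered by atomic number as listed; the helper `atomic_number_slot` is hypothetical.

```python
# Elements covered by the organic featurizer, with their atomic numbers.
organic_elements = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Si": 14, "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}
atomic_nums = list(organic_elements.values())


def atomic_number_slot(atomic_num):
    """Index within the atomic number block; the last slot collects unknown elements."""
    if atomic_num in atomic_nums:
        return atomic_nums.index(atomic_num)
    return len(atomic_nums)


carbon_slot = atomic_number_slot(organic_elements["C"])
gold_slot = atomic_number_slot(79)  # gold is not in the organic list
print(carbon_slot, gold_slot)  # 2 12
```

Carbon in slot 2 is consistent with the output above, where the third entry of the array is set.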

Custom

Custom atom featurizers can be created by specifying the choices for atomic number, degree, formal charge, chiral tag, number of hydrogens, and hybridization. Aromaticity is always featurized as True/False.

[7]:
from rdkit.Chem.rdchem import HybridizationType

atomic_nums = [1, 6, 7, 8]
degrees = [0, 1, 2, 3, 4]
formal_charges = [-2, -1, 0, 1, 2]
chiral_tags = [0, 1, 2, 3]
num_Hs = [0, 1, 2, 3, 4]
hybridizations = [HybridizationType.SP, HybridizationType.SP2, HybridizationType.SP3]
featurizer = MultiHotAtomFeaturizer(
    atomic_nums, degrees, formal_charges, chiral_tags, num_Hs, hybridizations
)
featurizer(atom_to_featurize)
[7]:
array([0.     , 1.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 1.     ,
       0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
       0.     , 0.     , 0.     , 0.     , 1.     , 0.     , 0.     ,
       0.     , 0.     , 1.     , 0.     , 0.     , 0.12011])

Generic

Any object that has a length and returns a NumPy array when called with an rdkit.Chem.rdchem.Atom can be used as an atom featurizer.

[8]:
from rdkit.Chem.rdchem import Atom
import numpy as np


class MyAtomFeaturizer:
    def __len__(self):
        return 1

    def __call__(self, a: Atom):
        return np.array([a.GetAtomicNum()], dtype=float)


featurizer = MyAtomFeaturizer()
featurizer(atom_to_featurize)
[8]:
array([6.])