Bond featurizers


[1]:
from chemprop.featurizers.bond import MultiHotBondFeaturizer

The examples below featurize the single bond in ethane (SMILES "CC").

[2]:
from rdkit import Chem

bond_to_featurize = Chem.MolFromSmiles("CC").GetBondBetweenAtoms(0, 1)

Bond features

The following bond features are computed with RDKit and concatenated into a single multi-hot feature vector. Bond type and stereochemistry are encoded as one-hot vectors, while the null, conjugated, and in-ring features are single True/False bits. The initial null bit is set only when the bond is None. Only the stereochemistry vector is padded with an extra slot for unknown values.

  • null?

  • bond type

  • conjugated?

  • in ring?

  • stereochemistry

[3]:
featurizer = MultiHotBondFeaturizer()
featurizer(bond_to_featurize)
[3]:
array([0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
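
To make the layout concrete, the sketch below splits the 14-element vector above into named segments. The segment order follows the feature list, and the widths are inferred from the output length (1 null bit, 4 bond types, 1 conjugation bit, 1 ring bit, and 6 stereo values plus 1 padding slot). The helper split_bond_features is hypothetical and not part of chemprop, and the layout is an assumption rather than a documented guarantee.

```python
def split_bond_features(vector, n_bond_types=4, n_stereos=6):
    """Split a multi-hot bond vector into named segments (hypothetical helper)."""
    # Assumed layout, inferred from the feature list and the output length.
    widths = [
        ("null", 1),
        ("bond_type", n_bond_types),
        ("conjugated", 1),
        ("in_ring", 1),
        ("stereo", n_stereos + 1),  # +1 is the padding slot for unknown values
    ]
    expected = sum(width for _, width in widths)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} features, got {len(vector)}")

    segments, start = {}, 0
    for name, width in widths:
        segments[name] = list(vector[start:start + width])
        start += width
    return segments


# The vector printed above, written as a plain list.
default_vector = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
segments = split_bond_features(default_vector)
for name, bits in segments.items():
    print(f"{name:>10}: {bits}")
```

Read this way, the ethane bond is non-null, has the first bond type (single), is neither conjugated nor in a ring, and has the first stereo value set.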

Custom

The bond types and stereochemistry values can be customized through the bond_types and stereos arguments. The defaults are:

  • bond_types

    • Single, Double, Triple, Aromatic

  • stereos

    • 0, 1, 2, 3, 4, 5 - See rdkit.Chem.rdchem.BondStereo for more details

[4]:
from rdkit.Chem.rdchem import BondType

featurizer = MultiHotBondFeaturizer(bond_types=[BondType.SINGLE], stereos=[0, 1, 2])
featurizer(bond_to_featurize)
[4]:
array([0, 1, 0, 0, 1, 0, 0, 0])
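
The shorter output follows from the smaller vocabularies. As a rough check, the vector length can be computed by hand, assuming the layout described under Bond features: one null bit, one slot per bond type, two single bits (conjugated and in ring), and one slot per stereo value plus one padding slot. The function expected_bond_vector_length is a hypothetical helper for illustration, not a chemprop API.

```python
def expected_bond_vector_length(n_bond_types, n_stereos):
    """Length of the multi-hot bond vector under the assumed layout."""
    null_bit = 1
    bond_type_slots = n_bond_types  # one-hot, no padding slot
    flag_bits = 2                   # conjugated? and in ring?
    stereo_slots = n_stereos + 1    # one-hot plus a padding slot for unknowns
    return null_bit + bond_type_slots + flag_bits + stereo_slots


# Defaults: 4 bond types and 6 stereo values.
default_length = expected_bond_vector_length(n_bond_types=4, n_stereos=6)
# Customized: bond_types=[BondType.SINGLE], stereos=[0, 1, 2].
custom_length = expected_bond_vector_length(n_bond_types=1, n_stereos=3)

print(default_length)  # 14, the length of the default output
print(custom_length)   # 8, the length of the customized output
```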

Generic

Any class that defines a length (__len__) and returns a NumPy array when called with an rdkit.Chem.rdchem.Bond can be used as a bond featurizer.

[5]:
from rdkit.Chem.rdchem import Bond
import numpy as np


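class MyBondFeaturizer:
    """Minimal featurizer: a single feature flagging conjugated bonds."""

    def __len__(self):
        # Number of features produced per bond.
        return 1

    def __call__(self, bond: Bond):
        # 1.0 if the bond is conjugated, else 0.0.
        return np.array([bond.GetIsConjugated()], dtype=float)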


featurizer = MyBondFeaturizer()
featurizer(bond_to_featurize)
[5]:
array([0.])
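
Before plugging in a custom featurizer, it can help to confirm that it satisfies the two requirements stated above. The sketch below is one possible sanity check. It additionally assumes that the returned array is one-dimensional with len(featurizer) entries, which the text does not state explicitly. The names check_bond_featurizer, FakeBond, and ConjugationFeaturizer are hypothetical; FakeBond only stands in for an RDKit bond so that the block runs without RDKit.

```python
import numpy as np


def check_bond_featurizer(featurizer, bond):
    """Hypothetical sanity check for a custom bond featurizer."""
    features = featurizer(bond)
    if not isinstance(features, np.ndarray):
        raise TypeError("featurizer must return a numpy array")
    # Assumption: the output is 1-D with exactly len(featurizer) entries.
    if features.shape != (len(featurizer),):
        raise ValueError(
            f"expected shape ({len(featurizer)},), got {features.shape}"
        )
    return features


class FakeBond:
    """Stand-in for rdkit.Chem.rdchem.Bond (illustration only)."""

    def GetIsConjugated(self):
        return False


class ConjugationFeaturizer:
    """Same logic as MyBondFeaturizer, repeated to keep this block self-contained."""

    def __len__(self):
        return 1

    def __call__(self, bond):
        return np.array([bond.GetIsConjugated()], dtype=float)


checked = check_bond_featurizer(ConjugationFeaturizer(), FakeBond())
print(checked)
```

With RDKit installed, the same check can be run on a real bond, for example check_bond_featurizer(MyBondFeaturizer(), bond_to_featurize).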