chemprop.data.datapoints#

Attributes#

Classes#

MoleculeDatapoint

A MoleculeDatapoint contains a single molecule and its associated features and targets.

LazyMoleculeDatapoint

A LazyMoleculeDatapoint contains a single SMILES string and all attributes needed to form an rdkit.Chem.Mol object.

MolAtomBondDatapoint

A MolAtomBondDatapoint contains a single molecule and its associated features and molecule-, atom-, and bond-level targets.

ReactionDatapoint

A ReactionDatapoint contains a single reaction and its associated features and targets.

Module Contents#

chemprop.data.datapoints.MoleculeFeaturizer#
class chemprop.data.datapoints.MoleculeDatapoint[source]#

Bases: _DatapointMixin, _MoleculeDatapointMixin

A MoleculeDatapoint contains a single molecule and its associated features and targets.

V_f: numpy.ndarray | None = None#

A numpy array of shape V x d_vf, where V is the number of atoms in the molecule, and d_vf is the number of additional features that will be concatenated to atom-level features before message passing

E_f: numpy.ndarray | None = None#

A numpy array of shape E x d_ef, where E is the number of bonds in the molecule, and d_ef is the number of additional features that will be concatenated to bond-level features before message passing

V_d: numpy.ndarray | None = None#

A numpy array of shape V x d_vd, where V is the number of atoms in the molecule, and d_vd is the number of additional descriptors that will be concatenated to atom-level descriptors after message passing
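The shape conventions for V_f, E_f, and V_d can be checked before a datapoint is built. The following is a minimal sketch in plain Python, assuming nested lists stand in for NumPy arrays; `check_extra_shapes` is a hypothetical helper written for illustration and is not part of chemprop.

```python
def check_extra_shapes(n_atoms, n_bonds, V_f=None, E_f=None, V_d=None):
    """Validate the row counts of optional per-atom and per-bond arrays.

    Hypothetical helper: nested lists stand in for NumPy arrays.
    Returns the inferred widths (d_vf, d_ef, d_vd), with None for omitted arrays.
    """
    expected_rows = {"V_f": n_atoms, "E_f": n_bonds, "V_d": n_atoms}
    given = {"V_f": V_f, "E_f": E_f, "V_d": V_d}
    widths = {}
    for name, rows in given.items():
        if rows is None:
            widths[name] = None
            continue
        if len(rows) != expected_rows[name]:
            raise ValueError(
                f"{name} has {len(rows)} rows, expected {expected_rows[name]}"
            )
        row_lengths = {len(row) for row in rows}
        if len(row_lengths) > 1:
            raise ValueError(f"{name} rows have inconsistent lengths")
        widths[name] = row_lengths.pop() if row_lengths else 0
    return widths


# Ethanol ("CCO") without explicit hydrogens: 3 atoms, 2 bonds.
V_f = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # V x d_vf = 3 x 2
E_f = [[1.0], [0.0]]                        # E x d_ef = 2 x 1
widths = check_extra_shapes(3, 2, V_f=V_f, E_f=E_f)
```

Here V_f and V_d share the atom count V as their first dimension, while E_f uses the bond count E; the second dimension is free and only has to be consistent across rows.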

__post_init__()[source]#
__len__()[source]#
Return type:

int

class chemprop.data.datapoints.LazyMoleculeDatapoint[source]#

Bases: _DatapointMixin, _LazyMoleculeDatapointMixin

A LazyMoleculeDatapoint contains a single SMILES string and all attributes needed to form an rdkit.Chem.Mol object. The molecule is computed lazily when the mol attribute is accessed.
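The lazy pattern described here can be sketched with the standard library alone. This is an illustration of deferred construction, not chemprop's implementation: `LazyRecord` and `fake_parse` are hypothetical names, a dictionary stands in for the rdkit.Chem.Mol object, and caching the result after the first access is an assumption of this sketch.

```python
from functools import cached_property


class LazyRecord:
    """Hypothetical stand-in for a lazily evaluated datapoint."""

    def __init__(self, smi, parse):
        self.smi = smi        # only the SMILES string is stored up front
        self._parse = parse   # callable that builds the molecule on demand

    @cached_property
    def mol(self):
        # Runs on first access only; the result is then cached on the instance.
        return self._parse(self.smi)


calls = []


def fake_parse(smi):
    """Placeholder parser that records each call (not an RDKit function)."""
    calls.append(smi)
    return {"smiles": smi, "n_chars": len(smi)}


record = LazyRecord("CCO", fake_parse)
calls_before_access = list(calls)  # still empty: nothing parsed yet
first = record.mol                 # triggers the parse
second = record.mol                # served from the cache
```

Deferring the parse keeps construction cheap when many datapoints are created but only some are featurized at a time.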

V_f: numpy.ndarray | None = None#

A numpy array of shape V x d_vf, where V is the number of atoms in the molecule, and d_vf is the number of additional features that will be concatenated to atom-level features before message passing

E_f: numpy.ndarray | None = None#

A numpy array of shape E x d_ef, where E is the number of bonds in the molecule, and d_ef is the number of additional features that will be concatenated to bond-level features before message passing

V_d: numpy.ndarray | None = None#

A numpy array of shape V x d_vd, where V is the number of atoms in the molecule, and d_vd is the number of additional descriptors that will be concatenated to atom-level descriptors after message passing

__post_init__()[source]#
__len__()[source]#
Return type:

int

class chemprop.data.datapoints.MolAtomBondDatapoint[source]#

Bases: MoleculeDatapoint

A MolAtomBondDatapoint contains a single molecule and its associated features and molecule-, atom-, and bond-level targets.

E_d: numpy.ndarray | None = None#

A numpy array of shape E x d_ed, where E is the number of bonds in the molecule, and d_ed is the number of additional descriptors that will be concatenated to bond-level descriptors after message passing

atom_y: numpy.ndarray | None = None#

A numpy array of shape V x v_t, where V is the number of atoms in the molecule, and v_t is the number of atom targets. The order of atoms in the array should match the order of atoms in the mol. Unknown targets are indicated by `nan`s.

atom_gt_mask: numpy.ndarray | None = None#

Indicates whether the atom targets are an inequality regression target of the form >x

atom_lt_mask: numpy.ndarray | None = None#

Indicates whether the atom targets are an inequality regression target of the form <x

bond_y: numpy.ndarray | None = None#

A numpy array of shape E x e_t, where E is the number of bonds in the molecule, and e_t is the number of bond targets. The order of bonds in the array should match the order of bonds in the mol. Unknown targets are indicated by `nan`s.
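Because unknown entries in atom_y and bond_y are marked with `nan`, any metric computed over these arrays has to skip them. The sketch below shows one common way to do so in plain Python, with nested lists standing in for NumPy arrays; `masked_mse` is a hypothetical helper, and how chemprop itself treats the `nan` entries is not specified on this page.

```python
import math


def masked_mse(preds, targets):
    """Mean squared error over known targets only (nan entries are skipped)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for pred, target in zip(pred_row, target_row):
            if math.isnan(target):
                continue  # unknown target: contributes nothing
            total += (pred - target) ** 2
            count += 1
    return total / count if count else float("nan")


nan = float("nan")
# V x v_t = 3 atoms x 2 atom targets, rows in the same order as the atoms.
atom_y = [[0.5, nan], [nan, nan], [-0.5, 1.0]]
atom_preds = [[0.0, 9.0], [9.0, 9.0], [0.0, 0.0]]
loss = masked_mse(atom_preds, atom_y)  # averages over the 3 known entries
```

The predictions of 9.0 sit opposite `nan` targets, so they do not affect the result.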

bond_gt_mask: numpy.ndarray | None = None#

Indicates whether the bond targets are an inequality regression target of the form >x

bond_lt_mask: numpy.ndarray | None = None#

Indicates whether the bond targets are an inequality regression target of the form <x
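The role of the inequality masks can be illustrated with a small sketch. It assumes, going by the field names, that a gt mask flags targets written as >x and an lt mask flags targets written as <x. Both functions are hypothetical helpers, not chemprop APIs, and the bounded squared error is one plausible way to use the masks rather than the library's documented loss.

```python
def split_bounded_targets(raw):
    """Split strings such as '1.5', '>2.0', '<0.3' into values and masks."""
    values, lt_mask, gt_mask = [], [], []
    for item in raw:
        text = item.strip()
        lt_mask.append(text.startswith("<"))
        gt_mask.append(text.startswith(">"))
        values.append(float(text.lstrip("<>")))
    return values, lt_mask, gt_mask


def bounded_sq_error(pred, value, is_lt, is_gt):
    """Squared error that is zero when the prediction satisfies the bound."""
    if is_lt and pred < value:
        return 0.0  # target is "<value" and the prediction is already below it
    if is_gt and pred > value:
        return 0.0  # target is ">value" and the prediction is already above it
    return (pred - value) ** 2


values, lt_mask, gt_mask = split_bounded_targets(["1.5", ">2.0", "<0.3"])
satisfied = bounded_sq_error(3.0, values[1], lt_mask[1], gt_mask[1])
violated = bounded_sq_error(1.0, values[1], lt_mask[1], gt_mask[1])
```

An exact target (both masks false) always falls through to the ordinary squared error.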

atom_constraint: numpy.ndarray | None = None#

A numpy array of shape 1 x v_t containing the values that the atom property predictions should be constrained to sum to, with np.nan indicating no constraint for that property

bond_constraint: numpy.ndarray | None = None#

A numpy array of shape 1 x e_t containing the values that the bond property predictions should be constrained to sum to, with np.nan indicating no constraint for that property
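To make the meaning of a sum constraint concrete, the sketch below adjusts per-atom predictions so that each constrained property sums to its target, leaving properties marked with `nan` untouched. The uniform shift used here is a deliberately simple stand-in chosen for illustration; this page does not describe how chemprop enforces the constraint. `apply_sum_constraint` is a hypothetical helper, and a flat list represents the single row of the 1 x v_t array.

```python
import math


def apply_sum_constraint(preds, constraint):
    """Shift each constrained column uniformly so that it sums to its target."""
    n_atoms = len(preds)
    adjusted = [row[:] for row in preds]  # copy so the input is not modified
    for j, target_sum in enumerate(constraint):
        if math.isnan(target_sum):
            continue  # nan: no constraint for this property
        shift = (target_sum - sum(row[j] for row in preds)) / n_atoms
        for row in adjusted:
            row[j] += shift
    return adjusted


# V x v_t = 3 atoms x 2 properties; only the first property is constrained.
atom_preds = [[0.5, 1.0], [0.25, 2.0], [0.25, 3.0]]
atom_constraint = [0.0, float("nan")]  # e.g. partial charges summing to zero
adjusted = apply_sum_constraint(atom_preds, atom_constraint)
```

The same idea applies to bond_constraint, with rows indexed by bonds instead of atoms.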

__post_init__()[source]#
classmethod from_smi(smi, *args, keep_h=False, add_h=False, ignore_stereo=False, reorder_atoms=True, **kwargs)[source]#
Parameters:
  • smi (str)

  • keep_h (bool)

  • add_h (bool)

  • ignore_stereo (bool)

  • reorder_atoms (bool)

Return type:

MolAtomBondDatapoint

class chemprop.data.datapoints.ReactionDatapoint[source]#

Bases: _DatapointMixin, _ReactionDatapointMixin

A ReactionDatapoint contains a single reaction and its associated features and targets.

__post_init__()[source]#
__len__()[source]#
Return type:

int