Models

The chemprop.models package contains the core Chemprop message passing neural network.

Model

chemprop.models.model.py contains the MoleculeModel class, which implements the full Chemprop model. It consists of an MPN, which performs message passing, and a feed-forward neural network, which combines the output of the message passing network with any additional molecule-level features and makes the final property predictions.
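The data flow described above can be sketched in plain Python. This is only an illustrative toy, not Chemprop's implementation: toy_encoder, linear, and toy_model are hypothetical names, mean pooling stands in for real message passing, and the weights are made up.

```python
from typing import List, Sequence

Vector = List[float]


def toy_encoder(atom_features: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for the MPN: mean-pool atom vectors into one molecule vector."""
    n_atoms = len(atom_features)
    dim = len(atom_features[0])
    return [sum(atom[i] for atom in atom_features) / n_atoms for i in range(dim)]


def linear(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """A single dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def toy_model(atom_features, mol_features, weights, bias) -> Vector:
    """Encode the molecule, append molecule-level features, then predict."""
    encoding = toy_encoder(atom_features)
    combined = encoding + list(mol_features)  # concatenation of encoding and features
    return linear(combined, weights, bias)


# Two atoms with two features each, plus one molecule-level feature.
prediction = toy_model(
    atom_features=[[1.0, 0.0], [3.0, 2.0]],
    mol_features=[0.5],
    weights=[[1.0, 1.0, 2.0]],  # made-up weights for a single output
    bias=[0.0],
)
```

The point of the sketch is the ordering: the encoder output and the extra features are concatenated before the feed-forward part sees them, so the feed-forward input width is the encoding size plus the feature size.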

class chemprop.models.model.MoleculeModel(args: TrainArgs)

A MoleculeModel is a model which contains a message passing network followed by feed-forward layers.

Parameters: args – A TrainArgs object containing model arguments.

create_encoder(args: TrainArgs) → None

Creates the message passing encoder for the model.

Parameters: args – A TrainArgs object containing model arguments.

create_ffn(args: TrainArgs) → None

Creates the feed-forward layers for the model.

Parameters: args – A TrainArgs object containing model arguments.

fingerprint(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, fingerprint_type: str = 'MPN') → Tensor

Encodes the latent representations of the input molecules from intermediate stages of the model.

Parameters:

batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph has length num_molecules (the number of datapoints in the batch); each inner list has length number_of_molecules (the number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.
fingerprint_type – Which latent representation to return as the molecular fingerprint. Currently supported values are MPN, for the output of the MPNN portion of the model, and last_FFN, for the input to the final readout layer.

Returns:

The latent fingerprint vectors.
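The two fingerprint_type options differ only in how far the input is pushed through the model before the representation is returned. The following is a minimal sketch of that idea in plain Python, not Chemprop's code: toy_fingerprint and hidden_layers are hypothetical, and the error raised for an unknown type is an assumption of the sketch.

```python
from typing import Callable, List, Sequence

Vector = List[float]
SUPPORTED_FINGERPRINT_TYPES = ("MPN", "last_FFN")


def toy_fingerprint(
    encoding: Vector,
    hidden_layers: Sequence[Callable[[Vector], Vector]],
    fingerprint_type: str = "MPN",
) -> Vector:
    """Return a latent vector from an intermediate stage of a toy model.

    encoding      -- the (already computed) message passing output
    hidden_layers -- every feed-forward layer except the final readout layer
    """
    if fingerprint_type == "MPN":
        # Stop right after message passing.
        return list(encoding)
    if fingerprint_type == "last_FFN":
        # Run everything up to, but not including, the final readout layer.
        x = list(encoding)
        for layer in hidden_layers:
            x = layer(x)
        return x
    raise ValueError(f"Unsupported fingerprint type: {fingerprint_type!r}")


# Made-up hidden layers: scale by two, then apply a ReLU.
layers = [
    lambda v: [2.0 * x for x in v],
    lambda v: [max(0.0, x) for x in v],
]

mpn_fp = toy_fingerprint([1.0, -2.0], layers, "MPN")
ffn_fp = toy_fingerprint([1.0, -2.0], layers, "last_FFN")
```

Here mpn_fp is the untouched encoding, while ffn_fp is what the final readout layer would receive as input.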

forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, constraints_batch: List[Tensor] | None = None, bond_types_batch: List[Tensor] | None = None) → Tensor

Runs the MoleculeModel on input.

Parameters:

batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph has length num_molecules (the number of datapoints in the batch); each inner list has length number_of_molecules (the number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.
constraints_batch – A list of PyTorch tensors which apply constraints on atomic/bond properties.
bond_types_batch – A list of PyTorch tensors storing bond types of each bond determined by RDKit molecules.

Returns:

The output of the MoleculeModel, containing a list of property predictions.
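The nested layout of batch is easy to get wrong, so a small shape check may help. The sketch below is plain Python and makes several assumptions: describe_batch is a hypothetical helper, not part of Chemprop; the SMILES strings are arbitrary examples; and plain lists stand in for the numpy arrays in features_batch.

```python
from typing import List, Optional, Sequence, Tuple


def describe_batch(
    batch: Sequence[Sequence[str]],
    features_batch: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[int, int]:
    """Return (datapoints in the batch, molecules per datapoint) for a SMILES batch."""
    num_datapoints = len(batch)
    inner_lengths = {len(datapoint) for datapoint in batch}
    if len(inner_lengths) != 1:
        raise ValueError("every datapoint must hold the same number of molecules")
    if features_batch is not None and len(features_batch) != num_datapoints:
        raise ValueError("features_batch needs one entry per datapoint")
    return num_datapoints, inner_lengths.pop()


# Three datapoints, each pairing a solute with a solvent (two molecules per datapoint).
batch: List[List[str]] = [
    ["CCO", "O"],
    ["c1ccccc1", "O"],
    ["CC(=O)O", "O"],
]
# One made-up feature vector per datapoint.
features_batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

shape = describe_batch(batch, features_batch)
```

The outer length (3 here) is the number of datapoints, and the inner length (2 here) is the number of molecules per datapoint; the optional per-datapoint lists such as features_batch follow the outer length.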

MPN

chemprop.models.mpn.py contains the MPNEncoder class, which is the core message passing network, along with a wrapper MPN which is used within a MoleculeModel.

class chemprop.models.mpn.MPN(args: TrainArgs, atom_fdim: int | None = None, bond_fdim: int | None = None)

An MPN is a wrapper around MPNEncoder which featurizes input as needed.

Parameters:

args – A TrainArgs object containing model arguments.
atom_fdim – Atom feature vector dimension.
bond_fdim – Bond feature vector dimension.

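forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None) → Tensor

Encodes a batch of molecules.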

Parameters:

batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph has length num_molecules (the number of datapoints in the batch); each inner list has length number_of_molecules (the number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.

Returns:

A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.

class chemprop.models.mpn.MPNEncoder(args: TrainArgs, atom_fdim: int, bond_fdim: int, hidden_size: int | None = None, bias: bool | None = None, depth: int | None = None)

An MPNEncoder is a message passing neural network for encoding a molecule.

Parameters:

args – A TrainArgs object containing model arguments.
atom_fdim – Atom feature vector dimension.
bond_fdim – Bond feature vector dimension.
hidden_size – Hidden layers dimension.
bias – Whether to add bias to linear layers.
depth – Number of message passing steps.

forward(mol_graph: BatchMolGraph, atom_descriptors_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None) → Tensor

Encodes a batch of molecular graphs.

Parameters:

mol_graph – A BatchMolGraph representing a batch of molecular graphs.
atom_descriptors_batch – A list of numpy arrays containing additional atomic descriptors.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.

Returns:

A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.
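To make the (num_molecules, hidden_size) output shape concrete, here is a deliberately simplified message passing sketch in plain Python. It is not the MPNEncoder algorithm: it passes messages between atoms rather than along directed bonds, has no learned weights, and uses mean pooling as the readout. The name toy_message_passing and all of its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_message_passing(
    atom_feats: Sequence[Vector],
    bonds: Sequence[Tuple[int, int]],
    atom_to_mol: Sequence[int],
    num_mols: int,
    depth: int,
) -> List[Vector]:
    """Return one vector per molecule after `depth` rounds of neighbor summation.

    Assumes every molecule in the batch has at least one atom.
    """
    dim = len(atom_feats[0])
    # Undirected adjacency list built from the bond index pairs.
    neighbors = {i: [] for i in range(len(atom_feats))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    hidden = [list(f) for f in atom_feats]
    for _ in range(depth):
        updated = []
        for i, h0 in enumerate(atom_feats):
            # Message: sum of the neighbors' current hidden states.
            msg = [sum(hidden[j][k] for j in neighbors[i]) for k in range(dim)]
            # Update: skip connection to the input features, then ReLU.
            updated.append([max(0.0, h0[k] + msg[k]) for k in range(dim)])
        hidden = updated

    # Readout: average the atom states belonging to each molecule.
    sums = [[0.0] * dim for _ in range(num_mols)]
    counts = [0] * num_mols
    for i, h in enumerate(hidden):
        mol = atom_to_mol[i]
        counts[mol] += 1
        for k in range(dim):
            sums[mol][k] += h[k]
    return [[s / counts[m] for s in sums[m]] for m in range(num_mols)]


# Molecule 0 has two bonded atoms (0 and 1); molecule 1 is a single atom (2).
encodings = toy_message_passing(
    atom_feats=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    bonds=[(0, 1)],
    atom_to_mol=[0, 0, 1],
    num_mols=2,
    depth=1,
)
```

Whatever the number of atoms in each molecule, the readout collapses them to a single row, so the result has one row per molecule and one column per hidden dimension (2 by 2 in this toy case).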