Models

The chemprop.models package contains the core Chemprop message passing neural network.

Model

The chemprop.models.model module contains the MoleculeModel class, which implements the full Chemprop model. It consists of an MPN, which performs message passing, and a feed-forward neural network, which combines the output of the message passing network with any additional molecule-level features and makes the final property predictions.
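The data flow described above (encode, append molecule-level features, predict) can be sketched in plain Python. This is a toy illustration, not Chemprop code: toy_encoder, toy_ffn, and toy_molecule_model are hypothetical names, and the character-counting "encoder" merely stands in for a learned message passing network.

```python
from typing import List, Optional


def toy_encoder(smiles: str, hidden_size: int = 4) -> List[float]:
    """Hypothetical stand-in for the MPN: maps a SMILES string to a
    fixed-length vector by counting characters per slot."""
    vec = [0.0] * hidden_size
    for i, _ in enumerate(smiles):
        vec[i % hidden_size] += 1.0
    return vec


def toy_ffn(x: List[float]) -> float:
    """Hypothetical stand-in for the feed-forward network: a fixed readout
    that averages its input."""
    return sum(x) / len(x)


def toy_molecule_model(smiles: str,
                       features: Optional[List[float]] = None) -> float:
    """Encode the molecule, append any molecule-level features, then predict."""
    encoding = toy_encoder(smiles)
    if features is not None:
        # Molecule-level features are concatenated to the encoder output
        # before the feed-forward layers see it.
        encoding = encoding + list(features)
    return toy_ffn(encoding)


prediction = toy_molecule_model("CCO", features=[0.5, 1.5])
prediction_no_features = toy_molecule_model("CCO")
```

The point of the sketch is only the ordering of the stages: the feed-forward part receives the encoding and the extra features as one vector.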

class chemprop.models.model.MoleculeModel(args: TrainArgs)

A MoleculeModel is a model which contains a message passing network followed by feed-forward layers.

Parameters:

args – A TrainArgs object containing model arguments.

create_encoder(args: TrainArgs) -> None

Creates the message passing encoder for the model.

Parameters:

args – A TrainArgs object containing model arguments.

create_ffn(args: TrainArgs) -> None

Creates the feed-forward layers for the model.

Parameters:

args – A TrainArgs object containing model arguments.

fingerprint(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, fingerprint_type: str = 'MPN') -> Tensor

Encodes the latent representations of the input molecules from intermediate stages of the model.

Parameters:
  • batch – A list of lists of SMILES strings, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch), the inner list is of length number_of_molecules (number of molecules per datapoint).

  • features_batch – A list of numpy arrays containing additional features.

  • atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.

  • atom_features_batch – A list of numpy arrays containing additional atom features.

  • bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.

  • bond_features_batch – A list of numpy arrays containing additional bond features.

  • fingerprint_type – The type of latent representation to return as the molecular fingerprint. The currently supported values are MPN, for the output of the MPNN portion of the model, and last_FFN, for the input to the final readout layer.

Returns:

The latent fingerprint vectors.
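To make the two fingerprint_type values concrete, here is a minimal sketch of where each representation is taken from. It is not the Chemprop implementation: toy_fingerprint, double, and relu are hypothetical, and the hidden layers are plain Python functions rather than learned layers.

```python
from typing import Callable, List

Vector = List[float]


def toy_fingerprint(encoding: Vector,
                    hidden_layers: List[Callable[[Vector], Vector]],
                    fingerprint_type: str = "MPN") -> Vector:
    """Return an intermediate representation of a toy model.

    'MPN' returns the encoder output unchanged; 'last_FFN' applies every
    hidden feed-forward layer and returns what the final readout layer
    would receive as input.
    """
    if fingerprint_type == "MPN":
        return list(encoding)
    if fingerprint_type == "last_FFN":
        x = list(encoding)
        for layer in hidden_layers:
            x = layer(x)
        return x
    raise ValueError(f"Unsupported fingerprint_type: {fingerprint_type!r}")


def double(x: Vector) -> Vector:
    """Hypothetical linear layer: scale every entry by two."""
    return [2.0 * v for v in x]


def relu(x: Vector) -> Vector:
    """Elementwise rectifier."""
    return [max(0.0, v) for v in x]


encoding = [1.0, -2.0, 3.0]
mpn_fp = toy_fingerprint(encoding, [double, relu], "MPN")
ffn_fp = toy_fingerprint(encoding, [double, relu], "last_FFN")
```

In this toy setup the readout layer itself is never applied, so neither fingerprint is a property prediction.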

forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, constraints_batch: List[Tensor] | None = None, bond_types_batch: List[Tensor] | None = None) -> Tensor

Runs the MoleculeModel on input.

Parameters:
  • batch – A list of lists of SMILES strings, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch), the inner list is of length number_of_molecules (number of molecules per datapoint).

  • features_batch – A list of numpy arrays containing additional features.

  • atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.

  • atom_features_batch – A list of numpy arrays containing additional atom features.

  • bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.

  • bond_features_batch – A list of numpy arrays containing additional bond features.

  • constraints_batch – A list of PyTorch tensors that apply constraints to atomic/bond properties.

  • bond_types_batch – A list of PyTorch tensors storing the bond type of each bond, as determined from the RDKit molecules.

Returns:

The output of the MoleculeModel, containing a list of property predictions.
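The nested layout of batch is easy to get wrong, so the following sketch builds a SMILES batch and checks its two lengths. describe_batch is a hypothetical helper, not part of Chemprop, and the check that features_batch holds one entry per datapoint is an assumption that the parameter description above does not state explicitly.

```python
from typing import List, Optional, Sequence, Tuple


def describe_batch(batch: List[List[str]],
                   features_batch: Optional[List[Sequence[float]]] = None
                   ) -> Tuple[int, int]:
    """Return (num_molecules, number_of_molecules) for a nested SMILES batch."""
    # Outer list: one entry per datapoint.
    num_molecules = len(batch)
    # Inner lists: one SMILES per molecule in the datapoint.
    inner_lengths = {len(datapoint) for datapoint in batch}
    if len(inner_lengths) != 1:
        raise ValueError("Every datapoint must hold the same number of molecules.")
    number_of_molecules = inner_lengths.pop()
    # Assumption: one feature vector per datapoint.
    if features_batch is not None and len(features_batch) != num_molecules:
        raise ValueError("Expected one feature vector per datapoint.")
    return num_molecules, number_of_molecules


# Three datapoints, each pairing a solute with a solvent (water).
batch = [["CCO", "O"], ["c1ccccc1", "O"], ["CC(=O)O", "O"]]
features_batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
shape = describe_batch(batch, features_batch)
```

Here the outer length is 3 (datapoints) and the inner length is 2 (molecules per datapoint), matching the num_molecules and number_of_molecules wording in the parameter list.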

MPN

The chemprop.models.mpn module contains the MPNEncoder class, which is the core message passing network, along with a wrapper MPN which is used within a MoleculeModel.

class chemprop.models.mpn.MPN(args: TrainArgs, atom_fdim: int | None = None, bond_fdim: int | None = None)

An MPN is a wrapper around MPNEncoder which featurizes input as needed.

Parameters:
  • args – A TrainArgs object containing model arguments.

  • atom_fdim – Atom feature vector dimension.

  • bond_fdim – Bond feature vector dimension.

forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None) -> Tensor

Encodes a batch of molecules.

Parameters:
  • batch – A list of lists of SMILES strings, a list of lists of RDKit molecules, or a list of BatchMolGraph objects. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch), the inner list is of length number_of_molecules (number of molecules per datapoint).

  • features_batch – A list of numpy arrays containing additional features.

  • atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.

  • atom_features_batch – A list of numpy arrays containing additional atom features.

  • bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.

  • bond_features_batch – A list of numpy arrays containing additional bond features.

Returns:

A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.

class chemprop.models.mpn.MPNEncoder(args: TrainArgs, atom_fdim: int, bond_fdim: int, hidden_size: int | None = None, bias: bool | None = None, depth: int | None = None)

An MPNEncoder is a message passing neural network for encoding a molecule.

Parameters:
  • args – A TrainArgs object containing model arguments.

  • atom_fdim – Atom feature vector dimension.

  • bond_fdim – Bond feature vector dimension.

  • hidden_size – Hidden layers dimension.

  • bias – Whether to add bias to linear layers.

  • depth – Number of message passing steps.

forward(mol_graph: BatchMolGraph, atom_descriptors_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None) -> Tensor

Encodes a batch of molecular graphs.

Parameters:
  • mol_graph – A BatchMolGraph representing a batch of molecular graphs.

  • atom_descriptors_batch – A list of numpy arrays containing additional atomic descriptors.

  • bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.

Returns:

A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.
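As a rough illustration of what "message passing steps" and a per-molecule encoding mean, the sketch below runs a simplified atom-based scheme in plain Python. It is not the MPNEncoder algorithm: there are no learned weights, bond features are ignored, the update rule and mean readout are assumptions chosen for brevity, and toy_message_passing and encode_batch are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]


def toy_message_passing(atom_features: List[Vector],
                        bonds: List[Tuple[int, int]],
                        depth: int) -> Vector:
    """Encode one molecule with `depth` rounds of neighbor aggregation."""
    # Build an undirected adjacency list from the bond index pairs.
    neighbors = {i: [] for i in range(len(atom_features))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    hidden = [list(f) for f in atom_features]
    for _ in range(depth):
        updated = []
        for i, h in enumerate(hidden):
            # Message: sum of the neighbors' current hidden states.
            message = [sum(hidden[j][k] for j in neighbors[i])
                       for k in range(len(h))]
            # Update: input features plus message, passed through a ReLU.
            updated.append([max(0.0, atom_features[i][k] + message[k])
                            for k in range(len(h))])
        hidden = updated

    # Readout: average the atom states into one vector per molecule.
    n = len(hidden)
    return [sum(h[k] for h in hidden) / n for k in range(len(hidden[0]))]


def encode_batch(molecules, depth: int) -> List[Vector]:
    """Encode every (atom_features, bonds) pair; one row per molecule."""
    return [toy_message_passing(atoms, bonds, depth) for atoms, bonds in molecules]


# Atom features are one-hot [is_carbon, is_oxygen]; hydrogens are implicit.
ethanol = ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])
water = ([[0.0, 1.0]], [])
encodings = encode_batch([ethanol, water], depth=1)
```

The result has one row per molecule and one column per hidden dimension, which mirrors the (num_molecules, hidden_size) shape documented above, with a feature width of 2 playing the role of hidden_size.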