Models
The chemprop.models package contains the core Chemprop message passing neural network.
Model
chemprop.models.model.py contains the MoleculeModel class, which implements the full Chemprop model. It consists of an MPN, which performs message passing, followed by a feed-forward neural network, which combines the output of the message passing network with any additional molecule-level features and makes the final property predictions.
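The data flow described above can be illustrated with a minimal, library-free sketch. It is not Chemprop's implementation: `toy_encoder`, `build_ffn_input`, `toy_ffn`, and `toy_predict` are hypothetical names, the "encoder" is a placeholder arithmetic rule, and the "feed-forward network" is a single linear unit with fixed weights. The only point is that molecule-level features, when present, are appended to the encoding before the feed-forward stage.

```python
from typing import List, Optional


def toy_encoder(smiles: str, hidden_size: int = 4) -> List[float]:
    """Hypothetical stand-in for the message passing network.

    A real MPN learns its encoding from the molecular graph; this
    placeholder just derives a fixed-size vector from the string length.
    """
    return [float((len(smiles) + i) % 5) for i in range(hidden_size)]


def build_ffn_input(encoding: List[float],
                    features: Optional[List[float]] = None) -> List[float]:
    """Append optional molecule-level features to the encoding."""
    return list(encoding) + list(features or [])


def toy_ffn(ffn_input: List[float]) -> float:
    """Hypothetical stand-in for the feed-forward layers: one linear unit."""
    weights = [0.1] * len(ffn_input)  # fixed weights, for illustration only
    return sum(x * w for x, w in zip(ffn_input, weights))


def toy_predict(smiles: str, features: Optional[List[float]] = None) -> float:
    """Encode, concatenate features, then apply the feed-forward stage."""
    encoding = toy_encoder(smiles)
    return toy_ffn(build_ffn_input(encoding, features))


prediction = toy_predict("CCO", features=[1.0, 2.0])
```

In this sketch the feed-forward input grows from 4 to 6 values when two extra features are supplied, which mirrors why the feed-forward layers must be sized for the encoding plus any additional features.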
- class chemprop.models.model.MoleculeModel(args: TrainArgs)
A MoleculeModel is a model which contains a message passing network followed by feed-forward layers.
- Parameters:
args – A TrainArgs object containing model arguments.
- create_encoder(args: TrainArgs) -> None
Creates the message passing encoder for the model.
- Parameters:
args – A TrainArgs object containing model arguments.
- create_ffn(args: TrainArgs) -> None
Creates the feed-forward layers for the model.
- Parameters:
args – A TrainArgs object containing model arguments.
- fingerprint(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, fingerprint_type: str = 'MPN') -> Tensor
Encodes the latent representations of the input molecules from intermediate stages of the model.
- Parameters:
batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch); the inner list is of length number_of_molecules (number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.
fingerprint_type – The type of latent representation to return as the molecular fingerprint. Currently supported values are MPN, for the output of the MPNN portion of the model, and last_FFN, for the input to the final readout layer.
- Returns:
The latent fingerprint vectors.
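The two fingerprint_type options can be pictured as taps at different depths of the model. The sketch below is a toy under stated assumptions, not Chemprop code: `toy_fingerprint` and `SUPPORTED_FINGERPRINT_TYPES` are hypothetical names, the hidden layer is an arbitrary fixed transform, and raising ValueError for an unsupported value is a choice made for this example.

```python
from typing import List

# The two values named in the parameter description above.
SUPPORTED_FINGERPRINT_TYPES = ("MPN", "last_FFN")


def toy_fingerprint(encoding: List[float],
                    fingerprint_type: str = "MPN") -> List[float]:
    """Return a latent representation from one of two stages of a toy model.

    `encoding` plays the role of the message passing output.
    """
    if fingerprint_type not in SUPPORTED_FINGERPRINT_TYPES:
        raise ValueError(f"Unsupported fingerprint type: {fingerprint_type}")

    if fingerprint_type == "MPN":
        # Tap directly after the message passing stage.
        return list(encoding)

    # "last_FFN": pass through a (hypothetical) hidden layer and stop
    # just before the final readout layer would be applied.
    return [max(0.0, 2.0 * x - 1.0) for x in encoding]


mpn_fp = toy_fingerprint([2.0, 0.0], "MPN")
ffn_fp = toy_fingerprint([2.0, 0.0], "last_FFN")
```

The MPN variant describes the molecule before any mixing with extra features in the feed-forward stage, while the last_FFN variant sits closer to the prediction itself.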
- forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None, constraints_batch: List[Tensor] | None = None, bond_types_batch: List[Tensor] | None = None) -> Tensor
Runs the MoleculeModel on input.
- Parameters:
batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch); the inner list is of length number_of_molecules (number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.
constraints_batch – A list of PyTorch tensors which apply constraints to atomic/bond properties.
bond_types_batch – A list of PyTorch tensors storing the bond type of each bond, as determined from the RDKit molecules.
- Returns:
The output of the MoleculeModel, containing a list of property predictions.
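The nested layout of the batch argument (outer list over datapoints, inner list over the molecules of each datapoint) can be checked with a small helper. This is an illustrative sketch only: `batch_dimensions` is a hypothetical function, not part of Chemprop, and it handles only the list-of-lists-of-SMILES form.

```python
from typing import Sequence, Tuple


def batch_dimensions(batch: Sequence[Sequence[str]]) -> Tuple[int, int]:
    """Return (num_molecules, number_of_molecules) for a nested SMILES batch.

    num_molecules is the number of datapoints (outer length);
    number_of_molecules is the number of molecules per datapoint (inner length).
    """
    num_molecules = len(batch)
    if num_molecules == 0:
        return 0, 0

    number_of_molecules = len(batch[0])
    for index, datapoint in enumerate(batch):
        # Every datapoint must carry the same number of molecules.
        if len(datapoint) != number_of_molecules:
            raise ValueError(
                f"Datapoint {index} has {len(datapoint)} molecules, "
                f"expected {number_of_molecules}"
            )
    return num_molecules, number_of_molecules


# Three datapoints, each pairing a solute with water as the second molecule.
batch = [["CCO", "O"], ["c1ccccc1", "O"], ["CC(=O)O", "O"]]
dims = batch_dimensions(batch)
```

Here `dims` is `(3, 2)`: three datapoints with two molecules each.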
MPN
chemprop.models.mpn.py contains the MPNEncoder class, which is the core message passing network, along with a wrapper, MPN, which is used within a MoleculeModel.
- class chemprop.models.mpn.MPN(args: TrainArgs, atom_fdim: int | None = None, bond_fdim: int | None = None)
An MPN is a wrapper around MPNEncoder which featurizes input as needed.
- Parameters:
args – A TrainArgs object containing model arguments.
atom_fdim – Atom feature vector dimension.
bond_fdim – Bond feature vector dimension.
- forward(batch: List[List[str]] | List[List[Mol]] | List[List[Tuple[Mol, Mol]]] | List[BatchMolGraph], features_batch: List[ndarray] | None = None, atom_descriptors_batch: List[ndarray] | None = None, atom_features_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None, bond_features_batch: List[ndarray] | None = None) -> Tensor
Encodes a batch of molecules.
- Parameters:
batch – A list of lists of SMILES, a list of lists of RDKit molecules, or a list of BatchMolGraph. The outer list or BatchMolGraph is of length num_molecules (number of datapoints in batch); the inner list is of length number_of_molecules (number of molecules per datapoint).
features_batch – A list of numpy arrays containing additional features.
atom_descriptors_batch – A list of numpy arrays containing additional atom descriptors.
atom_features_batch – A list of numpy arrays containing additional atom features.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
bond_features_batch – A list of numpy arrays containing additional bond features.
- Returns:
A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.
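The phrase "featurizes input as needed" can be read as follows: inputs that are already graphs pass straight through, and anything else is converted first. The sketch below illustrates that pattern with hypothetical stand-ins (`ToyGraph`, `toy_featurize`, `toy_encode`, `wrapper_forward`). It does not use Chemprop's BatchMolGraph or its featurization code, and the "featurization" is a crude character count.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a featurized molecular graph."""
    smiles: str
    n_atoms_proxy: int


def toy_featurize(smiles: str) -> ToyGraph:
    # Placeholder: count alphabetic characters as a rough atom-count proxy.
    return ToyGraph(smiles=smiles,
                    n_atoms_proxy=sum(ch.isalpha() for ch in smiles))


def toy_encode(graph: ToyGraph) -> List[float]:
    # Placeholder encoder producing a one-dimensional "encoding".
    return [float(graph.n_atoms_proxy)]


def wrapper_forward(batch: List[Union[str, ToyGraph]]) -> List[List[float]]:
    """Featurize only the items that are not already graphs, then encode."""
    graphs = [item if isinstance(item, ToyGraph) else toy_featurize(item)
              for item in batch]
    return [toy_encode(graph) for graph in graphs]


# One raw SMILES string and one pre-featurized graph in the same batch.
mixed_batch = ["CCO", toy_featurize("c1ccccc1")]
encodings = wrapper_forward(mixed_batch)
```

Accepting both raw and pre-featurized input lets callers featurize once and reuse the result, while the simple path still works with plain SMILES.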
- class chemprop.models.mpn.MPNEncoder(args: TrainArgs, atom_fdim: int, bond_fdim: int, hidden_size: int | None = None, bias: bool | None = None, depth: int | None = None)
An MPNEncoder is a message passing neural network for encoding a molecule.
- Parameters:
args – A TrainArgs object containing model arguments.
atom_fdim – Atom feature vector dimension.
bond_fdim – Bond feature vector dimension.
hidden_size – Hidden layers dimension.
bias – Whether to add bias to linear layers.
depth – Number of message passing steps.
- forward(mol_graph: BatchMolGraph, atom_descriptors_batch: List[ndarray] | None = None, bond_descriptors_batch: List[ndarray] | None = None) -> Tensor
Encodes a batch of molecular graphs.
- Parameters:
mol_graph – A BatchMolGraph representing a batch of molecular graphs.
atom_descriptors_batch – A list of numpy arrays containing additional atomic descriptors.
bond_descriptors_batch – A list of numpy arrays containing additional bond descriptors.
- Returns:
A PyTorch tensor of shape (num_molecules, hidden_size) containing the encoding of each molecule.
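To make the roles of depth and the (num_molecules, hidden_size) output shape concrete, here is a deliberately simplified, library-free sketch of message passing. It is an assumption-laden toy, not the MPNEncoder algorithm: messages are passed between atoms with no learned weights, bond features are ignored, and the readout is a plain sum over atoms. The names `toy_message_passing` and `encode_batch` are hypothetical.

```python
from typing import Dict, List, Tuple

Molecule = Tuple[List[List[float]], Dict[int, List[int]]]


def toy_message_passing(atom_features: List[List[float]],
                        neighbors: Dict[int, List[int]],
                        depth: int) -> List[float]:
    """Encode one molecule with `depth` rounds of neighbor aggregation."""
    hidden = [list(f) for f in atom_features]
    for _ in range(depth):
        new_hidden = []
        for atom, state in enumerate(hidden):
            # Message: sum of the neighbors' current hidden states.
            message = [0.0] * len(state)
            for nb in neighbors.get(atom, []):
                message = [m + x for m, x in zip(message, hidden[nb])]
            # Update: input features plus message, passed through a ReLU.
            new_hidden.append([max(0.0, a + m)
                               for a, m in zip(atom_features[atom], message)])
        hidden = new_hidden
    # Readout: sum atom states into one vector per molecule.
    return [sum(column) for column in zip(*hidden)]


def encode_batch(molecules: List[Molecule], depth: int) -> List[List[float]]:
    """Return one encoding per molecule: shape (num_molecules, hidden_size)."""
    return [toy_message_passing(feats, nbrs, depth)
            for feats, nbrs in molecules]


# A three-atom chain (0-1-2) and a two-atom molecule (0-1), hidden_size = 2.
chain = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], {0: [1], 1: [0, 2], 2: [1]})
pair = ([[1.0, 0.0], [0.0, 1.0]], {0: [1], 1: [0]})
encodings = encode_batch([chain, pair], depth=1)
```

Each additional round lets information travel one bond further, so depth controls how large a neighborhood each atom's state can reflect before the readout.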