Predictors#
[1]:
import torch
from chemprop.nn.predictors import (
    RegressionFFN,
    BinaryClassificationFFN,
    MulticlassClassificationFFN,
)
This is an example of the aggregation output that serves as input to the predictor.
[2]:
n_datapoints_in_batch = 2
hidden_dim = 300
example_aggregation_output = torch.randn(n_datapoints_in_batch, hidden_dim)
Feed forward network#
The learned representation from message passing and aggregation is a vector, just like a fixed representation. While other predictors, such as a random forest, could make final predictions from this representation, Chemprop uses a feed-forward network because it allows end-to-end training. The three basic Chemprop FFNs differ in the prediction task they are used for. Note that multiclass classification needs to know the number of classes.
[3]:
regression_ffn = RegressionFFN()
binary_class_ffn = BinaryClassificationFFN()
multi_class_ffn = MulticlassClassificationFFN(n_classes=3)
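To illustrate what end-to-end training means here, the following is a minimal sketch in plain PyTorch rather than Chemprop code. The `encoder` and `head` modules are hypothetical stand-ins for message passing plus aggregation and for the predictor. The point is only that a single backward pass produces gradients for both parts, which would not be the case with a non-differentiable predictor such as a random forest.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for message passing + aggregation: maps raw
# features to a 300-dimensional learned representation.
encoder = nn.Linear(16, 300)
# Hypothetical stand-in for the predictor: a small feed-forward head.
head = nn.Sequential(nn.Linear(300, 300), nn.ReLU(), nn.Linear(300, 1))

raw_features = torch.randn(2, 16)
targets = torch.randn(2, 1)

# One forward and backward pass through both modules.
loss = nn.functional.mse_loss(head(encoder(raw_features)), targets)
loss.backward()

# Both the head and the upstream encoder receive gradients.
encoder_has_grad = encoder.weight.grad is not None
head_has_grad = head[0].weight.grad is not None
print(encoder_has_grad, head_has_grad)
```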
Input dimension#
The default input dimension of the predictor is the same as the default dimension of the message passing hidden representation. If your message passing hidden dimension is different, or if you have additional atom or datapoint descriptors, you need to change the predictor’s input dimension.
[4]:
ffn = RegressionFFN()
ffn(example_aggregation_output)
[4]:
tensor([[0.2080],
[0.2787]], grad_fn=<AddmmBackward0>)
[5]:
mp_hidden_dim = 2
n_atom_descriptors = 1
mp_output = torch.randn(n_datapoints_in_batch, mp_hidden_dim + n_atom_descriptors)
example_datapoint_descriptors = torch.randn(n_datapoints_in_batch, 12)
input_dim = mp_output.shape[1] + example_datapoint_descriptors.shape[1]
ffn = RegressionFFN(input_dim=input_dim)
ffn(torch.cat([mp_output, example_datapoint_descriptors], dim=1))
[5]:
tensor([[-0.0877],
[-0.2629]], grad_fn=<AddmmBackward0>)
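The reason the input dimension has to match can be sketched with a single linear layer in plain PyTorch. This is an illustration under the assumption that the predictor's first layer behaves like `torch.nn.Linear`; `first_layer` and `resized_layer` are hypothetical names, not Chemprop objects.

```python
import torch
from torch import nn

# Stand-in for a predictor's first layer built with an input size of 300.
first_layer = nn.Linear(300, 300)

# 2 hidden features + 1 atom descriptor + 12 datapoint descriptors = 15 columns.
mismatched_input = torch.randn(2, 2 + 1 + 12)

try:
    first_layer(mismatched_input)
    mismatch_detected = False
except RuntimeError as err:
    # PyTorch refuses to multiply a (2, 15) input by weights expecting 300 columns.
    mismatch_detected = True
    print(f"shape mismatch: {err}")

# Sizing the layer to the concatenated width resolves the error.
resized_layer = nn.Linear(mismatched_input.shape[1], 300)
out_shape = resized_layer(mismatched_input).shape
```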
Output dimension#
The number of tasks defaults to 1 but can be adjusted. Predictors that need to predict multiple values per task, like multiclass classification, will automatically adjust the output dimension.
[6]:
ffn = RegressionFFN(n_tasks=4)
ffn(example_aggregation_output).shape
[6]:
torch.Size([2, 4])
[7]:
ffn = MulticlassClassificationFFN(n_tasks=4, n_classes=3)
ffn(example_aggregation_output).shape
[7]:
torch.Size([2, 4, 3])
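One way to obtain a `(batch, n_tasks, n_classes)` output is to let the final linear layer emit one value per (task, class) pair and then reshape. The sketch below uses plain PyTorch and is an assumption about the general technique, not a copy of Chemprop's implementation.

```python
import torch
from torch import nn

n_tasks, n_classes, hidden_dim = 4, 3, 300

# One flat linear output with a slot for every (task, class) pair.
final_layer = nn.Linear(hidden_dim, n_tasks * n_classes)

hidden = torch.randn(2, hidden_dim)
flat_logits = final_layer(hidden)  # shape: (2, 12)

# Split the flat axis into separate task and class axes.
logits = flat_logits.reshape(hidden.shape[0], n_tasks, n_classes)  # shape: (2, 4, 3)
print(flat_logits.shape, logits.shape)
```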
Customization#
The following hyperparameters of the predictor are customizable:
the hidden dimension between layers, default: 300
the number of layers, default: 1
the dropout probability, default: 0.0 (i.e. no dropout)
the activation function, default: ReLU
[8]:
custom_ffn = RegressionFFN(hidden_dim=600, n_layers=3, dropout=0.1, activation="tanh")
custom_ffn(example_aggregation_output)
[8]:
tensor([[ 0.0121],
[-0.0760]], grad_fn=<AddmmBackward0>)
Intermediate hidden representations can also be extracted. Note that each predictor layer consists of an activation layer, followed by dropout, followed by a linear layer. The first predictor layer only has the linear layer.
[ ]:
layer = 2
custom_ffn.encode(example_aggregation_output, i=layer).shape
torch.Size([2, 600])
[ ]:
custom_ffn
RegressionFFN(
(ffn): MLP(
(0): Sequential(
(0): Linear(in_features=300, out_features=600, bias=True)
)
(1): Sequential(
(0): Tanh()
(1): Dropout(p=0.1, inplace=False)
(2): Linear(in_features=600, out_features=600, bias=True)
)
(2): Sequential(
(0): Tanh()
(1): Dropout(p=0.1, inplace=False)
(2): Linear(in_features=600, out_features=600, bias=True)
)
(3): Sequential(
(0): Tanh()
(1): Dropout(p=0.1, inplace=False)
(2): Linear(in_features=600, out_features=1, bias=True)
)
)
(criterion): MSE(task_weights=[[1.0]])
(output_transform): Identity()
)
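The layer layout printed above can be reproduced with a small plain-PyTorch sketch. `build_mlp` is a hypothetical helper written for illustration, and using slicing to read an intermediate representation is an assumption about how such extraction could work, not a statement about Chemprop's internals.

```python
import torch
from torch import nn


def build_mlp(input_dim, hidden_dim, output_dim, n_layers, dropout):
    """Hypothetical helper: a Linear-only first block, then (Tanh, Dropout, Linear) blocks."""
    dims = [input_dim] + [hidden_dim] * n_layers + [output_dim]
    blocks = [nn.Sequential(nn.Linear(dims[0], dims[1]))]
    for d_in, d_out in zip(dims[1:-1], dims[2:]):
        blocks.append(nn.Sequential(nn.Tanh(), nn.Dropout(dropout), nn.Linear(d_in, d_out)))
    return nn.Sequential(*blocks)


mlp = build_mlp(300, 600, 1, n_layers=3, dropout=0.1)
x = torch.randn(2, 300)

# Running only the first two blocks yields a hidden representation.
hidden_shape = mlp[:2](x).shape
# Running all blocks yields the prediction.
output_shape = mlp(x).shape
print(len(mlp), hidden_shape, output_shape)
```

With `n_layers=3`, the sketch has four blocks in total: the initial linear block plus three activation, dropout, and linear blocks, the last of which maps to the output dimension.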
Criterion#
Each predictor has a criterion that is used as the loss function during training. The default criterion for a predictor is defined in the predictor class.
[11]:
print(RegressionFFN._T_default_criterion)
print(BinaryClassificationFFN._T_default_criterion)
print(MulticlassClassificationFFN._T_default_criterion)
<class 'chemprop.nn.metrics.MSE'>
<class 'chemprop.nn.metrics.BCELoss'>
<class 'chemprop.nn.metrics.CrossEntropyLoss'>
A custom criterion can be given to the predictor.
[12]:
from chemprop.nn import MSE
criterion = MSE(task_weights=torch.tensor([0.5, 1.0]))
ffn = RegressionFFN(n_tasks=2, criterion=criterion)
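To show what per-task weights do to a loss, here is a minimal plain-PyTorch sketch of a task-weighted mean squared error. `weighted_mse` is a hypothetical function; Chemprop's own `MSE` criterion may handle masking, per-sample weights, and reduction differently.

```python
import torch


def weighted_mse(preds, targets, task_weights):
    """Hypothetical task-weighted MSE: scale each task's squared error, then average."""
    squared_error = (preds - targets) ** 2
    # task_weights has shape (n_tasks,) and broadcasts across the batch axis.
    return (squared_error * task_weights).mean()


preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.zeros(2, 2)
task_weights = torch.tensor([0.5, 1.0])

# Weighted squared errors: [[0.5, 4.0], [4.5, 16.0]], so the mean is 25 / 4.
loss = weighted_mse(preds, targets, task_weights)
print(loss.item())
```

Here the first task contributes half as much to the loss as it would unweighted, so its errors influence training less than those of the second task.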
Regression vs. classification#
In addition to using different loss functions, regression and classification predictors also differ in how they transform the model outputs during inference.
Regression should use a scaler transform if target normalization is used during training.
Classification uses a sigmoid (for binary classification) or a softmax (for multiclass) transform to keep class probability predictions between 0 and 1.
[ ]:
probs = binary_class_ffn(example_aggregation_output)
(0 < probs).all() and (probs < 1).all()
tensor(True)
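The three output transforms described above can be sketched in plain PyTorch. The logits, mean, and standard deviation below are made-up illustrative values, and the unscaling formula assumes targets were standardized as `(y - mean) / std` during training.

```python
import torch

torch.manual_seed(0)
binary_logits = torch.randn(2, 1)         # (batch, n_tasks)
multiclass_logits = torch.randn(2, 4, 3)  # (batch, n_tasks, n_classes)

# Binary classification: sigmoid maps each logit to a probability.
binary_probs = torch.sigmoid(binary_logits)

# Multiclass classification: softmax over the class axis, so each task's
# class probabilities sum to 1.
multiclass_probs = torch.softmax(multiclass_logits, dim=-1)

# Regression: undo target normalization with hypothetical training statistics.
train_mean, train_std = torch.tensor([10.0]), torch.tensor([2.0])
scaled_preds = torch.tensor([[0.5], [-1.0]])
unscaled_preds = scaled_preds * train_std + train_mean
print(unscaled_preds)
```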
Other predictors coming soon#
Beta versions of predictors for uncertainty and spectral tasks will be finalized in v2.1.
[14]:
from chemprop.nn.predictors import (
    MveFFN,
    EvidentialFFN,
    BinaryDirichletFFN,
    MulticlassDirichletFFN,
    SpectralFFN,
)