chemprop.uncertainty.calibrator#

Attributes#

Classes#

CalibratorBase

A base class for calibrating the predicted uncertainties.

RegressionCalibrator

A class for calibrating the predicted uncertainties in regression tasks.

ZScalingCalibrator

Calibrate regression datasets by applying a scaling value to the uncalibrated standard deviation, fitted by minimizing the negative log-likelihood of a normal distribution around each prediction.

ZelikmanCalibrator

Calibrate regression datasets using a method that does not depend on a particular probability function form.

MVEWeightingCalibrator

Calibrate regression datasets that have ensembles of individual models that make variance predictions.

RegressionConformalCalibrator

Conformalize quantiles so that the interval \([\hat{t}_{\alpha/2}(x),\hat{t}_{1-\alpha/2}(x)]\) has approximately \(1-\alpha\) coverage.

BinaryClassificationCalibrator

A class for calibrating the predicted uncertainties in binary classification tasks.

PlattCalibrator

Calibrate classification datasets using the Platt scaling algorithm [guo2017], [platt1999].

IsotonicCalibrator

Calibrate binary classification datasets using isotonic regression as discussed in [guo2017].

MultilabelConformalCalibrator

Create a conformal in-set and a conformal out-set such that, for a \(1-\alpha\) proportion of data points, the set of true labels is bounded by the in-set and the out-set.

MulticlassClassificationCalibrator

A class for calibrating the predicted uncertainties in multiclass classification tasks.

MulticlassConformalCalibrator

Create prediction sets of possible labels \(C(X_{\text{test}}) \subset \{1 \mathrel{.\,.} K\}\) that contain the correct label with probability almost exactly \(1-\alpha\).

AdaptiveMulticlassConformalCalibrator

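Create prediction sets of possible labels \(C(X_{\text{test}}) \subset \{1 \mathrel{.\,.} K\}\) that contain the correct label with probability almost exactly \(1-\alpha\), using an adaptive nonconformity score.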

IsotonicMulticlassCalibrator

Calibrate multiclass classification datasets using isotonic regression as discussed in [guo2017].

Module Contents#

chemprop.uncertainty.calibrator.logger#
class chemprop.uncertainty.calibrator.CalibratorBase[source]#

Bases: abc.ABC

A base class for calibrating the predicted uncertainties.

abstractmethod fit(*args, **kwargs)[source]#

Fit calibration method for the calibration data.

Return type:

Self

abstractmethod apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor
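The fit-then-apply contract can be illustrated without chemprop itself. The sketch below is a stand-in written in plain Python: `SketchCalibratorBase` and `ConstantScale` are hypothetical names invented for this example, and lists of floats take the place of tensors.

```python
from abc import ABC, abstractmethod


class SketchCalibratorBase(ABC):
    """Hypothetical stand-in that mirrors the fit/apply contract described above."""

    @abstractmethod
    def fit(self, *args, **kwargs):
        """Learn calibration parameters from calibration data and return self."""

    @abstractmethod
    def apply(self, uncs):
        """Return calibrated uncertainties for the given uncalibrated ones."""


class ConstantScale(SketchCalibratorBase):
    """Toy calibrator (an assumption for illustration): one learned multiplier."""

    def fit(self, uncs, errors):
        # Ratio of the summed absolute errors to the summed predicted uncertainties.
        self.scale = sum(abs(e) for e in errors) / sum(uncs)
        return self  # returning self allows fit(...).apply(...) chaining

    def apply(self, uncs):
        return [self.scale * u for u in uncs]


calibrated = ConstantScale().fit([1.0, 2.0], [2.0, -4.0]).apply([0.5, 1.5])
```

Because `fit` returns the calibrator, fitting on a calibration set and applying to a test set can be written as a single chained expression.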

chemprop.uncertainty.calibrator.UncertaintyCalibratorRegistry#
class chemprop.uncertainty.calibrator.RegressionCalibrator[source]#

Bases: CalibratorBase

A class for calibrating the predicted uncertainties in regression tasks.

abstractmethod fit(preds, uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

RegressionCalibrator
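As a rough illustration of the `mask` argument, the sketch below selects, for each task, only the entries whose mask value is true. It assumes that fitting proceeds one task (column) at a time, which this page does not state; `masked_columns` is a hypothetical helper, and nested lists stand in for `n x t` tensors.

```python
def masked_columns(values, mask):
    """Return, for each task (column), the entries whose mask value is truthy."""
    n_tasks = len(values[0])
    return [
        [row[j] for row, m in zip(values, mask) if m[j]]
        for j in range(n_tasks)
    ]


# n = 3 molecules, t = 2 tasks; False marks a target that should be ignored.
targets = [[1.0, 0.0], [2.0, 5.0], [3.0, 0.0]]
mask = [[True, False], [True, True], [True, False]]
per_task = masked_columns(targets, mask)
```

Here the first task keeps all three targets, while the second keeps only the one measured value.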

class chemprop.uncertainty.calibrator.ZScalingCalibrator[source]#

Bases: RegressionCalibrator

Calibrate regression datasets by applying a scaling value to the uncalibrated standard deviation, fitted by minimizing the negative log-likelihood of a normal distribution around each prediction. [levi2022]

References

[levi2022]

Levi, D.; Gispan, L.; Giladi, N.; Fetaya, E. “Evaluating and Calibrating Uncertainty Prediction in Regression Tasks.” Sensors, 2022, 22(15), 5540. https://www.mdpi.com/1424-8220/22/15/5540
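For a single task, the scaling value that minimizes the Gaussian negative log-likelihood has a closed form: the root mean square of the z-scores \((y - \hat{y})/\sigma\). The sketch below uses that closed form in plain Python; it assumes the inputs are standard deviations, and `fit_z_scaling` and `gaussian_nll` are hypothetical helpers, not chemprop functions. The library may well use a numerical optimizer instead.

```python
import math


def fit_z_scaling(preds, stds, targets):
    """Factor s minimizing the Gaussian NLL of targets under N(pred, (s * std)^2)."""
    z_sq = [((y - p) / sd) ** 2 for p, sd, y in zip(preds, stds, targets)]
    return math.sqrt(sum(z_sq) / len(z_sq))


def gaussian_nll(preds, stds, targets, s):
    """Mean negative log-likelihood with every standard deviation multiplied by s."""
    total = 0.0
    for p, sd, y in zip(preds, stds, targets):
        var = (s * sd) ** 2
        total += 0.5 * math.log(2 * math.pi * var) + (y - p) ** 2 / (2 * var)
    return total / len(preds)


preds = [1.0, 2.0, 3.0, 4.0]
stds = [0.1, 0.1, 0.2, 0.2]      # overconfident: the actual errors are larger
targets = [1.3, 1.8, 3.5, 3.4]
s = fit_z_scaling(preds, stds, targets)
calibrated_stds = [s * sd for sd in stds]
```

In this toy data the fitted factor is greater than 1, so every standard deviation is widened by the same ratio.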

fit(preds, uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

RegressionCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor

class chemprop.uncertainty.calibrator.ZelikmanCalibrator(p)[source]#

Bases: RegressionCalibrator

Calibrate regression datasets using a method that does not depend on a particular probability function form.

It uses the “CRUDE” method as described in [zelikman2020]. We implemented this method to be used with variance as the uncertainty.

Parameters:

p (float) – The target quantile, \(p \in [0, 1]\)

References

[zelikman2020]

Zelikman, E.; Healy, C.; Zhou, S.; Avati, A. “CRUDE: calibrating regression uncertainty distributions empirically.” arXiv preprint arXiv:2005.12496. https://doi.org/10.48550/arXiv.2005.12496
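One simplified reading of the idea, sketched below in plain Python, is to take the empirical \(p\)-quantile of the absolute standardized errors on the calibration set and use it to rescale the variance. This is an illustration under stated assumptions, not chemprop's code: `empirical_quantile` and `fit_crude_scaling` are hypothetical helpers, and the nearest-rank quantile convention is a choice made here.

```python
import math


def empirical_quantile(values, p):
    """Nearest-rank quantile: smallest value with at least a fraction p at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def fit_crude_scaling(preds, variances, targets, p):
    """Empirical p-quantile of |error| / std on the calibration set."""
    abs_z = [
        abs(y - yhat) / math.sqrt(v)
        for yhat, v, y in zip(preds, variances, targets)
    ]
    return empirical_quantile(abs_z, p)


preds = [0.0, 0.0, 0.0, 0.0, 0.0]
variances = [1.0, 1.0, 4.0, 4.0, 1.0]
targets = [0.5, -1.0, 3.0, -4.0, 2.5]
scale = fit_crude_scaling(preds, variances, targets, p=0.8)
# Variance scales with the square of the standard-deviation factor.
calibrated = [scale ** 2 * v for v in variances]
```

No distributional form is assumed anywhere: the scale comes directly from the ordered calibration errors, so a fraction \(p\) of them fall within one calibrated standard deviation.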

p#
fit(preds, uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

RegressionCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor

class chemprop.uncertainty.calibrator.MVEWeightingCalibrator[source]#

Bases: RegressionCalibrator

Calibrate regression datasets that have ensembles of individual models that make variance predictions.

This method minimizes the negative log likelihood for the predictions versus the targets by applying a weighted average across the variance predictions of the ensemble. [wang2021]

References

[wang2021]

Wang, D.; Yu, J.; Chen, L.; Li, X.; Jiang, H.; Chen, K.; Zheng, M.; Luo, X. “A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling.” J. Cheminform., 2021, 13, 1-17. https://doi.org/10.1186/s13321-021-00551-x
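The weighting idea can be sketched for a two-model ensemble with a simple grid search. The example assumes convex weights \((w, 1 - w)\) and uses a grid only to stay dependency-free; the weight parameterization and optimizer used by chemprop are not described on this page. `weighted_variance` and `gaussian_nll` are hypothetical helpers.

```python
import math


def weighted_variance(ens_vars, weights):
    """Combine m per-model variance predictions (m x n) into n variances."""
    n = len(ens_vars[0])
    return [sum(w * model[i] for w, model in zip(weights, ens_vars)) for i in range(n)]


def gaussian_nll(preds, variances, targets):
    """Mean Gaussian negative log-likelihood of the targets."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (y - p) ** 2 / (2 * v)
        for p, v, y in zip(preds, variances, targets)
    ) / len(preds)


preds = [0.0, 0.0, 0.0, 0.0]
targets = [1.0, -1.0, 2.0, -2.0]
ens_vars = [
    [1.0, 1.0, 4.0, 4.0],   # model 0: variances match the squared errors
    [0.1, 0.1, 0.1, 0.1],   # model 1: overconfident everywhere
]

# Try w = 0.00, 0.05, ..., 1.00 and keep the weight with the lowest NLL.
candidates = [i / 20 for i in range(21)]
best_w = min(
    candidates,
    key=lambda w: gaussian_nll(preds, weighted_variance(ens_vars, [w, 1 - w]), targets),
)
calibrated = weighted_variance(ens_vars, [best_w, 1 - best_w])
```

The search puts all the weight on the model whose variances track the observed errors, and the `m x n` input collapses to `n` calibrated variances.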

fit(preds, uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of m x n x t, where m is the number of models in the ensemble

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

MVEWeightingCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties of the shape of m x n x t

Returns:

the calibrated uncertainties of the shape of n x t

Return type:

Tensor

class chemprop.uncertainty.calibrator.RegressionConformalCalibrator(alpha)[source]#

Bases: RegressionCalibrator

Conformalize quantiles so that the interval \([\hat{t}_{\alpha/2}(x),\hat{t}_{1-\alpha/2}(x)]\) has approximately \(1-\alpha\) coverage. [angelopoulos2021]

\[ \begin{aligned} s(x, y) &= \max \left\{ \hat{t}_{\alpha/2}(x) - y,\; y - \hat{t}_{1-\alpha/2}(x) \right\} \\ \hat{q} &= Q\left(s_1, \ldots, s_n;\; \frac{\left\lceil (n+1)(1-\alpha) \right\rceil}{n}\right) \\ C(x) &= \left[ \hat{t}_{\alpha/2}(x) - \hat{q},\; \hat{t}_{1-\alpha/2}(x) + \hat{q} \right] \end{aligned} \]

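where \(s\) is the nonconformity score, defined as the signed distance from \(y\) to its nearest predicted quantile (positive when \(y\) lies outside the interval), and \(Q\) is the empirical quantile of the calibration scores at the given level. \(\hat{t}_{\alpha/2}(x)\) and \(\hat{t}_{1-\alpha/2}(x)\) are the predicted quantiles from a quantile regression model.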

Note

The algorithm is specifically designed for quantile regression models. Intuitively, the set \(C(x)\) just grows or shrinks the distance between the quantiles by \(\hat{q}\) to achieve coverage. However, this calibrator can also be applied to a regression model that does not provide quantiles. In this case, both \(\hat{t}_{\alpha/2}(x)\) and \(\hat{t}_{1-\alpha/2}(x)\) are the same as \(\hat{y}\), so the interval has the same width for every data point (i.e., \(\left[\hat{y}-\hat{q}, \hat{y}+\hat{q} \right]\)).

Parameters:

alpha (float) – The error rate, \(\alpha \in [0, 1]\)

References

[angelopoulos2021]

Angelopoulos, A.N.; Bates, S.; “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” arXiv Preprint 2021, https://arxiv.org/abs/2107.07511
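The calibration step can be sketched in plain Python for a single task. The example takes \(\hat{q}\) as the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest calibration score, the usual finite-sample convention in [angelopoulos2021]; `conformal_quantile` and `fit_qhat` are hypothetical helpers, and returning infinity when the calibration set is too small is a choice made for this sketch.

```python
import math


def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def fit_qhat(lower, upper, targets, alpha):
    # s(x, y) = max(lower - y, y - upper): positive when y falls outside the interval.
    scores = [max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, targets)]
    return conformal_quantile(scores, alpha)


# Nine calibration points that share the same predicted quantiles [0, 1].
lower = [0.0] * 9
upper = [1.0] * 9
targets = [0.5, 0.2, 1.1, -0.3, 0.9, 1.4, 0.0, 1.2, 0.7]
qhat = fit_qhat(lower, upper, targets, alpha=0.25)
interval = (lower[0] - qhat, upper[0] + qhat)
```

A positive \(\hat{q}\) widens the predicted interval on both sides, and a negative one would shrink it.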

alpha#
bounds#
fit(preds, uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

RegressionCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties (half intervals).

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated half intervals

Return type:

Tensor

class chemprop.uncertainty.calibrator.BinaryClassificationCalibrator[source]#

Bases: CalibratorBase

A class for calibrating the predicted uncertainties in binary classification tasks.

abstractmethod fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

BinaryClassificationCalibrator

class chemprop.uncertainty.calibrator.PlattCalibrator[source]#

Bases: BinaryClassificationCalibrator

Calibrate classification datasets using the Platt scaling algorithm [guo2017], [platt1999].

In [platt1999], Platt suggests using the number of positive and negative training examples to adjust the value of target probabilities used to fit the parameters.

References

[guo2017]

Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K. Q. “On calibration of modern neural networks”. ICML, 2017. https://arxiv.org/abs/1706.04599

[platt1999]

Platt, J. “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” Adv. Large Margin Classif. 1999, 10 (3), 61–74.
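A minimal sketch of Platt scaling with the target adjustment from [platt1999] follows. It assumes the sigmoid is fitted on the logit of the uncalibrated probability and uses plain gradient descent to stay dependency-free; neither detail is confirmed by this page, and `platt_targets`, `fit_platt` and `apply_platt` are hypothetical helpers.

```python
import math


def platt_targets(labels, n_pos, n_neg):
    """Smoothed targets: (N+ + 1) / (N+ + 2) for positives, 1 / (N- + 2) for negatives."""
    t_pos = (n_pos + 1) / (n_pos + 2)
    t_neg = 1 / (n_neg + 2)
    return [t_pos if y == 1 else t_neg for y in labels]


def fit_platt(probs, soft_targets, lr=0.1, steps=2000):
    """Fit sigmoid(a * logit(p) + b) to the soft targets by minimizing cross-entropy."""
    logits = [math.log(p / (1 - p)) for p in probs]
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(logits, soft_targets):
            q = 1 / (1 + math.exp(-(a * x + b)))
            grad_a += (q - t) * x
            grad_b += q - t
        a -= lr * grad_a / len(logits)
        b -= lr * grad_b / len(logits)
    return a, b


def apply_platt(probs, a, b):
    return [1 / (1 + math.exp(-(a * math.log(p / (1 - p)) + b))) for p in probs]


probs = [0.95, 0.9, 0.85, 0.8, 0.2, 0.1]   # uncalibrated probabilities of class 1
labels = [1, 0, 1, 0, 0, 0]
targets = platt_targets(labels, n_pos=2, n_neg=4)
a, b = fit_platt(probs, targets)
calibrated = apply_platt(probs, a, b)
```

The smoothed targets keep the fit away from exactly 0 and 1, which matters most when the calibration set is small.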

fit(uncs, targets, mask, training_targets=None)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

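  • training_targets (torch.Tensor | None) – optional targets from the training set; when provided, the numbers of positive and negative training examples are used to adjust the target probabilities, as suggested in [platt1999]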

Returns:

self – the fitted calibrator

Return type:

BinaryClassificationCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor

class chemprop.uncertainty.calibrator.IsotonicCalibrator[source]#

Bases: BinaryClassificationCalibrator

Calibrate binary classification datasets using isotonic regression as discussed in [guo2017]. In effect, the method transforms incoming uncalibrated confidences using a histogram-like function where the range and magnitude of each transforming bin are learned.

References

[guo2017]

Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K. Q. “On calibration of modern neural networks”. ICML, 2017. https://arxiv.org/abs/1706.04599
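The "histogram-like function" can be made concrete with the pool-adjacent-violators algorithm, the standard way to fit an isotonic regression. The sketch below is plain Python and is not taken from chemprop, which may delegate to an existing implementation; `pava` is a hypothetical helper.

```python
def pava(values):
    """Pool adjacent violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


probs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 1, 0, 1, 1, 1]
# Order the calibration labels by increasing predicted probability, then fit.
order = sorted(range(len(probs)), key=probs.__getitem__)
step_values = pava([labels[i] for i in order])
```

Each merged block is one learned bin: its width is set by which neighbours had to be pooled, and its height is the fraction of positives inside it.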

fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

BinaryClassificationCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor

class chemprop.uncertainty.calibrator.MultilabelConformalCalibrator(alpha)[source]#

Bases: BinaryClassificationCalibrator

Create a conformal in-set and a conformal out-set such that, for a \(1-\alpha\) proportion of data points, the set of true labels is bounded by the in-set and the out-set [1]_:

\[\Pr \left( \hat{\mathcal C}_{\text{in}}(X) \subseteq \mathcal Y \subseteq \hat{\mathcal C}_{\text{out}}(X) \right) \geq 1 - \alpha,\]

where the in-set \(\hat{\mathcal C}_\text{in}\) is contained by the set of true labels \(\mathcal Y\) and \(\mathcal Y\) is contained within the out-set \(\hat{\mathcal C}_\text{out}\).

Parameters:

alpha (float) – The error rate, \(\alpha \in [0, 1]\)

References

alpha#
static nonconformity_scores(preds)[source]#

Compute nonconformity score as the negative of the predicted probability.

\[s_i = -\hat{f}(X_i)_{Y_i}\]
Parameters:

preds (torch.Tensor)

fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

BinaryClassificationCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties of the shape of n x t x 2, where n is the number of input molecules/reactions, t is the number of tasks, and the first element in the last dimension corresponds to the in-set \(\hat{\mathcal C}_\text{in}\), while the second corresponds to the out-set \(\hat{\mathcal C}_\text{out}\).

Return type:

Tensor
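To show how the `n x t x 2` result might be read, the sketch below turns it into per-sample label lists. It assumes the entries are 0/1 membership flags, which this page does not state explicitly; `describe_sets` and the task names are hypothetical, and nested lists stand in for the tensor.

```python
def describe_sets(calibrated, task_names):
    """Split an n x t x 2 result into (in-set, out-set) label lists per sample."""
    summary = []
    for sample in calibrated:
        in_set = [name for name, (in_flag, _) in zip(task_names, sample) if in_flag]
        out_set = [name for name, (_, out_flag) in zip(task_names, sample) if out_flag]
        summary.append((in_set, out_set))
    return summary


task_names = ["toxic", "soluble", "permeable"]   # hypothetical task labels
calibrated = [
    [[1, 1], [0, 1], [0, 0]],   # one molecule, three tasks, [in-set, out-set] flags
]
sets = describe_sets(calibrated, task_names)
in_set, out_set = sets[0]
```

Under this reading, labels in the in-set are confidently present, labels outside the out-set are confidently absent, and the rest are undecided.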

class chemprop.uncertainty.calibrator.MulticlassClassificationCalibrator[source]#

Bases: CalibratorBase

A class for calibrating the predicted uncertainties in multiclass classification tasks.

abstractmethod fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

MulticlassClassificationCalibrator

class chemprop.uncertainty.calibrator.MulticlassConformalCalibrator(alpha)[source]#

Bases: MulticlassClassificationCalibrator

Create prediction sets of possible labels \(C(X_{\text{test}}) \subset \{1 \mathrel{.\,.} K\}\) that satisfy:

\[1 - \alpha \leq \Pr (Y_{\text{test}} \in C(X_{\text{test}})) \leq 1 - \alpha + \frac{1}{n + 1}\]

In other words, the probability that the prediction set contains the correct label is almost exactly \(1-\alpha\). More details can be found in [1]_.

Parameters:

alpha (float) – Error rate, \(\alpha \in [0, 1]\)

References

alpha#
static nonconformity_scores(preds)[source]#

Compute nonconformity score as the negative of the softmax output for the true class.

\[s_i = -\hat{f}(X_i)_{Y_i}\]
Parameters:

preds (torch.Tensor)
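The following plain-Python sketch shows how these scores could yield prediction sets for a single task. It assumes the threshold is the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest calibration score and that a class enters the set when its own score does not exceed that threshold; `fit_qhat` and `prediction_set` are hypothetical helpers.

```python
import math


def fit_qhat(probs, labels, alpha):
    """Conformal quantile of the scores s_i = -f(X_i)_{Y_i}."""
    scores = sorted(-p[y] for p, y in zip(probs, labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return scores[rank - 1] if rank <= n else math.inf


def prediction_set(p, qhat):
    """All classes whose score -p_k is at most qhat."""
    return [k for k, pk in enumerate(p) if -pk <= qhat]


cal_probs = [
    [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1], [0.25, 0.25, 0.5], [0.9, 0.05, 0.05],
]
cal_labels = [0, 1, 2, 1, 1, 2, 0, 0, 0]
qhat = fit_qhat(cal_probs, cal_labels, alpha=0.25)
test_set = prediction_set([0.6, 0.35, 0.05], qhat)
```

With this calibration data, a class is included whenever its softmax probability is at least 0.3, so confident predictions give small sets and ambiguous ones give larger sets.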

fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

MulticlassClassificationCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor

class chemprop.uncertainty.calibrator.AdaptiveMulticlassConformalCalibrator(alpha)[source]#

Bases: MulticlassConformalCalibrator

Create prediction sets of possible labels \(C(X_{\text{test}}) \subset \{1 \mathrel{.\,.} K\}\) that satisfy:

\[1 - \alpha \leq \Pr (Y_{\text{test}} \in C(X_{\text{test}})) \leq 1 - \alpha + \frac{1}{n + 1}\]

In other words, the probability that the prediction set contains the correct label is almost exactly \(1-\alpha\). More details can be found in [1]_.

Parameters:

alpha (float) – Error rate, \(\alpha \in [0, 1]\)

References

static nonconformity_scores(preds)[source]#

Compute nonconformity score by greedily including classes in the classification set until it reaches the true label.

\[s(x, y) = \sum_{j=1}^{k} \hat{f}(x)_{\pi_j(x)}, \text{ where } y = \pi_k(x)\]

where \(\pi(x)\) is the permutation of \(\{1 \mathrel{.\,.} K\}\) that sorts \(\hat{f}(x)\) from most likely to least likely, and \(\pi_j(x)\) is the class in its \(j\)-th position.
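The adaptive score can be sketched in a few lines of plain Python. `adaptive_score` follows the formula above; `adaptive_set` is an assumption about how a threshold \(\hat{q}\) would then be turned into a prediction set (the deterministic variant, without randomization), and both names are hypothetical.

```python
def adaptive_score(p, y):
    """Cumulative probability of classes, most likely first, up to and including y."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    total = 0.0
    for k in order:
        total += p[k]
        if k == y:
            return total
    raise ValueError("label index out of range")


def adaptive_set(p, qhat):
    """Add classes in decreasing probability until the cumulative mass reaches qhat."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += p[k]
        if total >= qhat:
            break
    return chosen


p = [0.25, 0.5, 0.125, 0.125]
score_top = adaptive_score(p, 1)      # true label is the most likely class
score_second = adaptive_score(p, 0)   # true label is ranked second
labels = adaptive_set(p, qhat=0.7)
```

The score is small when the true label ranks near the top and approaches 1 when it ranks last, which is what lets the set size adapt to how hard each input is.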

class chemprop.uncertainty.calibrator.IsotonicMulticlassCalibrator[source]#

Bases: MulticlassClassificationCalibrator

Calibrate multiclass classification datasets using isotonic regression as discussed in [guo2017]. It uses a one-vs-all aggregation scheme to extend isotonic regression from binary to multiclass classifiers.

References

[guo2017]

Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K. Q. “On calibration of modern neural networks”. ICML, 2017. https://arxiv.org/abs/1706.04599
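A one-vs-all scheme can be sketched as one monotone map per class followed by renormalization of each row. In the example below the per-class maps are hand-written stand-ins for fitted isotonic regressors, and both the renormalization step and the uniform fallback are assumptions made for this sketch; `one_vs_all_calibrate` is a hypothetical helper.

```python
def one_vs_all_calibrate(probs, class_maps):
    """Apply one monotone map per class, then renormalize each row to sum to 1."""
    calibrated = []
    for row in probs:
        mapped = [f(p) for f, p in zip(class_maps, row)]
        total = sum(mapped)
        if total > 0:
            calibrated.append([m / total for m in mapped])
        else:
            # Every class mapped to 0: fall back to a uniform distribution.
            calibrated.append([1 / len(row)] * len(row))
    return calibrated


# Stand-ins for three fitted, non-decreasing calibration maps (one per class).
class_maps = [
    lambda p: 0.0 if p < 0.5 else 0.5,
    lambda p: p,
    lambda p: min(1.0, 2 * p),
]
probs = [[0.5, 0.25, 0.25]]
calibrated = one_vs_all_calibrate(probs, class_maps)
```

Each class is calibrated as its own binary problem against all the others, so the mapped values need not sum to 1 until the final normalization.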

fit(uncs, targets, mask)[source]#

Fit calibration method for the calibration data.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the fitting

Returns:

self – the fitted calibrator

Return type:

MulticlassClassificationCalibrator

apply(uncs)[source]#

Apply this calibrator to the input uncertainties.

Parameters:

uncs (Tensor) – a tensor containing uncalibrated uncertainties

Returns:

the calibrated uncertainties

Return type:

Tensor