chemprop.uncertainty.evaluator

Attributes

UncertaintyEvaluatorRegistry

Classes

RegressionEvaluator

Evaluates the quality of uncertainty estimates in regression tasks.

NLLRegressionEvaluator

Evaluate uncertainty values for regression datasets using the mean negative-log-likelihood of the targets given the probability distributions estimated by the model.

CalibrationAreaEvaluator

A class for evaluating regression uncertainty values based on how they deviate from perfect calibration on an observed-probability versus expected-probability plot.

ExpectedNormalizedErrorEvaluator

A class that evaluates uncertainty performance by binning together clusters of predictions and comparing the root mean predicted variance of each cluster against the RMSE of that cluster.

SpearmanEvaluator

Evaluate the Spearman rank correlation coefficient between the uncertainties and errors in the model predictions.

RegressionConformalEvaluator

Evaluate the coverage of conformal prediction for regression datasets.

BinaryClassificationEvaluator

Evaluates the quality of uncertainty estimates in binary classification tasks.

NLLClassEvaluator

Evaluate uncertainty values for binary classification datasets using the mean negative-log-likelihood of the targets given the assigned probabilities from the model.

MultilabelConformalEvaluator

Evaluate the coverage of conformal prediction for binary classification datasets with multiple labels.

MulticlassClassificationEvaluator

Evaluates the quality of uncertainty estimates in multiclass classification tasks.

NLLMulticlassEvaluator

Evaluate uncertainty values for multiclass classification datasets using the mean negative-log-likelihood of the targets given the assigned probabilities from the model.

MulticlassConformalEvaluator

Evaluate the coverage of conformal prediction for multiclass classification datasets.

Module Contents

chemprop.uncertainty.evaluator.UncertaintyEvaluatorRegistry

class chemprop.uncertainty.evaluator.RegressionEvaluator

Bases: abc.ABC

Evaluates the quality of uncertainty estimates in regression tasks.

abstractmethod evaluate(preds, uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor
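
As a reading aid for the shapes above, the sketch below shows one way a per-sample score of shape n x t can be reduced to a per-task result of shape t while honoring the mask. It uses nested Python lists in place of tensors, and `masked_task_mean` is a hypothetical helper name, not part of chemprop.

```python
def masked_task_mean(scores, mask):
    """Average scores over samples for each task, skipping masked-out entries.

    scores: n x t nested list of floats (one per-sample score per task)
    mask:   n x t nested list of bools (True means "use this entry")
    Returns a list of length t.
    """
    n_tasks = len(scores[0])
    means = []
    for j in range(n_tasks):
        kept = [row[j] for row, m in zip(scores, mask) if m[j]]
        # A task with no usable entries has no defined mean.
        means.append(sum(kept) / len(kept) if kept else float("nan"))
    return means


scores = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
mask = [[True, True], [True, False], [False, True]]
print(masked_task_mean(scores, mask))  # [2.0, 20.0]
```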

class chemprop.uncertainty.evaluator.NLLRegressionEvaluator

Bases: RegressionEvaluator

Evaluate uncertainty values for regression datasets using the mean negative-log-likelihood of the targets given the probability distributions estimated by the model:

\[\mathrm{NLL}(y, \hat y) = \frac{1}{2} \log(2 \pi \sigma^2) + \frac{(y - \hat{y})^2}{2 \sigma^2}\]

where \(\hat{y}\) is the predicted value, \(y\) is the true value, and \(\sigma^2\) is the predicted uncertainty (variance).

The function returns a tensor containing the mean NLL for each task.
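
The formula above can be checked by hand with a short script. This is a minimal sketch in plain Python that follows the equation term by term; it is not the library's tensor implementation, and the function names `gaussian_nll` and `mean_nll_per_task` are illustrative only.

```python
import math


def gaussian_nll(pred, var, target):
    """NLL of `target` under a Gaussian with mean `pred` and variance `var`."""
    return 0.5 * math.log(2 * math.pi * var) + (target - pred) ** 2 / (2 * var)


def mean_nll_per_task(preds, uncs, targets, mask):
    """Mean Gaussian NLL for each task over the unmasked entries.

    All four arguments are n x t nested lists; `uncs` holds variances.
    """
    n_tasks = len(preds[0])
    result = []
    for j in range(n_tasks):
        vals = [
            gaussian_nll(p[j], u[j], y[j])
            for p, u, y, m in zip(preds, uncs, targets, mask)
            if m[j]
        ]
        result.append(sum(vals) / len(vals))
    return result


# One task, two molecules, unit variance: the errors are 0 and 1.
nll = mean_nll_per_task([[0.0], [1.0]], [[1.0], [1.0]], [[0.0], [2.0]], [[True], [True]])
print(nll)
```

With unit variance the first term is the constant 0.5 * log(2 * pi), so the mean here is that constant plus the average of 0 and 0.5.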

evaluate(preds, uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.CalibrationAreaEvaluator

Bases: RegressionEvaluator

A class for evaluating regression uncertainty values based on how they deviate from perfect calibration on an observed-probability versus expected-probability plot.
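
To make the observed-versus-expected comparison concrete, here is a minimal single-task sketch. It assumes each prediction is the mean of a Gaussian whose variance is the predicted uncertainty; that assumption, the function name `miscalibration_area`, and the way the curve is discretized are illustrative choices, not a statement of how the class is implemented.

```python
import math


def miscalibration_area(preds, variances, targets, num_bins=100):
    """Area between the observed and ideal calibration curves for one task."""
    # For each sample, the smallest central-interval probability that still
    # contains the target: P(|Z| <= |error| / sigma) = erf(|error| / (sigma * sqrt(2))).
    levels = [
        math.erf(abs(y - p) / math.sqrt(2 * v))
        for p, v, y in zip(preds, variances, targets)
    ]
    n = len(levels)
    area = 0.0
    for k in range(1, num_bins):
        expected = k / num_bins
        # Fraction of targets that fall inside the interval of probability `expected`.
        observed = sum(level <= expected for level in levels) / n
        area += abs(observed - expected)
    # The curve end points (0, 0) and (1, 1) add nothing to the sum.
    return area / num_bins


# Every prediction is exact, so every interval contains its target and the
# uncertainties are maximally over-estimated.
print(miscalibration_area([1.0, 2.0], [1.0, 1.0], [1.0, 2.0]))
```

A perfectly calibrated model would give an area near 0. The example above sits at the other extreme, close to the largest value this discretization can produce.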

evaluate(preds, uncs, targets, mask, num_bins=100)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties (variance) of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

  • num_bins (int, default=100) – the number of bins to discretize the [0, 1] interval

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.ExpectedNormalizedErrorEvaluator

Bases: RegressionEvaluator

A class that evaluates uncertainty performance by binning together clusters of predictions and comparing the root mean predicted variance of each cluster against the RMSE of that cluster. [1]

\[\mathrm{ENCE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|\mathrm{RMV}_i - \mathrm{RMSE}_i|}{\mathrm{RMV}_i}\]

where \(N\) is the number of bins, \(\mathrm{RMV}_i\) is the root of the mean uncertainty over the \(i\)-th bin and \(\mathrm{RMSE}_i\) is the root mean square error over the \(i\)-th bin. This discrepancy is further normalized by the uncertainty over the bin, \(\mathrm{RMV}_i\), because the error is expected to be naturally higher as the uncertainty increases.
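
The ENCE formula can be traced with a small single-task sketch. It assumes the bins are formed by sorting samples by predicted variance and cutting the sorted list into near-equal slices, and that there are at least as many samples as bins; both are assumptions made for illustration, as is the function name `ence`.

```python
import math


def ence(preds, variances, targets, num_bins):
    """Expected normalized calibration error for a single task."""
    # Sort sample indices by predicted variance so each bin groups
    # samples with similar uncertainty (an assumption of this sketch).
    order = sorted(range(len(preds)), key=lambda i: variances[i])
    n = len(order)
    total = 0.0
    for b in range(num_bins):
        # Contiguous, near-equal-sized slice of the sorted indices.
        idx = order[b * n // num_bins : (b + 1) * n // num_bins]
        rmv = math.sqrt(sum(variances[i] for i in idx) / len(idx))
        rmse = math.sqrt(sum((targets[i] - preds[i]) ** 2 for i in idx) / len(idx))
        total += abs(rmv - rmse) / rmv
    return total / num_bins


preds = [0.0, 0.0, 0.0, 0.0]
variances = [1.0, 1.0, 4.0, 4.0]
# Errors match the predicted standard deviations in both bins.
print(ence(preds, variances, [1.0, -1.0, 2.0, -2.0], num_bins=2))  # 0.0
# Errors of 2 everywhere: the low-variance bin is off by a factor of 2.
print(ence(preds, variances, [2.0, 2.0, 2.0, 2.0], num_bins=2))  # 0.5
```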

References

evaluate(preds, uncs, targets, mask, num_bins=100)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties (variance) of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

  • num_bins (int, default=100) – the number of bins the data are divided into

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.SpearmanEvaluator

Bases: RegressionEvaluator

Evaluate the Spearman rank correlation coefficient between the uncertainties and errors in the model predictions.

The coefficient lies in the range [-1, 1]. Values closer to 1 are better: they indicate that the uncertainty values predict the rank ordering of the errors in the model predictions.
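
A rank correlation of this kind can be computed by ranking both sequences and taking the Pearson correlation of the ranks. The sketch below does this for one task in plain Python; the helper names are hypothetical, ties receive their average rank, and the inputs are assumed not to be constant (a constant input would make the denominator zero).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(uncs, abs_errors):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(uncs), average_ranks(abs_errors)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Larger uncertainty goes with larger absolute error: coefficient near 1.
print(spearman([0.1, 0.5, 0.9], [0.2, 0.4, 1.5]))
# Reversed ordering: coefficient near -1.
print(spearman([0.1, 0.5, 0.9], [1.5, 0.4, 0.2]))
```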

evaluate(preds, uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.RegressionConformalEvaluator

Bases: RegressionEvaluator

Evaluate the coverage of conformal prediction for regression datasets.

\[\Pr (Y_{\text{test}} \in C(X_{\text{test}}))\]

where \(C(X_{\text{test}})\) is the predicted interval.
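
Empirically, this probability is estimated as the fraction of test targets that land inside their predicted interval. The sketch below assumes the interval is centered on the prediction and that the uncertainty value is its total width; the text above does not state how the interval is encoded, so treat that as an assumption. `interval_coverage` is a hypothetical name.

```python
def interval_coverage(preds, widths, targets):
    """Fraction of targets inside [pred - width / 2, pred + width / 2]."""
    hits = [
        p - w / 2 <= y <= p + w / 2
        for p, w, y in zip(preds, widths, targets)
    ]
    return sum(hits) / len(hits)


preds = [1.0, 2.0, 3.0, 4.0]
widths = [1.0, 1.0, 1.0, 1.0]
targets = [1.2, 2.6, 3.5, 3.9]
# Only the second target (2.6 versus the interval [1.5, 2.5]) is missed.
print(interval_coverage(preds, widths, targets))  # 0.75
```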

evaluate(preds, uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • preds (Tensor) – the predictions for regression tasks. It is a tensor of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • uncs (Tensor) – the predicted uncertainties of the shape of n x t

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.BinaryClassificationEvaluator

Bases: abc.ABC

Evaluates the quality of uncertainty estimates in binary classification tasks.

abstractmethod evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.NLLClassEvaluator

Bases: BinaryClassificationEvaluator

Evaluate uncertainty values for binary classification datasets using the mean negative-log-likelihood of the targets given the assigned probabilities from the model:

\[\mathrm{NLL} = -\log(\hat{y} \cdot y + (1 - \hat{y}) \cdot (1 - y))\]

where \(y\) is the true binary label (0 or 1), and \(\hat{y}\) is the predicted probability associated with the class label 1.

The function returns a tensor containing the mean NLL for each task.
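
The binary NLL above picks out the probability the model assigned to the true label and takes its negative logarithm. The following plain-Python sketch evaluates it for a single task with a mask; the function names are illustrative and this is not the library's tensor code.

```python
import math


def binary_nll(prob_pos, label):
    """NLL of a 0/1 label given the predicted probability of class 1."""
    return -math.log(prob_pos * label + (1 - prob_pos) * (1 - label))


def mean_binary_nll(probs, labels, mask):
    """Mean NLL over the entries whose mask value is True."""
    vals = [binary_nll(p, y) for p, y, m in zip(probs, labels, mask) if m]
    return sum(vals) / len(vals)


# The third entry is masked out; the remaining two each contribute log(2).
print(mean_binary_nll([0.5, 0.5, 0.9], [1, 0, 1], [True, True, False]))
```

A confident correct prediction (for example probability 0.9 for a true label of 1) contributes a small value, while a confident wrong one contributes a large value.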

evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.MultilabelConformalEvaluator

Bases: BinaryClassificationEvaluator

Evaluate the coverage of conformal prediction for binary classification datasets with multiple labels.

\[\Pr \left( \hat{\mathcal C}_{\text{in}}(X) \subseteq \mathcal Y \subseteq \hat{\mathcal C}_{\text{out}}(X) \right)\]

where the in-set \(\hat{\mathcal C}_\text{in}\) is contained in the set of true labels \(\mathcal Y\), and \(\mathcal Y\) is in turn contained in the out-set \(\hat{\mathcal C}_\text{out}\).
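
The two-sided containment can be illustrated directly with Python sets, whose `<=` operator tests for subsets. This sketch estimates the probability as a fraction over samples; representing each sample's labels as a set and the name `multilabel_coverage` are assumptions for illustration, whereas the evaluator itself works on tensors and reports one value per task.

```python
def multilabel_coverage(in_sets, out_sets, true_sets):
    """Fraction of samples whose true label set lies between in-set and out-set."""
    hits = [
        c_in <= y <= c_out  # subset tests: c_in within y, and y within c_out
        for c_in, c_out, y in zip(in_sets, out_sets, true_sets)
    ]
    return sum(hits) / len(hits)


in_sets = [{0}, {1}, set()]
out_sets = [{0, 1}, {1, 2}, {0}]
true_sets = [{0, 1}, {2}, set()]
# The second sample fails: label 1 is in the in-set but not among the true labels.
print(multilabel_coverage(in_sets, out_sets, true_sets))
```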

evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probability of class 1) of the shape of n x t, where n is the number of input molecules/reactions, and t is the number of tasks.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.MulticlassClassificationEvaluator

Bases: abc.ABC

Evaluates the quality of uncertainty estimates in multiclass classification tasks.

abstractmethod evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.NLLMulticlassEvaluator

Bases: MulticlassClassificationEvaluator

Evaluate uncertainty values for multiclass classification datasets using the mean negative-log-likelihood of the targets given the assigned probabilities from the model:

\[\mathrm{NLL} = -\log(p_{y_i})\]

where \(p_{y_i}\) is the predicted probability for the true class \(y_i\), calculated as:

\[p_{y_i} = \sum_{k=1}^{K} \mathbb{1}(y_i = k) \cdot p_k\]

Here: \(K\) is the total number of classes, \(\mathbb{1}(y_i = k)\) is the indicator function that is 1 when the true class \(y_i\) equals class \(k\), and 0 otherwise, and \(p_k\) is the predicted probability for class \(k\).

The function returns a tensor containing the mean NLL for each task.
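
In code, the indicator sum reduces to indexing the probability vector at the true class. The sketch below shows this for a single task in plain Python. Note that it uses 0-based class indices, unlike the 1 to K convention in the formula, and that the function names are illustrative.

```python
import math


def multiclass_nll(class_probs, true_class):
    """NLL of the true class; `true_class` is a 0-based index into `class_probs`."""
    return -math.log(class_probs[true_class])


def mean_multiclass_nll(probs, labels, mask):
    """Mean NLL over the samples whose mask value is True."""
    vals = [multiclass_nll(p, y) for p, y, m in zip(probs, labels, mask) if m]
    return sum(vals) / len(vals)


probs = [[0.25, 0.75], [0.5, 0.5], [0.9, 0.1]]
labels = [1, 0, 1]
# The third sample is masked out, so its poor prediction does not count.
print(mean_multiclass_nll(probs, labels, [True, True, False]))
```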

evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor

class chemprop.uncertainty.evaluator.MulticlassConformalEvaluator

Bases: MulticlassClassificationEvaluator

Evaluate the coverage of conformal prediction for multiclass classification datasets.

\[\Pr (Y_{\text{test}} \in C(X_{\text{test}}))\]

where \(C(X_{\text{test}}) \subset \{1 \mathrel{.\,.} K\}\) is a prediction set of possible labels.
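
For a finite test set, this coverage is estimated as the fraction of samples whose true label appears in the prediction set. The sketch below represents each prediction set as a Python set of class indices; that representation and the name `set_coverage` are assumptions made for illustration.

```python
def set_coverage(prediction_sets, labels):
    """Fraction of samples whose true label is a member of its prediction set."""
    hits = [y in c for c, y in zip(prediction_sets, labels)]
    return sum(hits) / len(hits)


prediction_sets = [{0, 1}, {2}, {1}]
labels = [1, 0, 1]
# The second sample is missed: its true label 0 is not in the set {2}.
print(set_coverage(prediction_sets, labels))
```

Larger prediction sets raise coverage trivially, so coverage is usually read alongside the set sizes.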

evaluate(uncs, targets, mask)

Evaluate the performance of uncertainty predictions against the model target values.

Parameters:
  • uncs (Tensor) – the predicted uncertainties (i.e., the predicted probabilities for each class) of the shape of n x t x c, where n is the number of input molecules/reactions, t is the number of tasks, and c is the number of classes.

  • targets (Tensor) – a tensor of the shape n x t

  • mask (Tensor) – a tensor of the shape n x t indicating whether the given values should be used in the evaluation

Returns:

a tensor of the shape t containing the evaluated metrics

Return type:

Tensor