Utility Functions

chemprop.utils.py contains general-purpose utility functions.

chemprop.utils.accuracy(targets: List[int], preds: Union[List[float], List[List[float]]], threshold: float = 0.5) float[source]

Computes the accuracy of a binary prediction task using a given threshold for generating hard predictions.

Alternatively, computes accuracy for a multiclass prediction task by picking the largest probability.

Parameters
  • targets – A list of binary targets.

  • preds – A list of prediction probabilities.

  • threshold – The threshold above which a prediction is a 1 and below which (inclusive) a prediction is a 0.

Returns

The computed accuracy.
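The two modes described above can be illustrated with a minimal pure-Python sketch. The name accuracy_sketch is hypothetical and this is not chemprop's implementation; it only mirrors the documented behaviour, assuming that a nested list of predictions signals the multiclass case.

```python
from typing import List, Union


def accuracy_sketch(targets: List[int],
                    preds: Union[List[float], List[List[float]]],
                    threshold: float = 0.5) -> float:
    """Illustrative accuracy: threshold for binary, argmax for multiclass."""
    if len(preds) > 0 and isinstance(preds[0], list):
        # Multiclass (assumption: nested lists): pick the most probable class.
        hard_preds = [p.index(max(p)) for p in preds]
    else:
        # Binary: strictly above the threshold is 1, otherwise (inclusive) 0.
        hard_preds = [1 if p > threshold else 0 for p in preds]
    correct = sum(int(t == h) for t, h in zip(targets, hard_preds))
    return correct / len(targets)


# A prediction of exactly 0.5 falls on the "0" side of the default threshold.
binary = accuracy_sketch([1, 0, 1, 0], [0.9, 0.5, 0.4, 0.1])
multiclass = accuracy_sketch([2, 0], [[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
print(binary, multiclass)
```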

chemprop.utils.bce(targets: List[int], preds: List[float]) float[source]

Computes the binary cross entropy loss.

Parameters
  • targets – A list of binary targets.

  • preds – A list of prediction probabilities.

Returns

The computed binary cross entropy.
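As a rough illustration of the quantity being computed, the sketch below averages the per-sample binary cross entropy in pure Python. The name bce_sketch, the mean reduction, and the eps clipping are assumptions made for this example, not details confirmed by the documentation above.

```python
import math
from typing import List


def bce_sketch(targets: List[int], preds: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross entropy over all samples (illustrative only)."""
    total = 0.0
    for t, p in zip(targets, preds):
        # Clip the probability so that log(0) is never evaluated (assumption).
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


# An uninformative prediction of 0.5 costs log(2) per sample.
print(bce_sketch([1, 0], [0.5, 0.5]))
```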

chemprop.utils.build_lr_scheduler(optimizer: torch.optim.optimizer.Optimizer, args: chemprop.args.TrainArgs, total_epochs: Optional[List[int]] = None) torch.optim.lr_scheduler._LRScheduler[source]

Builds a PyTorch learning rate scheduler.

Parameters
  • optimizer – The Optimizer whose learning rate will be scheduled.

  • args – A TrainArgs object containing learning rate arguments.

  • total_epochs – The total number of epochs for which the model will be run.

Returns

An initialized learning rate scheduler.

chemprop.utils.build_optimizer(model: torch.nn.modules.module.Module, args: chemprop.args.TrainArgs) torch.optim.optimizer.Optimizer[source]

Builds a PyTorch Optimizer.

Parameters
  • model – The model to optimize.

  • args – A TrainArgs object containing optimizer arguments.

Returns

An initialized Optimizer.

chemprop.utils.create_logger(name: str, save_dir: Optional[str] = None, quiet: bool = False) logging.Logger[source]

Creates a logger with a stream handler and two file handlers.

If a logger with that name already exists, simply returns that logger. Otherwise, creates a new logger with a stream handler and two file handlers.

The stream handler prints to the screen depending on the value of quiet. One file handler (verbose.log) saves all logs, the other (quiet.log) only saves important info.

Parameters
  • name – The name of the logger.

  • save_dir – The directory in which to save the logs.

  • quiet – Whether the stream handler should be quiet (i.e., print only important info).

Returns

The logger.
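The handler layout described above can be sketched with the standard logging module alone. The function name create_logger_sketch is hypothetical, and mapping "all logs" to DEBUG and "important info" to INFO is an assumption of this example.

```python
import logging
import os
import tempfile
from typing import Optional


def create_logger_sketch(name: str,
                         save_dir: Optional[str] = None,
                         quiet: bool = False) -> logging.Logger:
    """Illustrative logger with one stream handler and two file handlers."""
    # Reuse an existing logger of the same name instead of adding handlers twice.
    if name in logging.root.manager.loggerDict:
        return logging.getLogger(name)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # Screen output: INFO and above when quiet, everything otherwise.
    stream = logging.StreamHandler()
    stream.setLevel(logging.INFO if quiet else logging.DEBUG)
    logger.addHandler(stream)

    if save_dir is not None:
        os.makedirs(save_dir, exist_ok=True)
        # verbose.log receives every record, quiet.log only INFO and above.
        verbose = logging.FileHandler(os.path.join(save_dir, 'verbose.log'))
        verbose.setLevel(logging.DEBUG)
        quiet_file = logging.FileHandler(os.path.join(save_dir, 'quiet.log'))
        quiet_file.setLevel(logging.INFO)
        logger.addHandler(verbose)
        logger.addHandler(quiet_file)

    return logger


demo_dir = tempfile.mkdtemp()
demo_logger = create_logger_sketch('sketch_demo', save_dir=demo_dir, quiet=True)
demo_logger.debug('written to verbose.log only')
demo_logger.info('written to both files and to the screen')
```

Calling create_logger_sketch('sketch_demo') a second time returns the same object, which matches the "simply returns that logger" behaviour described above.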

chemprop.utils.get_loss_func(args: chemprop.args.TrainArgs) torch.nn.modules.module.Module[source]

Gets the loss function corresponding to a given dataset type.

Parameters

args – Arguments containing the dataset type (“classification”, “regression”, or “multiclass”).

Returns

A PyTorch loss function.

chemprop.utils.get_metric_func(metric: str) Callable[[Union[List[int], List[float]], List[float]], float][source]

Gets the metric function corresponding to a given metric name.

Supports:

  • auc: Area under the receiver operating characteristic curve

  • prc-auc: Area under the precision recall curve

  • rmse: Root mean squared error

  • mse: Mean squared error

  • mae: Mean absolute error

  • r2: Coefficient of determination (R²)

  • accuracy: Accuracy (using a threshold to binarize predictions)

  • cross_entropy: Cross entropy

  • binary_cross_entropy: Binary cross entropy

Parameters

metric – Metric name.

Returns

A metric function which takes as arguments a list of targets and a list of predictions and returns the computed metric value.
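A name-to-function lookup of this kind is commonly built as a dictionary dispatch. The sketch below covers only three of the regression metrics in pure Python; all names ending in _sketch are hypothetical, and raising ValueError for an unknown name is an assumption of this example.

```python
import math
from typing import Callable, Dict, List


def mse_sketch(targets: List[float], preds: List[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


def rmse_sketch(targets: List[float], preds: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mse_sketch(targets, preds))


def mae_sketch(targets: List[float], preds: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)


# Maps a metric name to a callable taking (targets, preds) and returning a float.
METRICS_SKETCH: Dict[str, Callable[[List[float], List[float]], float]] = {
    'mse': mse_sketch,
    'rmse': rmse_sketch,
    'mae': mae_sketch,
}


def get_metric_func_sketch(metric: str) -> Callable[[List[float], List[float]], float]:
    """Returns the metric function registered under the given name."""
    if metric not in METRICS_SKETCH:
        raise ValueError(f'Metric "{metric}" not supported.')
    return METRICS_SKETCH[metric]


metric_func = get_metric_func_sketch('mae')
print(metric_func([1.0, 2.0], [2.0, 4.0]))
```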

chemprop.utils.load_args(path: str) chemprop.args.TrainArgs[source]

Loads the arguments a model was trained with.

Parameters

path – Path where model checkpoint is saved.

Returns

The TrainArgs object that the model was trained with.

chemprop.utils.load_checkpoint(path: str, device: Optional[torch.device] = None, logger: Optional[logging.Logger] = None) chemprop.models.model.MoleculeModel[source]

Loads a model checkpoint.

Parameters
  • path – Path where checkpoint is saved.

  • device – Device where the model will be moved.

  • logger – A logger for recording output.

Returns

The loaded MoleculeModel.

chemprop.utils.load_frzn_model(model: torch.nn, path: str, current_args: Optional[argparse.Namespace] = None, cuda: Optional[bool] = None, logger: Optional[logging.Logger] = None) chemprop.models.model.MoleculeModel[source]

Loads a model checkpoint.

Parameters
  • path – Path where checkpoint is saved.

  • current_args – The current arguments. Replaces the arguments loaded from the checkpoint if provided.

  • cuda – Whether to move model to cuda.

  • logger – A logger.

Returns

The loaded MoleculeModel.

chemprop.utils.load_scalers(path: str) Tuple[chemprop.data.scaler.StandardScaler, chemprop.data.scaler.StandardScaler, chemprop.data.scaler.StandardScaler, chemprop.data.scaler.StandardScaler][source]

Loads the scalers a model was trained with.

Parameters

path – Path where model checkpoint is saved.

Returns

A tuple with the data StandardScaler, the features StandardScaler, the atom descriptor StandardScaler, and the bond feature StandardScaler.

chemprop.utils.load_task_names(path: str) List[str][source]

Loads the task names a model was trained with.

Parameters

path – Path where model checkpoint is saved.

Returns

A list of the task names that the model was trained with.

chemprop.utils.makedirs(path: str, isfile: bool = False) None[source]

Creates a directory given a path to either a directory or file.

If a directory is provided, creates that directory. If a file is provided (i.e. isfile == True), creates the parent directory for that file.

Parameters
  • path – Path to a directory or file.

  • isfile – Whether the provided path is a file (True) or a directory (False).
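The directory-or-file distinction can be sketched with os.makedirs. The name makedirs_sketch is hypothetical; treating an empty parent directory as a no-op is an assumption of this example.

```python
import os
import tempfile


def makedirs_sketch(path: str, isfile: bool = False) -> None:
    """Creates the directory itself, or the parent directory of a file path."""
    if isfile:
        # Only the parent directory is created; the file itself is not touched.
        path = os.path.dirname(path)
    if path != '':
        os.makedirs(path, exist_ok=True)


base_dir = tempfile.mkdtemp()
makedirs_sketch(os.path.join(base_dir, 'runs', 'fold_0'))
makedirs_sketch(os.path.join(base_dir, 'checkpoints', 'model.pt'), isfile=True)
print(sorted(os.listdir(base_dir)))
```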

chemprop.utils.mse(targets: List[float], preds: List[float]) float[source]

Computes the mean squared error.

Parameters
  • targets – A list of targets.

  • preds – A list of predictions.

Returns

The computed mse.

chemprop.utils.overwrite_state_dict(loaded_param_name: str, model_param_name: str, loaded_state_dict: collections.OrderedDict, model_state_dict: collections.OrderedDict, logger: Optional[logging.Logger] = None) collections.OrderedDict[source]

Overwrites a given parameter in the current model with the loaded model.

Parameters
  • loaded_param_name – Name of parameter in checkpoint model.

  • model_param_name – Name of parameter in current model.

  • loaded_state_dict – state_dict for checkpoint model.

  • model_state_dict – state_dict for current model.

  • logger – A logger.

Returns

The updated state_dict for the current model.

chemprop.utils.prc_auc(targets: List[int], preds: List[float]) float[source]

Computes the area under the precision-recall curve.

Parameters
  • targets – A list of binary targets.

  • preds – A list of prediction probabilities.

Returns

The computed prc-auc.

chemprop.utils.rmse(targets: List[float], preds: List[float]) float[source]

Computes the root mean squared error.

Parameters
  • targets – A list of targets.

  • preds – A list of predictions.

Returns

The computed rmse.

chemprop.utils.save_checkpoint(path: str, model: chemprop.models.model.MoleculeModel, scaler: Optional[chemprop.data.scaler.StandardScaler] = None, features_scaler: Optional[chemprop.data.scaler.StandardScaler] = None, atom_descriptor_scaler: Optional[chemprop.data.scaler.StandardScaler] = None, bond_feature_scaler: Optional[chemprop.data.scaler.StandardScaler] = None, args: Optional[chemprop.args.TrainArgs] = None) None[source]

Saves a model checkpoint.

Parameters
  • model – A MoleculeModel.

  • scaler – A StandardScaler fitted on the data.

  • features_scaler – A StandardScaler fitted on the features.

  • atom_descriptor_scaler – A StandardScaler fitted on the atom descriptors.

  • bond_feature_scaler – A StandardScaler fitted on the bond features.

  • args – The TrainArgs object containing the arguments the model was trained with.

  • path – Path where checkpoint will be saved.

chemprop.utils.save_smiles_splits(data_path: str, save_dir: str, task_names: Optional[List[str]] = None, features_path: Optional[List[str]] = None, train_data: Optional[chemprop.data.data.MoleculeDataset] = None, val_data: Optional[chemprop.data.data.MoleculeDataset] = None, test_data: Optional[chemprop.data.data.MoleculeDataset] = None, logger: Optional[logging.Logger] = None, smiles_columns: Optional[List[str]] = None) None[source]

Saves a CSV file with train/val/test splits of target data and additional features. Also saves the indices of the train/val/test split as a pickle file. The pickle file does not support repeated entries with the same SMILES or entries that come from a path other than the main data path, such as a separate test path.

Parameters
  • data_path – Path to data CSV file.

  • save_dir – Path where pickle files will be saved.

  • task_names – List of target names for the model, as returned by the function get_task_names(). If not provided, will use datafile header entries.

  • features_path – List of path(s) to files with additional molecule features.

  • train_data – Train MoleculeDataset.

  • val_data – Validation MoleculeDataset.

  • test_data – Test MoleculeDataset.

  • smiles_columns – The names of the columns containing SMILES. By default, uses the first column.

  • logger – A logger for recording output.

chemprop.utils.timeit(logger_name: Optional[str] = None) Callable[[Callable], Callable][source]

Creates a decorator which wraps a function with a timer that prints the elapsed time.

Parameters

logger_name – The name of the logger used to record output. If None, uses print instead.

Returns

A decorator which wraps a function with a timer that prints the elapsed time.
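A decorator factory of this shape can be sketched as follows. The name timeit_sketch and the exact message format are assumptions of this example; the sketch only mirrors the documented choice between a named logger and print.

```python
import logging
import time
from datetime import timedelta
from functools import wraps
from typing import Any, Callable, Optional


def timeit_sketch(logger_name: Optional[str] = None) -> Callable[[Callable], Callable]:
    """Returns a decorator that reports how long the wrapped function took."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.time()
            result = func(*args, **kwargs)
            elapsed = timedelta(seconds=round(time.time() - start))
            # Use the named logger if one was given, otherwise fall back to print.
            report = logging.getLogger(logger_name).info if logger_name is not None else print
            report(f'Elapsed time = {elapsed}')
            return result
        return wrapper
    return decorator


@timeit_sketch()
def add(a: int, b: int) -> int:
    return a + b


total = add(2, 3)
```

Because the outer function takes the logger name, the decorator is applied with parentheses (@timeit_sketch() or @timeit_sketch('train')), not as a bare @timeit_sketch.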

chemprop.utils.update_prediction_args(predict_args: chemprop.args.PredictArgs, train_args: chemprop.args.TrainArgs, missing_to_defaults: bool = True, validate_feature_sources: bool = True) None[source]

Updates prediction arguments with training arguments loaded from a checkpoint file. If an argument is present in both, the prediction argument will be used.

Also raises errors for situations where the prediction arguments and training arguments are different but must match for proper function.

Parameters
  • predict_args – The PredictArgs object containing the arguments to use for making predictions.

  • train_args – The TrainArgs object containing the arguments used to train the model previously.

  • missing_to_defaults – Whether to replace missing training arguments with the current defaults for TrainArgs. This is used for backwards compatibility.

  • validate_feature_sources – Indicates whether the feature sources (from path or generator) are checked for consistency between the training and prediction arguments. This is not necessary for fingerprint generation, where molecule features are not used.
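The precedence rule (prediction arguments win when an argument is present in both) can be illustrated with plain dictionaries standing in for the argument objects. Everything here is hypothetical: the function name, the must_match parameter, and the argument names in the usage lines are illustrative and are not taken from chemprop.

```python
from typing import Any, Dict, Iterable


def update_prediction_args_sketch(predict_args: Dict[str, Any],
                                  train_args: Dict[str, Any],
                                  must_match: Iterable[str] = ()) -> None:
    """Copies training arguments into predict_args in place; existing keys win."""
    # Hypothetical consistency check: some settings must agree between the two.
    for key in must_match:
        if key in predict_args and key in train_args and predict_args[key] != train_args[key]:
            raise ValueError(f'Argument "{key}" differs between training and prediction.')

    # Only fill in what the prediction arguments do not already define.
    for key, value in train_args.items():
        if key not in predict_args:
            predict_args[key] = value


predict = {'batch_size': 50}
train = {'batch_size': 32, 'hidden_size': 300}
update_prediction_args_sketch(predict, train)
print(predict)
```

Like the documented function, the sketch returns None and modifies the prediction arguments in place.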