Utility Functions

The chemprop.utils module contains general-purpose utility functions.

chemprop.utils.build_lr_scheduler(optimizer: Optimizer, args: TrainArgs, total_epochs: List[int] | None = None) → _LRScheduler[source]

Builds a PyTorch learning rate scheduler.

Parameters:

optimizer – The Optimizer whose learning rate will be scheduled.
args – A TrainArgs object containing learning rate arguments.
total_epochs – The total number of epochs for which the model will be run.

Returns:

An initialized learning rate scheduler.

chemprop.utils.build_optimizer(model: Module, args: TrainArgs) → Optimizer[source]

Builds a PyTorch Optimizer.

Parameters:

model – The model to optimize.
args – A TrainArgs object containing optimizer arguments.

Returns:

An initialized Optimizer.

chemprop.utils.create_logger(name: str, save_dir: str | None = None, quiet: bool = False) → Logger[source]

Creates a logger with a stream handler and two file handlers.

If a logger with that name already exists, simply returns that logger. Otherwise, creates a new logger with a stream handler and two file handlers.

The stream handler prints to the screen depending on the value of quiet. One file handler (verbose.log) saves all logs; the other (quiet.log) saves only important info.

Parameters:

name – The name of the logger.
save_dir – The directory in which to save the logs.
quiet – Whether the stream handler should be quiet (i.e., print only important info).

Returns:

The logger.
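
The handler layout described above can be sketched with the standard logging module alone. This is a minimal illustration, not chemprop's implementation: the function name create_logger_sketch is hypothetical, and mapping "all logs" to DEBUG and "important info" to INFO is an assumption.

```python
import logging
import os
import tempfile


def create_logger_sketch(name, save_dir=None, quiet=False):
    """Illustrative stand-in for create_logger (not chemprop's code)."""
    # Reuse a logger that has already been created under this name.
    if name in logging.root.manager.loggerDict:
        return logging.getLogger(name)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # Stream handler: only INFO and above when quiet (assumed levels).
    stream_handler = logging.StreamHandler()
    stream_handler.setLevel(logging.INFO if quiet else logging.DEBUG)
    logger.addHandler(stream_handler)

    if save_dir is not None:
        os.makedirs(save_dir, exist_ok=True)
        # verbose.log keeps everything, quiet.log only INFO and above.
        verbose_handler = logging.FileHandler(os.path.join(save_dir, "verbose.log"))
        verbose_handler.setLevel(logging.DEBUG)
        quiet_handler = logging.FileHandler(os.path.join(save_dir, "quiet.log"))
        quiet_handler.setLevel(logging.INFO)
        logger.addHandler(verbose_handler)
        logger.addHandler(quiet_handler)

    return logger


log_dir = tempfile.mkdtemp()
logger = create_logger_sketch("demo_sketch", save_dir=log_dir, quiet=True)
logger.debug("debug detail")
logger.info("important message")
```

In this sketch, a second call with the same name returns the existing logger without attaching further handlers, which matches the behaviour described above.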

chemprop.utils.load_args(path: str) → TrainArgs[source]

Loads the arguments a model was trained with.

Parameters:

path – Path where model checkpoint is saved.

Returns:

The TrainArgs object that the model was trained with.

chemprop.utils.load_checkpoint(path: str, device: device | None = None, logger: Logger | None = None) → MoleculeModel[source]

Loads a model checkpoint.

Parameters:

path – Path where checkpoint is saved.
device – Device where the model will be moved.
logger – A logger for recording output.

Returns:

The loaded MoleculeModel.

chemprop.utils.load_frzn_model(model: torch.nn, path: str, current_args: Namespace | None = None, cuda: bool | None = None, logger: Logger | None = None) → MoleculeModel[source]

Loads a model checkpoint.

Parameters:

path – Path where checkpoint is saved.
current_args – The current arguments. Replaces the arguments loaded from the checkpoint if provided.
cuda – Whether to move model to cuda.
logger – A logger.

Returns:

The loaded MoleculeModel.

chemprop.utils.load_scalers(path: str) → Tuple[StandardScaler, StandardScaler, StandardScaler, StandardScaler, List[StandardScaler]][source]

Loads the scalers a model was trained with.

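Parameters:

path – Path where model checkpoint is saved.

Returns:

A tuple of five elements. The first two are the data StandardScaler and the features StandardScaler. Going by the return type and the scalers accepted by save_checkpoint, the remaining three are the atom descriptor scaler, the bond descriptor scaler, and the list of atom/bond target scalers.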

chemprop.utils.load_task_names(path: str) → List[str][source]

Loads the task names a model was trained with.

Parameters:

path – Path where model checkpoint is saved.

Returns:

A list of the task names that the model was trained with.

chemprop.utils.makedirs(path: str, isfile: bool = False) → None[source]

Creates a directory given a path to either a directory or file.

If a directory is provided, creates that directory. If a file is provided (i.e. isfile == True), creates the parent directory for that file.

Parameters:

path – Path to a directory or file.
isfile – Whether the provided path is a file (True) or a directory (False).
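
The directory-versus-file behaviour can be sketched with os.makedirs. This is a minimal illustration assuming nothing beyond the description above; makedirs_sketch is a hypothetical name, not the library function.

```python
import os
import tempfile


def makedirs_sketch(path, isfile=False):
    """Illustrative stand-in for makedirs (not chemprop's code)."""
    # For a file path, the directory to create is the file's parent.
    if isfile:
        path = os.path.dirname(path)
    # An empty parent means the file lives in the current directory.
    if path != "":
        os.makedirs(path, exist_ok=True)


base = tempfile.mkdtemp()
makedirs_sketch(os.path.join(base, "runs", "fold_0"))
makedirs_sketch(os.path.join(base, "preds", "out.csv"), isfile=True)
```

After the second call only the preds directory exists; the file out.csv itself is not created.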

chemprop.utils.multitask_mean(scores: ndarray, metric: str, axis: int | None = None, ignore_nan_metrics: bool = False) → float[source]

Combines the metric scores from different model tasks into a single score. When the metric varies with the magnitude of the task (such as RMSE), a geometric mean is used; otherwise, an arithmetic mean is used. The geometric mean prevents a task with a larger magnitude from dominating one with a smaller magnitude (e.g., temperature and pressure).

Parameters:

scores – The scores from different tasks for a single metric.
metric – The metric used to generate the scores.
axis – The axis along which to take the mean.
ignore_nan_metrics – Ignore invalid task metrics (NaNs) when computing average metrics across tasks.

Returns:

The combined score across the tasks.
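
The choice between the two means can be sketched on plain Python lists. This is a simplified illustration: it omits the axis parameter, multitask_mean_sketch is a hypothetical name, and the set of scale-dependent metric names is an assumption (the text above names only RMSE as an example).

```python
import math
from statistics import fmean, geometric_mean

# Assumed set of metrics whose value scales with the task's magnitude.
SCALE_DEPENDENT_METRICS = {"rmse", "mae", "mse"}


def multitask_mean_sketch(scores, metric, ignore_nan_metrics=False):
    """Illustrative stand-in for multitask_mean (not chemprop's code)."""
    if ignore_nan_metrics:
        # Drop tasks whose metric could not be computed.
        scores = [s for s in scores if not math.isnan(s)]
    if metric in SCALE_DEPENDENT_METRICS:
        return geometric_mean(scores)
    return fmean(scores)


# Two tasks whose errors differ by two orders of magnitude.
combined_rmse = multitask_mean_sketch([1.0, 100.0], "rmse")
combined_auc = multitask_mean_sketch([0.8, 0.9], "auc")
```

For the RMSE pair the geometric mean is about 10, whereas an arithmetic mean would give 50.5 and be driven almost entirely by the larger task.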

chemprop.utils.overwrite_state_dict(loaded_param_name: str, model_param_name: str, loaded_state_dict: OrderedDict, model_state_dict: OrderedDict, logger: Logger | None = None) → OrderedDict[source]

Overwrites a given parameter in the current model with the loaded model.

Parameters:

loaded_param_name – Name of parameter in checkpoint model.
model_param_name – Name of parameter in current model.
loaded_state_dict – state_dict for checkpoint model.
model_state_dict – state_dict for current model.
logger – A logger.

Returns:

The updated state_dict for the current model.
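
The renaming step can be sketched with ordinary dictionaries, using lists in place of tensors. This is an illustration under assumptions: overwrite_state_dict_sketch and the parameter names are hypothetical, and skipping a name that the current model lacks is an assumed behaviour, not something stated above.

```python
from collections import OrderedDict


def overwrite_state_dict_sketch(loaded_param_name, model_param_name,
                                loaded_state_dict, model_state_dict):
    """Illustrative stand-in for overwrite_state_dict (not chemprop's code)."""
    # Assumed behaviour: leave the model untouched if it has no such parameter.
    if model_param_name in model_state_dict:
        model_state_dict[model_param_name] = loaded_state_dict[loaded_param_name]
    return model_state_dict


# Hypothetical parameter names; lists stand in for tensors.
loaded = OrderedDict([("encoder.W_i.weight", [0.1, 0.2])])
current = OrderedDict([
    ("encoder.encoder.0.W_i.weight", [0.0, 0.0]),
    ("ffn.weight", [1.0]),
])
updated = overwrite_state_dict_sketch(
    "encoder.W_i.weight", "encoder.encoder.0.W_i.weight", loaded, current
)
```

Only the named parameter is replaced; every other entry of the current state_dict keeps its value.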

chemprop.utils.save_checkpoint(path: str, model: MoleculeModel, scaler: StandardScaler | None = None, features_scaler: StandardScaler | None = None, atom_descriptor_scaler: StandardScaler | None = None, bond_descriptor_scaler: StandardScaler | None = None, atom_bond_scaler: AtomBondScaler | None = None, args: TrainArgs | None = None) → None[source]

Saves a model checkpoint.

Parameters:

model – A MoleculeModel.
scaler – A StandardScaler fitted on the data.
features_scaler – A StandardScaler fitted on the features.
atom_descriptor_scaler – A StandardScaler fitted on the atom descriptors.
bond_descriptor_scaler – A StandardScaler fitted on the bond descriptors.
atom_bond_scaler – An AtomBondScaler fitted on the atomic/bond targets.
args – The TrainArgs object containing the arguments the model was trained with.
path – Path where checkpoint will be saved.

chemprop.utils.save_smiles_splits(data_path: str, save_dir: str, task_names: List[str] | None = None, features_path: List[str] | None = None, constraints_path: str | None = None, train_data: MoleculeDataset | None = None, val_data: MoleculeDataset | None = None, test_data: MoleculeDataset | None = None, smiles_columns: List[str] | None = None, loss_function: str | None = None, logger: Logger | None = None) → None[source]

Saves a CSV file with the train/val/test splits of target data and additional features. Also saves the indices of the train/val/test split as a pickle file. The pickle file does not support repeated entries with the same SMILES, or entries loaded from a path other than the main data path, such as a separate test path.

Parameters:

data_path – Path to data CSV file.
save_dir – Path where pickle files will be saved.
task_names – List of target names for the model as from the function get_task_names(). If not provided, will use datafile header entries.
features_path – List of path(s) to files with additional molecule features.
constraints_path – Path to constraints applied to atomic/bond properties prediction.
train_data – Train MoleculeDataset.
val_data – Validation MoleculeDataset.
test_data – Test MoleculeDataset.
smiles_columns – The names of the columns containing SMILES. By default, uses the first column.
loss_function – The loss function to be used in training.
logger – A logger for recording output.
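
One plausible reason for the restriction on repeated SMILES is sketched below, assuming the split indices are recovered by looking each molecule up in the main data file by its SMILES string. That lookup strategy is an assumption made for illustration, and the data are made up.

```python
import pickle

# Rows of a hypothetical data file; note that "CCO" appears twice.
rows = ["CCO", "CCN", "c1ccccc1", "CCO"]

# Map each SMILES to its row index. A repeated SMILES keeps only its last row.
index_by_smiles = {smiles: i for i, smiles in enumerate(rows)}

splits = {"train": ["CCO", "CCN"], "val": ["c1ccccc1"], "test": ["CCO"]}

# Recover row indices for each split through the SMILES lookup.
split_indices = [
    sorted(index_by_smiles[s] for s in splits[name])
    for name in ("train", "val", "test")
]

# The index lists are what would be serialized to the pickle file.
payload = pickle.dumps(split_indices)
```

Here row 0 is never recovered and row 3 is assigned to both train and test, so the indices no longer describe the actual split. The same lookup cannot resolve molecules that come from a separate file, because they have no row in the main data file.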

chemprop.utils.timeit(logger_name: str | None = None) → Callable[[Callable], Callable][source]

Creates a decorator which wraps a function with a timer that prints the elapsed time.

Parameters:

logger_name – The name of the logger used to record output. If None, uses print instead.

Returns:

A decorator which wraps a function with a timer that prints the elapsed time.
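
A decorator factory of this shape can be sketched as follows. This is a minimal illustration, not chemprop's implementation: timeit_sketch is a hypothetical name, and the message format and the use of the INFO level are assumptions.

```python
import logging
import time
from functools import wraps


def timeit_sketch(logger_name=None):
    """Illustrative stand-in for timeit (not chemprop's code)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Report through the named logger if given, otherwise print.
            if logger_name is not None:
                report = logging.getLogger(logger_name).info
            else:
                report = print
            report(f"Elapsed time = {elapsed:.3f} s")
            return result
        return wrapper
    return decorator


@timeit_sketch()
def square(x):
    return x * x


value = square(7)
```

Because the outer function takes the logger name and returns the actual decorator, it is applied with parentheses, as in @timeit_sketch() above.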

chemprop.utils.update_prediction_args(predict_args: PredictArgs, train_args: TrainArgs, missing_to_defaults: bool = True, validate_feature_sources: bool = True) → None[source]

Updates prediction arguments with training arguments loaded from a checkpoint file. If an argument is present in both, the prediction argument will be used.

Also raises errors for situations where the prediction arguments and training arguments are different but must match for proper function.

Parameters:

predict_args – The PredictArgs object containing the arguments to use for making predictions.
train_args – The TrainArgs object containing the arguments used to train the model previously.
missing_to_defaults – Whether to replace missing training arguments with the current defaults for TrainArgs. This is used for backwards compatibility.
validate_feature_sources – Indicates whether the feature sources (from path or generator) are checked for consistency between the training and prediction arguments. This is not necessary for fingerprint generation, where molecule features are not used.
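
The precedence rule and the consistency check can be sketched with simple namespaces. This is an illustration only: update_prediction_args_sketch, the MUST_MATCH tuple, and the field names are hypothetical and do not reflect the actual PredictArgs or TrainArgs classes.

```python
from types import SimpleNamespace

# Hypothetical list of settings that must agree between training and prediction.
MUST_MATCH = ("number_of_molecules",)


def update_prediction_args_sketch(predict_args, train_args):
    """Illustrative stand-in for update_prediction_args (not chemprop's code)."""
    # Settings that must agree raise an error instead of being overridden.
    for key in MUST_MATCH:
        if hasattr(predict_args, key) and hasattr(train_args, key):
            if getattr(predict_args, key) != getattr(train_args, key):
                raise ValueError(f"{key} differs between training and prediction")
    # Copy over only the arguments that the prediction arguments lack,
    # so a value present in both keeps the prediction setting.
    for key, value in vars(train_args).items():
        if not hasattr(predict_args, key):
            setattr(predict_args, key, value)


train = SimpleNamespace(hidden_size=300, batch_size=64, number_of_molecules=1)
predict = SimpleNamespace(batch_size=50, number_of_molecules=1)
update_prediction_args_sketch(predict, train)
```

After the call, predict gains hidden_size from the training arguments and keeps its own batch_size. The function updates predict in place and returns None, mirroring the signature above.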