Utility Functions
chemprop.utils.py contains general-purpose utility functions.
- chemprop.utils.build_lr_scheduler(optimizer: Optimizer, args: TrainArgs, total_epochs: List[int] | None = None) → _LRScheduler
Builds a PyTorch learning rate scheduler.
- Parameters:
optimizer – The Optimizer whose learning rate will be scheduled.
args – A TrainArgs object containing learning rate arguments.
total_epochs – The total number of epochs for which the model will be run.
- Returns:
An initialized learning rate scheduler.
- chemprop.utils.build_optimizer(model: Module, args: TrainArgs) → Optimizer
Builds a PyTorch Optimizer.
- Parameters:
model – The model to optimize.
args – A TrainArgs object containing optimizer arguments.
- Returns:
An initialized Optimizer.
- chemprop.utils.create_logger(name: str, save_dir: str | None = None, quiet: bool = False) → Logger
Creates a logger with a stream handler and two file handlers.
If a logger with that name already exists, simply returns that logger. Otherwise, creates a new logger with a stream handler and two file handlers.
The stream handler prints to the screen depending on the value of quiet. One file handler (verbose.log) saves all logs; the other (quiet.log) saves only important info.
- Parameters:
name – The name of the logger.
save_dir – The directory in which to save the logs.
quiet – Whether the stream handler should be quiet (i.e., print only important info).
- Returns:
The logger.
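The handler layout described above can be sketched with the standard logging module. This is a minimal illustration, not chemprop's implementation: the name create_logger_sketch is hypothetical, and the specific levels (DEBUG for "all logs", INFO for "important info") are assumptions.

```python
import logging
import os
from typing import Optional


def create_logger_sketch(name: str, save_dir: Optional[str] = None,
                         quiet: bool = False) -> logging.Logger:
    """Illustrative stand-in for create_logger; the levels are assumptions."""
    # If a logger with this name already exists, simply return it.
    if name in logging.root.manager.loggerDict:
        return logging.getLogger(name)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # Stream handler: only INFO and above when quiet, everything otherwise.
    stream = logging.StreamHandler()
    stream.setLevel(logging.INFO if quiet else logging.DEBUG)
    logger.addHandler(stream)

    if save_dir is not None:
        os.makedirs(save_dir, exist_ok=True)
        # verbose.log keeps every record; quiet.log keeps INFO and above.
        verbose_file = logging.FileHandler(os.path.join(save_dir, "verbose.log"))
        verbose_file.setLevel(logging.DEBUG)
        quiet_file = logging.FileHandler(os.path.join(save_dir, "quiet.log"))
        quiet_file.setLevel(logging.INFO)
        logger.addHandler(verbose_file)
        logger.addHandler(quiet_file)

    return logger
```

Calling the sketch twice with the same name returns the same logger object, so handlers are not attached a second time.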
- chemprop.utils.load_args(path: str) → TrainArgs
Loads the arguments a model was trained with.
- Parameters:
path – Path where model checkpoint is saved.
- Returns:
The TrainArgs object that the model was trained with.
- chemprop.utils.load_checkpoint(path: str, device: device | None = None, logger: Logger | None = None) → MoleculeModel
Loads a model checkpoint.
- Parameters:
path – Path where checkpoint is saved.
device – Device where the model will be moved.
logger – A logger for recording output.
- Returns:
The loaded MoleculeModel.
- chemprop.utils.load_frzn_model(model: torch.nn, path: str, current_args: Namespace | None = None, cuda: bool | None = None, logger: Logger | None = None) → MoleculeModel
Loads a model checkpoint.
- Parameters:
path – Path where checkpoint is saved.
current_args – The current arguments. Replaces the arguments loaded from the checkpoint if provided.
cuda – Whether to move model to cuda.
logger – A logger.
- Returns:
The loaded MoleculeModel.
- chemprop.utils.load_scalers(path: str) → Tuple[StandardScaler, StandardScaler, StandardScaler, StandardScaler, List[StandardScaler]]
Loads the scalers a model was trained with.
- Parameters:
path – Path where model checkpoint is saved.
- Returns:
A tuple with the data StandardScaler and features StandardScaler.
- chemprop.utils.load_task_names(path: str) → List[str]
Loads the task names a model was trained with.
- Parameters:
path – Path where model checkpoint is saved.
- Returns:
A list of the task names that the model was trained with.
- chemprop.utils.makedirs(path: str, isfile: bool = False) → None
Creates a directory given a path to either a directory or file.
If a directory is provided, creates that directory. If a file is provided (i.e. isfile == True), creates the parent directory for that file.
- Parameters:
path – Path to a directory or file.
isfile – Whether the provided path is a file (True) or a directory (False).
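The directory-versus-file behavior described above can be sketched with the standard library. This is a minimal illustration under the stated description; the name makedirs_sketch is hypothetical, and the handling of a bare file name with no parent directory is an assumption.

```python
import os


def makedirs_sketch(path: str, isfile: bool = False) -> None:
    """Create a directory, or the parent directory of a file path."""
    # When the path names a file, target its parent directory instead.
    if isfile:
        path = os.path.dirname(path)
    # An empty parent (e.g. for "results.csv") means the current directory,
    # so there is nothing to create (assumed behavior).
    if path != "":
        os.makedirs(path, exist_ok=True)
```

For example, makedirs_sketch("runs/fold_0/preds.csv", isfile=True) creates runs/fold_0 but does not create the file itself.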
- chemprop.utils.multitask_mean(scores: ndarray, metric: str, axis: int | None = None, ignore_nan_metrics: bool = False) → float
Combines the metric scores across different model tasks into a single score. When the metric varies with the magnitude of the task (such as RMSE), a geometric mean is used; otherwise, a more typical arithmetic mean is used. The geometric mean prevents a task with a larger magnitude from dominating one with a smaller magnitude (e.g., temperature and pressure).
- Parameters:
scores – The scores from different tasks for a single metric.
metric – The metric used to generate the scores.
axis – The axis along which to take the mean.
ignore_nan_metrics – Ignore invalid task metrics (NaNs) when computing average metrics across tasks.
- Returns:
The combined score across the tasks.
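The choice between the two means can be illustrated in plain Python. This is a sketch, not the library's code: the name multitask_mean_sketch and the SCALE_DEPENDENT set are assumptions (the description names only RMSE as a scale-dependent metric), and the axis argument is omitted for brevity.

```python
import math
from typing import Sequence

# Assumption: which metrics count as scale-dependent. The description
# above names only RMSE explicitly.
SCALE_DEPENDENT = {"rmse", "mae", "mse"}


def multitask_mean_sketch(scores: Sequence[float], metric: str,
                          ignore_nan_metrics: bool = False) -> float:
    """Combine per-task scores into a single score."""
    values = [float(s) for s in scores]
    if ignore_nan_metrics:
        values = [v for v in values if not math.isnan(v)]
    if not values:
        return float("nan")
    if metric in SCALE_DEPENDENT:
        # Geometric mean, computed as exp(mean(log(x))).
        # Requires strictly positive scores.
        return math.exp(sum(math.log(v) for v in values) / len(values))
    # Arithmetic mean for metrics that do not depend on task magnitude.
    return sum(values) / len(values)
```

With per-task RMSEs of 1.0 and 100.0, the geometric mean is 10.0, whereas the arithmetic mean would be 50.5 and would be driven almost entirely by the larger task.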
- chemprop.utils.overwrite_state_dict(loaded_param_name: str, model_param_name: str, loaded_state_dict: OrderedDict, model_state_dict: OrderedDict, logger: Logger | None = None) → OrderedDict
Overwrites a given parameter in the current model with the loaded model.
- Parameters:
loaded_param_name – name of parameter in checkpoint model.
model_param_name – name of parameter in current model.
loaded_state_dict – state_dict for checkpoint model.
model_state_dict – state_dict for current model.
logger – A logger.
- Returns:
The updated state_dict for the current model.
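The overwrite step can be sketched with plain lists standing in for tensors. This is an illustration only: the name overwrite_state_dict_sketch is hypothetical, and the two guard checks (parameter present in the current model, matching length as a stand-in for matching shape) are assumptions about sensible behavior rather than documented behavior.

```python
from collections import OrderedDict
from logging import Logger
from typing import Optional


def overwrite_state_dict_sketch(loaded_param_name: str,
                                model_param_name: str,
                                loaded_state_dict: OrderedDict,
                                model_state_dict: OrderedDict,
                                logger: Optional[Logger] = None) -> OrderedDict:
    """Copy one parameter from a loaded state dict into the current one."""
    report = logger.debug if logger is not None else print

    if model_param_name not in model_state_dict:
        # Assumed guard: skip parameters the current model does not have.
        report(f'Parameter "{model_param_name}" not found in current model; skipped.')
    elif len(model_state_dict[model_param_name]) != len(loaded_state_dict[loaded_param_name]):
        # Assumed guard: skip parameters whose sizes do not match.
        report(f'Parameter "{loaded_param_name}" has a mismatched size; skipped.')
    else:
        report(f'Loading "{loaded_param_name}" into "{model_param_name}".')
        model_state_dict[model_param_name] = loaded_state_dict[loaded_param_name]

    return model_state_dict
```

With real PyTorch state dicts, the length comparison would typically be replaced by a comparison of tensor shapes.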
- chemprop.utils.save_checkpoint(path: str, model: MoleculeModel, scaler: StandardScaler | None = None, features_scaler: StandardScaler | None = None, atom_descriptor_scaler: StandardScaler | None = None, bond_descriptor_scaler: StandardScaler | None = None, atom_bond_scaler: AtomBondScaler | None = None, args: TrainArgs | None = None) → None
Saves a model checkpoint.
- Parameters:
model – A MoleculeModel.
scaler – A StandardScaler fitted on the data.
features_scaler – A StandardScaler fitted on the features.
atom_descriptor_scaler – A StandardScaler fitted on the atom descriptors.
bond_descriptor_scaler – A StandardScaler fitted on the bond descriptors.
atom_bond_scaler – A AtomBondScaler fitted on the atomic/bond targets.
args – The TrainArgs object containing the arguments the model was trained with.
path – Path where checkpoint will be saved.
- chemprop.utils.save_smiles_splits(data_path: str, save_dir: str, task_names: List[str] | None = None, features_path: List[str] | None = None, constraints_path: str | None = None, train_data: MoleculeDataset | None = None, val_data: MoleculeDataset | None = None, test_data: MoleculeDataset | None = None, smiles_columns: List[str] | None = None, loss_function: str | None = None, logger: Logger | None = None) → None
Saves a csv file with train/val/test splits of target data and additional features. Also saves indices of train/val/test split as a pickle file. Pickle file does not support repeated entries with the same SMILES or entries entered from a path other than the main data path, such as a separate test path.
- Parameters:
data_path – Path to data CSV file.
save_dir – Path where pickle files will be saved.
task_names – List of target names for the model as from the function get_task_names(). If not provided, will use datafile header entries.
features_path – List of path(s) to files with additional molecule features.
constraints_path – Path to constraints applied to atomic/bond properties prediction.
train_data – Train MoleculeDataset.
val_data – Validation MoleculeDataset.
test_data – Test MoleculeDataset.
smiles_columns – The name of the column containing SMILES. By default, uses the first column.
loss_function – The loss function to be used in training.
logger – A logger for recording output.
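The restriction on repeated SMILES follows naturally if split indices are recovered by looking up each SMILES in the main data file. The sketch below illustrates that idea with the standard library only. It is not chemprop's implementation: the function name, the output file names, and the use of plain SMILES lists in place of MoleculeDataset objects are all assumptions made for illustration.

```python
import csv
import os
import pickle
from typing import Dict, List


def save_splits_sketch(data_path: str, save_dir: str,
                       splits: Dict[str, List[str]]) -> None:
    """Write one CSV per split plus a pickle of row indices."""
    os.makedirs(save_dir, exist_ok=True)
    with open(data_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Map each SMILES (first column) to its row index in the main data file.
    # A repeated SMILES overwrites the earlier entry, and a SMILES from
    # another file has no entry at all (here: a KeyError), which is why
    # such cases cannot be indexed reliably.
    index_of = {row[0]: i for i, row in enumerate(rows)}

    all_indices = []
    for name in ("train", "val", "test"):
        indices = sorted(index_of[s] for s in splits.get(name, []))
        all_indices.append(indices)
        # Hypothetical file name for the per-split CSV.
        with open(os.path.join(save_dir, f"{name}_full.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows[i] for i in indices)

    # Hypothetical file name for the pickled [train, val, test] indices.
    with open(os.path.join(save_dir, "split_indices.pckl"), "wb") as f:
        pickle.dump(all_indices, f)
```

Reloading the pickle gives three index lists that can be used to reproduce the same split from the original data file.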
- chemprop.utils.timeit(logger_name: str | None = None) → Callable[[Callable], Callable]
Creates a decorator which wraps a function with a timer that prints the elapsed time.
- Parameters:
logger_name – The name of the logger used to record output. If None, uses print instead.
- Returns:
A decorator which wraps a function with a timer that prints the elapsed time.
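A decorator factory of this shape can be sketched as follows. This is a minimal illustration, not the library's code: the name timeit_sketch, the message format, and the INFO log level are assumptions.

```python
import logging
import time
from functools import wraps
from typing import Callable, Optional


def timeit_sketch(logger_name: Optional[str] = None) -> Callable[[Callable], Callable]:
    """Return a decorator that reports how long the wrapped function took."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Use the named logger if one was given, otherwise fall back to print.
            report = logging.getLogger(logger_name).info if logger_name is not None else print
            report(f"Elapsed time = {elapsed:.3f} s")
            return result
        return wrapper
    return decorator


@timeit_sketch()  # no logger name, so the elapsed time is printed
def add(a: int, b: int) -> int:
    return a + b
```

Because the factory takes the logger name as its argument, the decorator is applied with parentheses (@timeit_sketch() or @timeit_sketch("train")), and functools.wraps keeps the wrapped function's name and docstring intact.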
- chemprop.utils.update_prediction_args(predict_args: PredictArgs, train_args: TrainArgs, missing_to_defaults: bool = True, validate_feature_sources: bool = True) → None
Updates prediction arguments with training arguments loaded from a checkpoint file. If an argument is present in both, the prediction argument will be used.
Also raises errors for situations where the prediction arguments and training arguments are different but must match for proper function.
- Parameters:
predict_args – The PredictArgs object containing the arguments to use for making predictions.
train_args – The TrainArgs object containing the arguments used to train the model previously.
missing_to_defaults – Whether to replace missing training arguments with the current defaults for TrainArgs. This is used for backwards compatibility.
validate_feature_sources – Indicates whether the feature sources (from path or generator) are checked for consistency between the training and prediction arguments. This is not necessary for fingerprint generation, where molecule features are not used.
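The precedence rule (a prediction argument wins when both objects define it) can be sketched with argparse.Namespace objects standing in for PredictArgs and TrainArgs. The function name and the example argument names are hypothetical, and the consistency checks that raise errors are deliberately left out of this sketch.

```python
from argparse import Namespace


def fill_missing_args_sketch(predict_args: Namespace, train_args: Namespace) -> None:
    """Copy training arguments into predict_args without overriding existing ones."""
    for key, value in vars(train_args).items():
        # Only fill in arguments that the prediction arguments lack, so a
        # value set on predict_args always takes precedence.
        if not hasattr(predict_args, key):
            setattr(predict_args, key, value)


# Hypothetical argument names, used only to show the precedence rule.
predict_args = Namespace(batch_size=64)
train_args = Namespace(batch_size=50, hidden_size=300)
fill_missing_args_sketch(predict_args, train_args)
```

After the call, predict_args.batch_size is still 64 (the prediction value is kept) and predict_args.hidden_size is 300 (filled in from the training arguments). The update happens in place, matching the None return type in the signature.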