Training and Predicting

chemprop.train contains functions to train and make predictions with message passing neural networks.

Train

chemprop.train.train.py trains a model for a single epoch.

chemprop.train.train.train(model: chemprop.models.model.MoleculeModel, data_loader: chemprop.data.data.MoleculeDataLoader, loss_func: Callable, optimizer: torch.optim.optimizer.Optimizer, scheduler: torch.optim.lr_scheduler._LRScheduler, args: chemprop.args.TrainArgs, n_iter: int = 0, logger: Optional[logging.Logger] = None, writer: Optional[tensorboardX.writer.SummaryWriter] = None) -> int

Trains a model for an epoch.

Parameters
  • model – A MoleculeModel.

  • data_loader – A MoleculeDataLoader.

  • loss_func – Loss function.

  • optimizer – An optimizer.

  • scheduler – A learning rate scheduler.

  • args – A TrainArgs object containing arguments for training the model.

  • n_iter – The number of iterations (training examples) trained on so far.

  • logger – A logger for recording output.

  • writer – A tensorboardX SummaryWriter.

Returns

The total number of iterations (training examples) trained on so far.
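The sketch below shows how n_iter is tracked. It is a hypothetical pure-Python example of one epoch of mini-batch gradient descent on a one-parameter model y = w * x, and it uses neither chemprop nor PyTorch. The name train_one_epoch, the (x, y) batch format, and the squared-error loss are assumptions made for illustration. It shares two things with the documented train function: n_iter counts training examples rather than batches, and n_iter is returned so that the next epoch can continue the count.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a batch from a MoleculeDataLoader: (x, y) pairs.
Batch = List[Tuple[float, float]]


def train_one_epoch(
    batches: Sequence[Batch],
    weight: float,
    lr: float,
    n_iter: int = 0,
) -> Tuple[float, int]:
    """Sketch: one epoch of mini-batch gradient descent on y = weight * x.

    Returns the updated weight and the running count of training examples.
    """
    for batch in batches:
        # Gradient of the batch mean squared error with respect to weight.
        grad = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
        weight -= lr * grad
        # Count examples, not batches, mirroring the documented n_iter.
        n_iter += len(batch)
    return weight, n_iter


batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
weight, n_iter = train_one_epoch(batches, weight=0.0, lr=0.05)
# A second epoch continues counting from the previous total.
weight, n_iter = train_one_epoch(batches, weight=weight, lr=0.05, n_iter=n_iter)
```

Passing the returned count back in is what allows a logger or writer to use a single, monotonically increasing step value across epochs.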

Run Training

chemprop.train.run_training.py loads data, initializes the model, and runs training, validation, and testing of the model.

chemprop.train.run_training.run_training(args: chemprop.args.TrainArgs, data: chemprop.data.data.MoleculeDataset, logger: Optional[logging.Logger] = None) -> Dict[str, List[float]]

Loads data, trains a Chemprop model, and returns test scores for the model checkpoint with the highest validation score.

Parameters
  • args – A TrainArgs object containing arguments for loading data and training the Chemprop model.

  • data – A MoleculeDataset containing the data.

  • logger – A logger to record output.

Returns

A dictionary mapping each metric in args.metrics to a list of values for each task.
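The following hypothetical sketch illustrates the checkpoint-selection step described above: pick the epoch with the best validation score, then score only that checkpoint on the test set. The function names and the minimize flag are assumptions for illustration. It also assumes that "highest validation score" means "best" score, so for error metrics such as RMSE the lowest value wins. The test scores in the example are made-up placeholders.

```python
from typing import Callable, Dict, List, Sequence


def select_best_epoch(val_scores: Sequence[float], minimize: bool) -> int:
    """Return the index of the epoch with the best validation score.

    Assumption: for error metrics (e.g. RMSE) the best score is the lowest;
    for metrics such as ROC-AUC it is the highest.
    """
    pick = min if minimize else max
    return pick(range(len(val_scores)), key=lambda i: val_scores[i])


def run_training_sketch(
    val_scores: Sequence[float],
    checkpoints: Sequence[str],
    evaluate_on_test: Callable[[str], Dict[str, List[float]]],
    minimize: bool,
) -> Dict[str, List[float]]:
    """Hypothetical sketch: evaluate only the best checkpoint on the test set."""
    best = select_best_epoch(val_scores, minimize)
    return evaluate_on_test(checkpoints[best])


# Made-up per-task test scores for each saved checkpoint.
test_results = {
    "epoch_0.pt": {"rmse": [1.2, 0.9]},
    "epoch_1.pt": {"rmse": [0.8, 0.7]},
    "epoch_2.pt": {"rmse": [0.9, 0.8]},
}
scores = run_training_sketch(
    val_scores=[1.1, 0.75, 0.85],
    checkpoints=list(test_results),
    evaluate_on_test=test_results.__getitem__,
    minimize=True,
)
```

The result has the documented shape: each metric name maps to one value per task.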

Cross-Validation

chemprop.train.cross_validate.py provides an outer loop around chemprop.train.run_training.py that runs training and evaluation for each of several splits of the data.

chemprop.train.cross_validate.chemprop_train() -> None

Parses Chemprop training arguments and trains (cross-validates) a Chemprop model.

This is the entry point for the command line command chemprop_train.

chemprop.train.cross_validate.cross_validate(args: chemprop.args.TrainArgs, train_func: Callable[[chemprop.args.TrainArgs, chemprop.data.data.MoleculeDataset, logging.Logger], Dict[str, List[float]]]) -> Tuple[float, float]

Runs k-fold cross-validation.

For each of k splits (folds) of the data, trains and tests a model on that split and aggregates the performance across folds.

Parameters
  • args – A TrainArgs object containing arguments for loading data and training the Chemprop model.

  • train_func – Function which runs training.

Returns

A tuple containing the mean and standard deviation of the performance across folds.
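The sketch below is a hypothetical illustration of the k-fold outer loop and the aggregation into a (mean, standard deviation) pair. In this version, train_func takes only a fold index and returns per-task scores, which is a simplification of the documented train_func signature. Three further choices are assumptions rather than chemprop's exact behavior: the per-fold score is the mean over tasks, the standard deviation is the population form (statistics.pstdev), and the fold scores in the example are made up.

```python
import statistics
from typing import Callable, Dict, List, Tuple


def cross_validate_sketch(
    num_folds: int,
    train_func: Callable[[int], Dict[str, List[float]]],
    metric: str,
) -> Tuple[float, float]:
    """Hypothetical sketch of the k-fold outer loop.

    train_func(fold) stands in for run_training on one split and returns
    per-task scores, like the dictionary documented for run_training.
    """
    fold_scores = []
    for fold in range(num_folds):
        task_scores = train_func(fold)[metric]
        # Average over tasks to get a single number per fold.
        fold_scores.append(statistics.mean(task_scores))
    # Population standard deviation across folds (an assumption; the sample
    # standard deviation, statistics.stdev, would also be a reasonable choice).
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


def fake_train(fold: int) -> Dict[str, List[float]]:
    # Made-up scores: fold k yields an RMSE of 1 + k on both tasks.
    return {"rmse": [1.0 + fold, 1.0 + fold]}


mean_score, std_score = cross_validate_sketch(3, fake_train, "rmse")
```

Here the fold scores are 1.0, 2.0 and 3.0, so the mean is 2.0 and the population standard deviation is sqrt(2/3).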

Predict

chemprop.train.predict.py uses a trained model to make predictions on data.

chemprop.train.predict.predict(model: chemprop.models.model.MoleculeModel, data_loader: chemprop.data.data.MoleculeDataLoader, disable_progress_bar: bool = False, scaler: Optional[chemprop.data.scaler.StandardScaler] = None) -> List[List[float]]

Makes predictions on a dataset using a single model.

Parameters
  • model – A MoleculeModel.

  • data_loader – A MoleculeDataLoader.

  • disable_progress_bar – Whether to disable the progress bar.

  • scaler – A StandardScaler object fit on the training targets.

Returns

A list of lists of predictions. The outer list is indexed by molecule and the inner list by task.
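The scaler is fit on the training targets, so its role here is presumably to map the model's standardized outputs back to the original target units. The hypothetical sketch below shows that arithmetic. It assumes each task is standardized independently as z = (y - mean) / std, which means the inverse is y = z * std + mean. The function name, the plain-list inputs, and the statistics in the example are illustrative assumptions and are not chemprop's StandardScaler API.

```python
from typing import List


def inverse_transform(
    scaled_preds: List[List[float]],
    means: List[float],
    stds: List[float],
) -> List[List[float]]:
    """Map standardized predictions back to the original target units.

    Hypothetical sketch of undoing per-task standardization: y = z * std + mean.
    Rows are molecules and columns are tasks, matching the layout returned
    by predict.
    """
    return [
        [z * std + mean for z, mean, std in zip(row, means, stds)]
        for row in scaled_preds
    ]


# Two molecules, two tasks, with made-up training-target statistics per task.
preds = inverse_transform([[0.0, 1.0], [-1.0, 2.0]], means=[10.0, 0.0], stds=[2.0, 0.5])
# preds == [[10.0, 0.5], [8.0, 1.0]]
```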

Make Predictions

chemprop.train.make_predictions.py is a wrapper around chemprop.train.predict.py that loads data, loads a trained model, makes predictions, and saves those predictions.

chemprop.train.make_predictions.chemprop_predict() -> None

Parses Chemprop predicting arguments and runs prediction using a trained Chemprop model.

This is the entry point for the command line command chemprop_predict.

chemprop.train.make_predictions.load_data(args: chemprop.args.PredictArgs, smiles: List[List[str]])

Loads data from a list of SMILES or from a file.

Parameters
  • args – A PredictArgs object containing arguments for loading data and a model and making predictions.

  • smiles – A list of lists of SMILES, or None if the data should be read from a file.

Returns

A tuple of a MoleculeDataset containing all datapoints, a MoleculeDataset containing only valid datapoints, a MoleculeDataLoader and a dictionary mapping full to valid indices.
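The hypothetical sketch below shows how a mapping from full to valid indices can be built: invalid inputs are dropped, and each kept input records its new position. A real implementation would decide validity by parsing each SMILES string, for example with RDKit. That check is replaced here by a placeholder predicate. The names split_valid and is_valid are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple


def split_valid(
    smiles: List[str],
    is_valid: Callable[[str], bool],
) -> Tuple[List[str], Dict[int, int]]:
    """Keep only valid inputs and record where each one landed.

    Returns the valid inputs and a dict from index in the full list to
    index in the valid list; invalid inputs get no entry.
    """
    valid: List[str] = []
    full_to_valid: Dict[int, int] = {}
    for full_index, s in enumerate(smiles):
        if is_valid(s):
            full_to_valid[full_index] = len(valid)
            valid.append(s)
    return valid, full_to_valid


# Placeholder validity check: treat empty strings as unparseable.
valid, full_to_valid = split_valid(["CCO", "", "c1ccccc1"], is_valid=bool)
# valid == ["CCO", "c1ccccc1"], full_to_valid == {0: 0, 2: 1}
```

Keeping both the full and the valid collections lets predictions be computed only on valid molecules and still be reported in the original input order.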

chemprop.train.make_predictions.load_model(args: chemprop.args.PredictArgs, generator: bool = False)

Loads a model or an ensemble of models from file. If generator is True, generators of the model and scaler objects are returned, which is memory efficient. Otherwise, full lists are returned; these hold all models in memory and are necessary for preloading.

Parameters
  • args – A PredictArgs object containing arguments for loading data and a model and making predictions.

  • generator – Whether to return generators instead of lists of models and scalers.

Returns

A tuple of the updated prediction arguments, the training arguments, a list or generator of models, a list or generator of scalers, the number of tasks, and the task names.
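The trade-off between a generator and a list can be sketched in plain Python. In the hypothetical example below, load_models and load_checkpoint are illustrative names and are not chemprop functions. With generator=True, no checkpoint is read until the caller iterates. With generator=False, every checkpoint is read immediately and kept in memory.

```python
from typing import Callable, Iterable, List


def load_models(
    checkpoint_paths: List[str],
    load_checkpoint: Callable[[str], object],
    generator: bool = False,
) -> Iterable[object]:
    """Hypothetical sketch of the list-versus-generator choice.

    With generator=True nothing is loaded until the caller iterates, so each
    model can be released before the next one is read. With generator=False
    every checkpoint is loaded immediately and kept in memory.
    """
    if generator:
        return (load_checkpoint(path) for path in checkpoint_paths)
    return [load_checkpoint(path) for path in checkpoint_paths]


loaded: List[str] = []


def fake_load(path: str) -> str:
    loaded.append(path)  # record when a checkpoint is actually read
    return f"model from {path}"


lazy = load_models(["a.pt", "b.pt"], fake_load, generator=True)
# At this point loaded == [] because nothing has been read yet.
models = list(lazy)
# Now loaded == ["a.pt", "b.pt"].
```

A generator can be consumed only once. This is consistent with the note that preloading needs the full list: a list of models can be reused across repeated prediction calls, but an exhausted generator cannot.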

chemprop.train.make_predictions.make_predictions(args: chemprop.args.PredictArgs, smiles: List[List[str]] = None, model_objects: Tuple[chemprop.args.PredictArgs, chemprop.args.TrainArgs, List[chemprop.models.model.MoleculeModel], List[chemprop.data.scaler.StandardScaler], int, List[str]] = None) -> List[List[Optional[float]]]

Loads data and a trained model and uses the model to make predictions on the data.

If smiles is provided, makes predictions on those SMILES. Otherwise, makes predictions on args.test_data.

Parameters
  • args – A PredictArgs object containing arguments for loading data and a model and making predictions.

  • smiles – A list of lists of SMILES to make predictions on.

  • model_objects – The tuple returned by load_model. Calling load_model separately and passing its output here avoids reloading the models on every call.

Returns

A list of lists of target predictions.
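The hypothetical sketch below models the control flow described above. When smiles is given, those inputs are used; otherwise, the stored test data is read. The models are loaded only when no model_objects are passed in. All callables (read_test_data, load_model, predict) are illustrative stand-ins and not chemprop's API. The example counts load calls to show that preloading once lets several prediction calls share the same models.

```python
from typing import Callable, List, Optional


def make_predictions_sketch(
    smiles: Optional[List[List[str]]],
    model_objects: Optional[object],
    read_test_data: Callable[[], List[List[str]]],
    load_model: Callable[[], object],
    predict: Callable[[object, List[str]], List[float]],
) -> List[List[float]]:
    """Hypothetical sketch of the input-selection and model-loading logic."""
    if model_objects is None:
        model_objects = load_model()  # expensive: reads every checkpoint
    if smiles is None:
        smiles = read_test_data()  # stands in for reading args.test_data
    return [predict(model_objects, molecule) for molecule in smiles]


load_calls: List[int] = []


def fake_load_model() -> str:
    load_calls.append(1)
    return "ensemble"


def fake_predict(models: object, molecule: List[str]) -> List[float]:
    return [float(len(molecule[0]))]  # made-up prediction: SMILES length


objects = fake_load_model()  # preload once
for batch in ([["CCO"]], [["CCCC"]]):
    preds = make_predictions_sketch(
        batch, objects, read_test_data=list, load_model=fake_load_model, predict=fake_predict
    )
# load_calls == [1]: both calls reused the preloaded objects.
```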

chemprop.train.make_predictions.predict_and_save(args: chemprop.args.PredictArgs, train_args: chemprop.args.TrainArgs, test_data: chemprop.data.data.MoleculeDataset, task_names: List[str], num_tasks: int, test_data_loader: chemprop.data.data.MoleculeDataLoader, full_data: chemprop.data.data.MoleculeDataset, full_to_valid_indices: dict, models: List[chemprop.models.model.MoleculeModel], scalers: List[List[chemprop.data.scaler.StandardScaler]])

Makes predictions with a model or an ensemble of models and saves the predictions to a file.

Parameters
  • args – A PredictArgs object containing arguments for loading data and a model and making predictions.

  • train_args – A TrainArgs object containing arguments for training the model.

  • test_data – A MoleculeDataset containing valid datapoints.

  • task_names – A list of task names.

  • num_tasks – Number of tasks.

  • test_data_loader – A MoleculeDataLoader to load the test data.

  • full_data – A MoleculeDataset containing all (valid and invalid) datapoints.

  • full_to_valid_indices – A dictionary mapping indices in full_data to the corresponding indices in test_data.

  • models – A list or generator object of MoleculeModels.

  • scalers – A list or generator object of StandardScaler objects.

Returns

A list of lists of target predictions.
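predict_and_save receives a list of models together with full_data and full_to_valid_indices, which suggests two steps: combine the ensemble's predictions on the valid molecules, then place them back in the order of the full input. The hypothetical sketch below makes two assumptions. First, ensemble predictions are combined by a simple element-wise average. Second, invalid inputs receive None for every task, which is consistent with the Optional[float] in the return type of make_predictions. Writing the results to a file is omitted, and the function names are illustrative.

```python
from typing import Dict, List, Optional


def average_ensemble(per_model_preds: List[List[List[float]]]) -> List[List[float]]:
    """Average predictions element-wise over models.

    per_model_preds[m][i][t] is model m's prediction for molecule i, task t.
    """
    num_models = len(per_model_preds)
    return [
        [sum(values) / num_models for values in zip(*rows)]  # one value per task
        for rows in zip(*per_model_preds)  # one group of rows per molecule
    ]


def expand_to_full(
    valid_preds: List[List[float]],
    num_full: int,
    full_to_valid: Dict[int, int],
    num_tasks: int,
) -> List[List[Optional[float]]]:
    """Restore input order; invalid inputs get None for each task (assumption)."""
    return [
        valid_preds[full_to_valid[i]] if i in full_to_valid else [None] * num_tasks
        for i in range(num_full)
    ]


# Two models, two valid molecules, one task (made-up numbers).
avg = average_ensemble([[[1.0], [3.0]], [[3.0], [5.0]]])
# avg == [[2.0], [4.0]]
full = expand_to_full(avg, num_full=3, full_to_valid={0: 0, 2: 1}, num_tasks=1)
# full == [[2.0], [None], [4.0]]
```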

chemprop.train.make_predictions.set_features(args: chemprop.args.PredictArgs, train_args: chemprop.args.TrainArgs)

Sets extra featurization options for prediction based on the options used during training.

Parameters
  • args – A PredictArgs object containing arguments for loading data and a model and making predictions.

  • train_args – A TrainArgs object containing arguments for training the model.

Evaluate

chemprop.train.evaluate.py contains functions for evaluating the quality of predictions by comparing them to the true values.

chemprop.train.evaluate.evaluate(model: chemprop.models.model.MoleculeModel, data_loader: chemprop.data.data.MoleculeDataLoader, num_tasks: int, metrics: List[str], dataset_type: str, scaler: Optional[chemprop.data.scaler.StandardScaler] = None, logger: Optional[logging.Logger] = None) -> Dict[str, List[float]]

Evaluates a model on a dataset by making predictions and then evaluating those predictions.

Parameters
  • model – A MoleculeModel.

  • data_loader – A MoleculeDataLoader.

  • num_tasks – Number of tasks.

  • metrics – A list of names of metric functions.

  • dataset_type – Dataset type (e.g., 'regression' or 'classification').

  • scaler – A StandardScaler object fit on the training targets.

  • logger – A logger to record output.

Returns

A dictionary mapping each metric in metrics to a list of values for each task.

chemprop.train.evaluate.evaluate_predictions(preds: List[List[float]], targets: List[List[float]], num_tasks: int, metrics: List[str], dataset_type: str, logger: Optional[logging.Logger] = None) -> Dict[str, List[float]]

Evaluates predictions using a metric function after filtering out invalid targets.

Parameters
  • preds – A list of lists of shape (data_size, num_tasks) with model predictions.

  • targets – A list of lists of shape (data_size, num_tasks) with targets.

  • num_tasks – Number of tasks.

  • metrics – A list of names of metric functions.

  • dataset_type – Dataset type (e.g., 'regression' or 'classification').

  • logger – A logger to record output.

Returns

A dictionary mapping each metric in metrics to a list of values for each task.
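The hypothetical sketch below illustrates per-task evaluation with filtering. For each task, molecules whose target is missing are dropped, and each named metric is then computed on the remaining pairs. It rests on several assumptions: missing targets are represented as None; the metric names 'rmse' and 'mae' are resolved through a small hand-written registry; a task with no remaining targets yields NaN; and dataset_type is ignored because only regression metrics are shown.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def rmse(preds: Sequence[float], targets: Sequence[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))


def mae(preds: Sequence[float], targets: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Hypothetical registry from metric names to metric functions.
METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "rmse": rmse,
    "mae": mae,
}


def evaluate_predictions_sketch(
    preds: List[List[float]],
    targets: List[List[Optional[float]]],
    num_tasks: int,
    metrics: List[str],
) -> Dict[str, List[float]]:
    results: Dict[str, List[float]] = {name: [] for name in metrics}
    for task in range(num_tasks):
        # Drop molecules whose target for this task is missing (None).
        pairs = [
            (p[task], t[task]) for p, t in zip(preds, targets) if t[task] is not None
        ]
        if not pairs:
            for name in metrics:
                results[name].append(math.nan)  # assumption: no targets -> NaN
            continue
        task_preds, task_targets = zip(*pairs)
        for name in metrics:
            results[name].append(METRICS[name](task_preds, task_targets))
    return results


scores = evaluate_predictions_sketch(
    preds=[[1.0, 0.0], [2.0, 5.0], [3.0, 1.0]],
    targets=[[1.0, None], [4.0, 5.0], [3.0, None]],
    num_tasks=2,
    metrics=["rmse", "mae"],
)
```

In this example, task 0 keeps all three molecules (errors 0, 2 and 0), so its RMSE is sqrt(4/3) and its MAE is 2/3. Task 1 keeps only the one molecule with a target, which is predicted exactly.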