Fingerprint

To calculate the learned representations (encodings) of model inputs from a pretrained model, run

chemprop fingerprint --test-path <test_path> --model-path <model_path>

where <test_path> is the path to the CSV file containing SMILES strings, and <model_path> is the location of the checkpoint(s) or model file(s) to use. It can be a path to a single pretrained model checkpoint (.ckpt), a single pretrained model file (.pt), a directory that contains these files, or a list of paths and directories. If a directory is given, it is searched recursively and every (.pt) model found is used. By default, the encodings are saved to the same directory as the test path. A different location can be specified with --output <path>. The output <path> can end with either .csv or .npz, and the output is saved in the corresponding file format.

For example:

chemprop fingerprint --test-path tests/data/smis.csv \
    --model-path tests/data/example_model_v2_regression_mol.ckpt \
    --output fps.csv
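
Once the command has finished, the output file can be read back for downstream analysis. The following is a minimal sketch rather than part of Chemprop: `load_fingerprints` is a hypothetical helper, and it makes no assumption about the column or array names in the file beyond treating the first CSV row as a header.

```python
import csv
from pathlib import Path


def load_fingerprints(path):
    """Load a fingerprint output file saved as .csv or .npz.

    Hypothetical helper: names inside the file are returned as found,
    none are assumed.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            # Assumption: the first row holds column names.
            header = next(reader, [])
            rows = [row for row in reader]
        return {"header": header, "rows": rows}
    if suffix == ".npz":
        # Imported lazily so the CSV branch needs only the standard library.
        import numpy as np

        with np.load(path) as archive:
            # archive.files lists the names of the stored arrays.
            return {name: archive[name] for name in archive.files}
    raise ValueError(f"unsupported output extension: {suffix!r}")
```

For the example above, `load_fingerprints("fps.csv")` returns the header and the rows as lists of strings; converting the values to floats is left to the caller, since the exact column layout is not specified here.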

Specifying FFN encoding layer

By default, the encodings are returned from the penultimate linear layer of the model’s FFN. However, the exact layer to draw encodings from can be specified using --ffn-block-index <index>.

An index of 0 returns the post-aggregation representation without passing it through the FFN. An index of 1 returns the output of the first linear layer of the FFN, an index of 2 the output of the second layer, and so on.
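
To compare encodings from several layers, the same command can be run once per index. The sketch below only builds the command lines, using the flags described on this page. `fingerprint_command` and the `fps_block<index>.csv` output names are hypothetical and chosen for illustration.

```python
import shlex


def fingerprint_command(test_path, model_path, output, ffn_block_index=None):
    """Build the argument list for one `chemprop fingerprint` run (hypothetical helper)."""
    args = [
        "chemprop", "fingerprint",
        "--test-path", str(test_path),
        "--model-path", str(model_path),
        "--output", str(output),
    ]
    # Omitting the flag keeps the default encoding layer.
    if ffn_block_index is not None:
        args += ["--ffn-block-index", str(ffn_block_index)]
    return args


# Index 0: post-aggregation representation; index 1: first FFN linear layer.
commands = [
    fingerprint_command(
        "tests/data/smis.csv",
        "tests/data/example_model_v2_regression_mol.ckpt",
        f"fps_block{index}.csv",  # hypothetical output file name
        ffn_block_index=index,
    )
    for index in (0, 1)
]

for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to execute it, provided Chemprop is installed and the paths exist.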

Specifying Data to Parse

fingerprint accepts the same arguments for specifying SMILES columns and reaction types as predict. For more detail, see Prediction.