Interpretation

chemprop.interpret.py uses a Monte Carlo Tree Search to interpret trained Chemprop models by identifying substructures of a molecule which are primarily responsible for Chemprop’s prediction.

class chemprop.interpret.ChempropModel(args: InterpretArgs)[source]

A ChempropModel is a wrapper around a MoleculeModel for interpretation.

Parameters:

args – A InterpretArgs object containing arguments for interpretation.

class chemprop.interpret.MCTSNode(smiles: str, atoms: List[int], W: float = 0, N: int = 0, P: float = 0)[source]

A MCTSNode represents a node in a Monte Carlo Tree Search.

Parameters:
  • smiles – The SMILES for the substructure at this node.

  • atoms – A list of atom indices represented by this node.

  • W – The W value of this node.

  • N – The N value of this node.

  • P – The P value of this node.

chemprop.interpret.chemprop_interpret() None[source]

Runs interpretation of a Chemprop model.

This is the entry point for the command line command chemprop_interpret.

chemprop.interpret.extract_subgraph(smiles: str, selected_atoms: Set[int]) Tuple[str, List[int]][source]

Extracts a subgraph from a SMILES given a set of atom indices.

Parameters:
  • smiles – A SMILES from which to extract a subgraph.

  • selected_atoms – The atoms which form the subgraph to be extracted.

Returns:

A tuple containing a SMILES representing the subgraph and a list of root atom indices from the selected indices.

chemprop.interpret.find_clusters(mol: Mol) Tuple[List[Tuple[int, ...]], List[List[int]]][source]

Finds clusters within the molecule.

Parameters:

mol – An RDKit molecule.

Returns:

A tuple containing a list of atom tuples representing the clusters and a list of lists of atoms in each cluster.

chemprop.interpret.interpret(args: InterpretArgs) None[source]

Runs interpretation of a Chemprop model using the Monte Carlo Tree Search algorithm.

Parameters:

args – A InterpretArgs object containing arguments for interpretation.

chemprop.interpret.mcts(smiles: str, scoring_function: Callable[[List[str]], List[float]], n_rollout: int, max_atoms: int, prop_delta: float) List[MCTSNode][source]

Runs the Monte Carlo Tree Search algorithm.

Parameters:
  • smiles – The SMILES of the molecule to perform the search on.

  • scoring_function – A function for scoring subgraph SMILES using a Chemprop model.

  • n_rollout – THe number of MCTS rollouts to perform.

  • max_atoms – The maximum number of atoms allowed in an extracted rationale.

  • prop_delta – The minimum required property value for a satisfactory rationale.

Returns:

A list of rationales each represented by a MCTSNode.

chemprop.interpret.mcts_rollout(node: MCTSNode, state_map: Dict[str, MCTSNode], orig_smiles: str, clusters: List[Set[int]], atom_cls: List[Set[int]], nei_cls: List[Set[int]], scoring_function: Callable[[List[str]], List[float]]) float[source]

A Monte Carlo Tree Search rollout from a given MCTSNode.

Parameters:
  • node – The MCTSNode from which to begin the rollout.

  • state_map – A mapping from SMILES to MCTSNode.

  • orig_smiles – The original SMILES of the molecule.

  • clusters – Clusters of atoms.

  • atom_cls – Atom indices in the clusters.

  • nei_cls – Neighboring clusters.

  • scoring_function – A function for scoring subgraph SMILES using a Chemprop model.

Returns:

The score of this MCTS rollout.