Fig. 4: A chemical series with a source molecule and the target molecules (green borders) retrieved by beam search for two different beam sizes (BS).
From: Exhaustive local chemical space exploration using a transformer model

The red vertical line marks the maximum negative log-likelihood (NLL) reachable with each beam size, which is 11.90 for BS = 1000 and 12.63 for BS = 5000. Compounds to the right of the red vertical line cannot be found with these beam sizes because the NLL of the corresponding molecular transformation is too high. The right-most compound has a similarity (denoted Sim in the figure) of 0.71 to the source compound but low precedence, with an NLL of 17.23. The low precedence of this molecular transformation might be due to the azo group, which is a relatively uncommon chemical substructure in the training set.
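
The threshold reading of the red line can be sketched as a simple check. This is an illustrative sketch only, not the authors' code: the function name `reachable_beam_sizes` and the dictionary `MAX_NLL` are hypothetical, and only the three NLL values quoted in the caption are taken from the source. It also assumes that an NLL at or below the reported maximum is the condition for being reachable, which the caption implies but does not state as a guarantee.

```python
# Maximum NLL reachable per beam size, as quoted in the caption.
# (The dictionary itself is a hypothetical helper for illustration.)
MAX_NLL = {1000: 11.90, 5000: 12.63}


def reachable_beam_sizes(nll, max_nll=MAX_NLL):
    """Return the beam sizes (ascending) whose maximum reachable NLL
    is at least `nll`, i.e. the transformation lies at or left of the red line."""
    return sorted(bs for bs, limit in max_nll.items() if nll <= limit)


# Right-most compound in the figure: NLL = 17.23 (from the caption).
print(reachable_beam_sizes(17.23))  # → []

# A hypothetical transformation with NLL = 12.0 (not from the source).
print(reachable_beam_sizes(12.0))   # → [5000]
```

Under this reading, the right-most compound lies beyond both thresholds, which is consistent with the caption's statement that it cannot be found with either beam size.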