Table 2 The table gives an overview of how well near-neighbors in a chemical series are retrieved with a transformer

From: Exhaustive local chemical space exploration using a transformer model

| BS | \({\bar{t}}_{nll}\) | \({\bar{t}}_{sim}\) | TP \(\bar{x}\) | TP \(\tilde{x}\) | FN-B \(\bar{x}\) | FN-B \(\tilde{x}\) | FN-P \(\bar{x}\) | FN-P \(\tilde{x}\) | TN \(\bar{x}\) | TN \(\tilde{x}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1000 | 11.03 ± 1.35 | 0.71 ± 0.05 | 29.48 | 23.33 | 4.69 | 2.94 | 3.41 | 0.83 | 62.42 | 68.10 |
| 5000 | 12.49 ± 1.64 | 0.68 ± 0.06 | 40.31 | 35.56 | 3.77 | 1.79 | 3.64 | 0.95 | 52.28 | 56.25 |
| 10,000 | 13.22 ± 1.95 | 0.66 ± 0.06 | 45.33 | 40.00 | 3.48 | 1.69 | 3.29 | 1.14 | 47.90 | 50.00 |

True positive (TP; higher is better), false negative (FN; lower is better), and true negative (TN) values are shown for beam sizes (BS) of 1000, 5000, and 10,000. Columns FN-B and FN-P represent the two types of false negatives: those that arise because beam search is only an approximation of exhaustive search (FN-B), and those that arise from low precedence because the molecular transformation is not well represented in the training set (FN-P). \({\bar{t}}_{nll}\) and \({\bar{t}}_{sim}\) are the averages of \(t_{nll}\) and \(t_{sim}\), each reported with its standard deviation. \(\bar{x}\) and \(\tilde{x}\) denote the mean and median, respectively, expressed as percentages over the 200 chemical series.
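
As a quick consistency check on the table, the short sketch below adds up the mean percentages for each beam size. It assumes that TP, FN-B, FN-P, and TN together partition the near-neighbors of a series, which the footnote does not state explicitly but which the reported means appear to support. The names `MEAN_PERCENTAGES`, `mean_total`, and `mean_false_negatives` are hypothetical helpers introduced only for this illustration; the numbers are copied from the mean (\(\bar{x}\)) columns of the table.

```python
# Mean percentages copied from the table, keyed by beam size (BS).
# Order within each tuple: TP, FN-B, FN-P, TN.
MEAN_PERCENTAGES = {
    1000: (29.48, 4.69, 3.41, 62.42),
    5000: (40.31, 3.77, 3.64, 52.28),
    10000: (45.33, 3.48, 3.29, 47.90),
}


def mean_total(beam_size: int) -> float:
    """Sum of the four mean percentages, rounded to two decimals."""
    return round(sum(MEAN_PERCENTAGES[beam_size]), 2)


def mean_false_negatives(beam_size: int) -> float:
    """Combined mean false-negative percentage (FN-B + FN-P)."""
    _, fn_b, fn_p, _ = MEAN_PERCENTAGES[beam_size]
    return round(fn_b + fn_p, 2)


if __name__ == "__main__":
    for bs in MEAN_PERCENTAGES:
        print(bs, mean_total(bs), mean_false_negatives(bs))
```

Under that assumption, each row of means should total 100%, and the combined false-negative share can be read off as the sum of the FN-B and FN-P means. The medians are not expected to add up in the same way, so the sketch leaves them out.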