Fig. 2: Results from evaluating MolE embeddings.
From: MolE: a foundation model for molecular graphs using disentangled attention

a Evaluation of molecular embeddings on a neighbor variation test. The boxplots show the distribution of neighborhood overlap across all molecules (n = 79,568) for different molecular encodings. The closer the overlap is to 1, the more k-nearest neighbors are shared between the two encoding methods. Morgan fingerprints of radius 2 show high neighborhood overlap with embeddings from MolE pretrained solely on the self-supervised task. The centerline of each boxplot represents the median, the bounds of the box represent the first and third quartiles, and the whiskers extend to 1.5 times the interquartile range (IQR). b UMAP representation of the MolE atomic embeddings for environments centered on heteroatoms. Different subgraphs with similar biological effects (e.g., bioisosteres) lie close together in the embedding space.
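
The neighborhood overlap in panel a can be illustrated with a minimal sketch. It assumes the overlap for a molecule is the fraction of its k nearest neighbors that are shared between two encodings. The caption does not state the distance metric or the value of k, so the Euclidean distance and `k=5` used here are assumptions, and the function names `knn_indices` and `neighborhood_overlap` are hypothetical, not taken from the MolE code.

```python
import numpy as np


def knn_indices(x, k):
    """Return the indices of the k nearest neighbors of every row of x.

    Assumption: Euclidean distance. The caption does not name the metric.
    """
    sq_norms = np.sum(x ** 2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    # A molecule is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def neighborhood_overlap(enc_a, enc_b, k=5):
    """Per-molecule fraction of k nearest neighbors shared by two encodings.

    enc_a and enc_b hold one row per molecule, in the same molecule order;
    the two encodings may have different dimensionality.
    """
    nn_a = knn_indices(enc_a, k)
    nn_b = knn_indices(enc_b, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return np.array(shared, dtype=float) / k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for two encodings of the same 100 molecules.
    fingerprints = rng.normal(size=(100, 32))
    embeddings = rng.normal(size=(100, 16))
    overlap = neighborhood_overlap(fingerprints, embeddings, k=5)
    # Quartiles of the overlap distribution, as summarized by a boxplot.
    print(np.percentile(overlap, [25, 50, 75]))
```

An overlap of 1 for a molecule means both encodings select exactly the same k neighbors, and 0 means they share none. The dense distance matrix in this sketch scales quadratically with the number of molecules, so a dataset of the size reported in the caption would likely need an approximate or batched neighbor search instead.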