Fig. 5: Biological interpretation and feature distribution of VideoMol.
From: A molecular video-derived foundation model for scientific drug discovery

a Visualization of every frame in 100 molecular videos (60 frames per video). Representations are extracted by VideoMol and reduced in dimension with t-SNE. Different colors represent frames from different videos. The Davies-Bouldin (DB) index is a metric of clustering quality; the smaller the value, the better the clustering. b Distributions of intra-video and inter-video similarity (n = 20,000 samples). Each similarity is computed from a pair of frames drawn from the same video (intra-video) or from different videos (inter-video). The value in brackets is the mean similarity of each distribution. c t-SNE visualization (10,000 samples) of features extracted by VideoMol. Different colors represent different cluster labels (obtained in the chemical-aware pretraining task). d–f Grad-CAM visualization of VideoMol on molecular frames. We use 0.6 as the visualization threshold; that is, importance values lower than 0.6 are set to 0. In d, each row represents a molecular video. In e, each pair of molecular frames shows a frame where the structure is missing and a frame where the structure appears, respectively. In f, each panel shows examples of key structures related to BACE-1 inhibitory activity, taken from frames of different molecules. Source data are provided as a Source Data file.
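
The thresholding rule used for panels d–f can be illustrated with a minimal sketch. It assumes the Grad-CAM importance map has already been normalized to the range [0, 1]; the function name `threshold_importance` and the toy array are hypothetical and are not taken from the VideoMol code.

```python
import numpy as np


def threshold_importance(cam, threshold=0.6):
    """Set importance values lower than `threshold` to 0; keep the others unchanged.

    `cam` is assumed (not stated in the caption) to be a Grad-CAM map
    already scaled to [0, 1].
    """
    cam = np.asarray(cam, dtype=float)
    return np.where(cam < threshold, 0.0, cam)


# Toy 2x2 importance map, for illustration only.
cam = np.array([[0.10, 0.60],
                [0.75, 0.59]])
masked = threshold_importance(cam)
```

Values exactly equal to the threshold are kept, matching the caption's wording that only importance lower than 0.6 is set to 0.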