Fig. 3: One-dimensional datasets QM8 and Tox21.
From: Data splitting to avoid information leakage with DataSAIL

We show QM8 (a, c, e) and Tox21 (b, d, f) from the MoleculeNet benchmark collection. a, d show the t-SNE embeddings for the random split and b, e those for DataSAIL's S1 split. c, f show ML model performance and information leakage for the different splits, with performance quantified as mean absolute error (MAE, lower is better) for QM8 and area under the receiver operating characteristic curve (ROC-AUC, higher is better) for Tox21.