Fig. 5: One-dimensional data splitting with stratification.
From: Data splitting to avoid information leakage with DataSAIL

We used the SR-ARE target dataset in Tox21, where the two classes are active and inactive small molecules. a, t-SNE embeddings for a random stratified split. b, t-SNE embeddings for a similarity-aware stratified split. c, ML model performance and information leakage for the different splits.