Fig. 5: One-dimensional data splitting with stratification. | Nature Communications


From: Data splitting to avoid information leakage with DataSAIL


We used the SR-ARE target dataset in Tox21, where the two classes are active and inactive small molecules. (a) t-SNE embedding for a random stratified split. (b) t-SNE embedding for a similarity-aware stratified split. (c) ML model performance and information leakage for the different splits.
