Table 1 Overview of splitting tools and frameworks and their capabilities for biochemical data

From: Data splitting to avoid information leakage with DataSAIL

Tools: TDC (ref. 46), DeepChem (ref. 26), sklearn (ref. 32), LoHi (ref. 27), GraphPart (ref. 28), astartes (ref. 47), DataSAIL

Features

1D splits

2D splits

Stratified splits

Preserves all data (1D splits)

Preserves all data (2D splits): N/A for six of the seven tools

Supported input data

Proteins

Small molecules

DNA & RNA sequences

Genomes & longer contigs

Custom data

Checkmarks indicate that a tool can compute a split with the requested property without requiring preprocessing by the user.
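
To make two of the feature rows concrete, the following is a minimal sketch of a stratified 1D split that preserves all data. It assumes that "stratified" means each split keeps roughly the same class proportions as the full dataset, and that "preserves all data" means every sample is assigned to exactly one split. The function name `stratified_split` and the toy labels are hypothetical and are not taken from any of the tools listed in the table.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.2), seed=0):
    """Assign every sample index to exactly one split (hypothetical helper).

    Each class is shuffled and cut separately, so the class proportions in
    every split stay close to those of the full dataset.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits = [[] for _ in fractions]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last split takes whatever is left, so no sample is dropped.
            if k == len(fractions) - 1:
                end = len(members)
            else:
                end = round(cumulative * len(members))
            splits[k].extend(members[start:end])
            start = end
    return splits


# Toy dataset: 20 "active" and 80 "inactive" samples.
labels = ["active"] * 20 + ["inactive"] * 80
train, test = stratified_split(labels)
```

With these toy labels, the 80/20 split keeps the 1:4 class ratio in both parts, and the union of `train` and `test` covers all 100 sample indices. A similarity-aware split, as targeted by several of the tools in the table, would additionally have to keep similar samples together, which this sketch does not attempt.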