Table 1: Overview of different splitting tools and frameworks and their capabilities for biochemical data
From: Data splitting to avoid information leakage with DataSAIL
Tool | TDC [46] | DeepChem [26] | sklearn [32] | LoHi [27] | GraphPart [28] | astartes [47] | DataSAIL |
---|---|---|---|---|---|---|---|
Features | | | | | | | |
1D splits | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
2D splits | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
Stratified splits | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
Preserves all data (1D splits) | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
Preserves all data (2D splits) | N/A | N/A | N/A | N/A | N/A | N/A | ✗ |
Supported input data | | | | | | | |
Proteins | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
Small molecules | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ |
DNA & RNA sequences | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
Genomes & longer contigs | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
Custom data | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
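
The "Preserves all data" rows can be illustrated with a small sketch. It assumes that a 1D split assigns each interaction by one entity type only, while a 2D split assigns both entity types and keeps an interaction only when both partners land in the same split. The toy pairs, the fixed split assignments, and the function names `split_1d` and `split_2d` are hypothetical and are not taken from any of the listed tools.

```python
# Hypothetical toy data: (drug, protein) interaction pairs.
interactions = [
    ("d1", "p1"), ("d1", "p2"),
    ("d2", "p1"), ("d2", "p3"),
    ("d3", "p2"), ("d3", "p3"),
    ("d4", "p4"),
]

# Hypothetical fixed assignments; a real tool would derive these,
# for example from similarity clusters.
drug_split = {"d1": "train", "d2": "train", "d3": "test", "d4": "test"}
protein_split = {"p1": "train", "p2": "train", "p3": "test", "p4": "test"}


def split_1d(pairs, entity_split):
    """Assign each pair to the split of its first entity; no pair is dropped."""
    result = {"train": [], "test": []}
    for drug, protein in pairs:
        result[entity_split[drug]].append((drug, protein))
    return result


def split_2d(pairs, first_split, second_split):
    """Keep a pair only if both entities fall into the same split."""
    result = {"train": [], "test": [], "dropped": []}
    for drug, protein in pairs:
        if first_split[drug] == second_split[protein]:
            result[first_split[drug]].append((drug, protein))
        else:
            # The partners sit in different splits, so keeping the pair
            # would leak one of them across the train/test boundary.
            result["dropped"].append((drug, protein))
    return result


one_d = split_1d(interactions, drug_split)
two_d = split_2d(interactions, drug_split, protein_split)

print({name: len(pairs) for name, pairs in one_d.items()})
print({name: len(pairs) for name, pairs in two_d.items()})
```

In this toy case the 1D split keeps all seven pairs, whereas the 2D split drops the two pairs whose drug and protein fall into different splits. Under the stated assumption, this is why a 2D split cannot preserve all data.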