Fig. 2: Schematic workflow of DataSAIL.
From: Data splitting to avoid information leakage with DataSAIL

The input can be any type of data, with a special focus on biochemical data. DataSAIL first computes a pairwise distance or similarity matrix (a) and, based on that matrix, groups the data into a fixed number of clusters (b). These clusters are then assigned to k folds using off-the-shelf integer linear programming (ILP) solvers (c). From the partitioning of the clusters, DataSAIL infers the partitioning of the elementary data points (d). Created in BioRender. Joeres, R. (2025) https://BioRender.com/m81k197.
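
The four stages (a) to (d) can be illustrated with a minimal sketch in plain Python. This is not DataSAIL's implementation or API: all function names are hypothetical, Euclidean distance and single-linkage clustering are assumed purely for illustration, and an exhaustive search that balances fold sizes stands in for the ILP solver (it is only feasible for a handful of clusters and ignores any similarity-based objective).

```python
from itertools import product
from math import dist


def pairwise_distances(points):
    # (a) Full symmetric matrix of Euclidean distances (assumed metric).
    return [[dist(p, q) for q in points] for p in points]


def cluster_fixed(distances, n_clusters):
    # (b) Single-linkage agglomerative clustering, merging the two closest
    # clusters until exactly n_clusters remain (assumed clustering method).
    clusters = [[i] for i in range(len(distances))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distances[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def assign_clusters_to_folds(clusters, k):
    # (c) Stand-in for the ILP step: try every cluster-to-fold assignment
    # and keep the one whose fold sizes deviate least from the ideal size.
    sizes = [len(c) for c in clusters]
    target = sum(sizes) / k
    best_cost, best_assignment = None, None
    for assignment in product(range(k), repeat=len(clusters)):
        fold_sizes = [0] * k
        for size, fold in zip(sizes, assignment):
            fold_sizes[fold] += size
        cost = sum(abs(s - target) for s in fold_sizes)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return list(best_assignment)


def point_folds(clusters, cluster_to_fold, n_points):
    # (d) Every data point inherits the fold of the cluster it belongs to.
    folds = [None] * n_points
    for cluster, fold in zip(clusters, cluster_to_fold):
        for i in cluster:
            folds[i] = fold
    return folds


def split(points, n_clusters, k):
    # Chain the four stages: distances, clusters, cluster folds, point folds.
    distances = pairwise_distances(points)
    clusters = cluster_fixed(distances, n_clusters)
    cluster_to_fold = assign_clusters_to_folds(clusters, k)
    return point_folds(clusters, cluster_to_fold, len(points))


# Toy input: four well-separated pairs of 2D points, split into two folds.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1), (30, 30), (30, 31)]
folds = split(points, n_clusters=4, k=2)
```

Because folds are assigned per cluster rather than per point, the two points of each close pair in the toy input always land in the same fold, which is the property the cluster-level split is meant to guarantee.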