Fig. 5: Analysis of 7.1 million solute rejections in the application datasets.

a, Rejection distribution of separable and inseparable molecules for solute concentrations increasing from 1 g l−1, with a separability threshold of 0.6 rejection (all membranes, solvents and solutes). The region highlighted in blue covers the majority of separations (59% of all cases). b, Main panel: mass solubility versus predicted rejection for each solvent–solute pair (~1.18 million examples across all membranes). Deeper blue indicates higher-density regions; the red boxes delimit distinct regions bounded by rejections of 0.6 and 0.8 and solubilities of 10 and 100 g l−1. The number attached to each box is the number of examples within it. Side panel: marginal distribution of the log mass solubility of the solvents. c, Separation distributions of products (blue) and impurities (red), with the 0.6 rejection threshold. Higher density values indicate more solute–membrane–solvent triplets at the corresponding rejection value. d, Density distribution of the rejection selectivity for the products and impurities in the application datasets. The blue area starts at the threshold above which nanofiltration is applicable (\(\log_{10}\varphi > 0.3\)). e, Overall separability (%) of solutes S1–S3 in binary separation for each solvent in the application datasets. The chemical substructures represent the solutes with the highest average predicted rejection. The number at the base of each bar is the average predicted rejection for solutes featuring that substructure in the given solvent. f, Normal and conservative estimates of the percentage of separable molecules for cases where the solubility (c) exceeds 10 g l−1 and the rejection exceeds 0.6, 0.8 or 0.98. PPPs, plant protection products dataset (Supplementary Table 3). Conservative estimates are the normal values corrected by the standard deviation (Supplementary Table 32 and Supplementary Note 10).
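As a rough illustration of the criterion behind panel f, the sketch below computes the share of solvent–solute examples whose solubility exceeds 10 g l−1 and whose predicted rejection exceeds a chosen threshold (0.6, 0.8 or 0.98). It assumes plain lists of predicted rejections and solubilities and uses the total number of examples as the denominator; the function name `separable_percentage` and that choice of denominator are illustrative assumptions, and the conservative (standard-deviation-corrected) estimate described in Supplementary Note 10 is not reproduced here.

```python
from typing import Sequence


def separable_percentage(
    rejections: Sequence[float],
    solubilities_g_per_l: Sequence[float],
    rejection_threshold: float = 0.6,
    min_solubility_g_per_l: float = 10.0,
) -> float:
    """Hypothetical helper: percentage of examples counted as separable.

    An example is counted when its solubility exceeds the minimum (g/l)
    and its predicted rejection exceeds the threshold. The denominator is
    the total number of examples (an assumption for illustration).
    """
    if len(rejections) != len(solubilities_g_per_l):
        raise ValueError("rejections and solubilities must have the same length")
    if not rejections:
        return 0.0
    # Count examples that satisfy both the solubility and rejection criteria.
    count = sum(
        1
        for r, c in zip(rejections, solubilities_g_per_l)
        if c > min_solubility_g_per_l and r > rejection_threshold
    )
    return 100.0 * count / len(rejections)


# Toy data (not from the application datasets): four solvent-solute examples.
rejections = [0.5, 0.7, 0.9, 0.99]
solubilities = [50.0, 5.0, 20.0, 200.0]
for threshold in (0.6, 0.8, 0.98):
    print(threshold, separable_percentage(rejections, solubilities, threshold))
```

With these toy values, the second example fails the solubility criterion and the first fails the rejection criterion, so raising the threshold from 0.6 to 0.98 lowers the separable share from 50% to 25%, mirroring how stricter rejection thresholds shrink the separable fraction in panel f.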