Fig. 3: Weakly supervised classification results on the WSI-level independent test set.

a, d, g Independent test results for kidney cancer, breast cancer, and lung cancer using only 25% of the training data. b, e, h Independent test results for kidney cancer, breast cancer, and lung cancer using 100% of the training data. c, f, i Two-dimensional visualization of the WSI-level feature space in BEPH for different training data sizes. As the training set grows, the clusters become tighter, suggesting that the true relationships between features are easier to capture and represent. For the results in (a, b, d, e, g, and h), the best weights were selected by 10-fold Monte Carlo cross-validation on the validation set (n = 10) and then evaluated on the independent test set. Significance was calculated using the two-sided Wilcoxon test and is indicated by “*” for p-values < 0.05, “**” for p-values < 0.01, “***” for p-values < 0.001, and “****” for p-values < 0.0001; “ns” denotes non-significance. The black solid line indicates the median and the red dashed line indicates the mean. The whiskers extend from the box to the smallest and largest values within 1.5 times the interquartile range (IQR); points outside the whiskers are considered outliers. j Tolerance of different pre-training models to data reduction. The x-axis uses a dual scale, showing the percentage of samples (primary) and the training data size (secondary). These results demonstrate that the MIM pre-training model can achieve good performance after training on a limited number of labeled slides, surpassing other weakly supervised baselines. All data are presented as mean values ± SD. Source data are provided with this paper.
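
The plotting conventions described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names `significance_label` and `whisker_bounds` are hypothetical, the example values are invented for illustration, and the quartile method (linear interpolation) is an assumption, since the caption does not state which one was used. The sketch only maps a p-value to its star label and applies the 1.5 × IQR rule; it does not compute the Wilcoxon test itself.

```python
from statistics import quantiles


def significance_label(p):
    """Map a p-value to the star notation described in the caption."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def whisker_bounds(values):
    """Return (lower whisker, upper whisker, outliers) under the 1.5 x IQR rule.

    Quartiles use linear interpolation (method="inclusive"); this choice is an
    assumption, as the caption does not specify the quartile method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers


# Hypothetical scores from 10 folds (illustrative values only, not from the paper).
example_scores = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.60]
low, high, outliers = whisker_bounds(example_scores)
```

With these illustrative values, the lowest score falls below the lower fence and would be drawn as an outlier point, while the whiskers end at the remaining minimum and maximum.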