Extended Data Fig. 6: High-performance vs. low-performance tasks.
From: Benchmarking foundation models as feature extractors for weakly supervised computational pathology

A, Average AUROC scores across 15 high-performance and 16 low-performance tasks. A task was classed as high-performance if at least one foundation model achieved an average AUROC above 0.75; all remaining tasks were classed as low-performance. B-C, The performance of each foundation model is listed per task. The final row presents the overall average AUROC for each model. Tasks are sorted by their mean AUROC across all models, and models are sorted by their mean AUROC across all tasks.
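
The grouping and ordering described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the task and model labels, and the AUROC values are all hypothetical, and the descending sort order is an assumption, since the caption does not state the direction.

```python
from statistics import mean

THRESHOLD = 0.75  # cut-off stated in the caption


def split_tasks(auroc, threshold=THRESHOLD):
    """Split tasks into (high, low) performance groups.

    `auroc` maps task name -> {model name -> average AUROC}.
    A task is high-performance if at least one model exceeds the threshold.
    """
    high = [t for t, scores in auroc.items() if max(scores.values()) > threshold]
    low = [t for t in auroc if t not in high]
    return high, low


def sort_tasks(auroc, tasks):
    """Order tasks by their mean AUROC across all models (descending: assumed)."""
    return sorted(tasks, key=lambda t: mean(auroc[t].values()), reverse=True)


def model_averages(auroc, tasks):
    """Mean AUROC per model over the given tasks (the 'final row' of a panel)."""
    models = list(auroc[tasks[0]])
    return {m: mean(auroc[t][m] for t in tasks) for m in models}


def sort_models(auroc):
    """Order models by their mean AUROC across all tasks (descending: assumed)."""
    averages = model_averages(auroc, list(auroc))
    return sorted(averages, key=averages.get, reverse=True)


# Made-up values for illustration only; they are not results from the paper.
auroc = {
    "task_a": {"model_x": 0.80, "model_y": 0.70},
    "task_b": {"model_x": 0.60, "model_y": 0.65},
    "task_c": {"model_x": 0.74, "model_y": 0.90},
}

high, low = split_tasks(auroc)
high_sorted = sort_tasks(auroc, high)
model_order = sort_models(auroc)
high_row = model_averages(auroc, high)
```

With the toy values, `task_a` and `task_c` fall in the high-performance group because one model exceeds 0.75 on each, while `task_b` falls in the low-performance group. Note that the threshold is applied to the best model per task, not to the task mean, so a task can be high-performance even when most models score below 0.75.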