Table 1. Foundation models applied to breast cancer whole-slide images

From: Computational pathology in breast cancer: optimizing molecular prediction through task-oriented AI models

| Model | Breast applications | Performance (AUC / accuracy) | Endpoints predicted |
|---|---|---|---|
| Prov-GigaPath [12] | Breast cancer subtyping & mutation prediction | AUC ≥ 0.90 for breast cancer subtyping; pan-cancer biomarker detection: +3.3% AUC improvement | ER, PR, HER2 subtypes; genomic mutations (pan-cancer) |
| CONCH [49] | Weakly supervised classification on TCGA-BRCA slides | ~91.4% accuracy (4 slide-level tasks) | ER/PR/HER2 status; Ki-67; morphology |
| UNI [50] | Breast cancer classification (e.g., BreakHis); recurrence-risk modeling | AUC 0.999 (binary cancer classification); accuracy 95.5% (8-way); recurrence-risk AUC up to ~0.86 | ER/PR; general classification; recurrence risk |
| Virchow [51] | Pan-cancer biomarker detection, including breast | Specificity ~72.5% at 95% sensitivity | ER/PR/HER2; general cancer detection |
| Phikon / Kaiko [52] (TCGA) | Biomarker/mutation prediction on TCGA (including breast) | AUCs similar to UNI and Prov-GigaPath on mutation tasks (no exact values public) | ESR1, PIK3CA pathway mutations |
| CTransPath [53] | Recurrence-risk modeling & subtyping | AUC ~0.54 for recurrence risk (weaker); general ~0.75–0.84, depending on encoder | ER/PR; HER2; recurrence risk |

  1. The table summarizes the main clinical endpoints assessed by each model (e.g., hormone receptors, HER2, Ki-67, genomic mutations), the performance metrics reported for them, and the corresponding peer-reviewed references (cited next to each model name). For consistency, metrics are given as reported in the original studies, most commonly the area under the curve (AUC). Both accuracy and AUC have inherent limitations: accuracy is sensitive to class imbalance, whereas AUC summarizes ranking performance across all thresholds, which can obscure clinically relevant differences at a specific operating point, and it ignores model calibration. Only models explicitly applied to breast tissue, or trained on datasets that include breast cancer cases, are considered. The listed biomarkers illustrate the capacity of these models to predict clinically meaningful features directly from whole-slide images.
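The metric caveats in the note can be made concrete with a toy calculation. The sketch below uses a small synthetic cohort (hypothetical ER labels and scores, not data from any study in the table) and plain-Python helpers (`accuracy`, `roc_auc`, `brier_score`, `specificity_at_sensitivity`; hypothetical names written for illustration, not a library API). It shows that a constant "always positive" predictor reaches high accuracy on an imbalanced cohort while its AUC is 0.5, that squaring the scores leaves AUC unchanged while worsening calibration as measured by the Brier score, and how a single operating point such as "specificity at 95% sensitivity" is read off a score distribution.

```python
from itertools import product

# Synthetic toy cohort (hypothetical, not from any cited study):
# 8 ER-positive (1) and 2 ER-negative (0) slides, with model scores in [0, 1].
Y_TRUE = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
SCORES = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.5, 0.3]


def accuracy(y_true, y_pred):
    """Fraction of hard predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and labels;
    lower is better, and it penalizes miscalibrated probabilities."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def specificity_at_sensitivity(y_true, scores, target_sens):
    """Specificity at the highest threshold whose sensitivity reaches target_sens."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Lowering the threshold never lowers sensitivity and never raises specificity,
    # so the first threshold (scanning from the top) that meets the target is the best one.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        if tp / n_pos >= target_sens:
            tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
            return tn / n_neg
    return 0.0


if __name__ == "__main__":
    # 1) Class imbalance: calling every slide ER-positive looks accurate...
    always_pos = [1] * len(Y_TRUE)
    print("accuracy, always positive:", accuracy(Y_TRUE, always_pos))  # 0.8
    # ...but a constant score cannot rank cases, so its AUC is 0.5.
    print("AUC, constant score:", roc_auc(Y_TRUE, [1.0] * len(Y_TRUE)))  # 0.5

    # 2) Calibration: squaring keeps the score order, so AUC is unchanged,
    # while in this example the probabilities (and the Brier score) get worse.
    squared = [s ** 2 for s in SCORES]
    print("AUC original vs squared:", roc_auc(Y_TRUE, SCORES), roc_auc(Y_TRUE, squared))
    print("Brier original vs squared:", brier_score(Y_TRUE, SCORES), brier_score(Y_TRUE, squared))

    # 3) A single operating point, reported as specificity at 95% sensitivity.
    print("specificity @ 95% sensitivity:", specificity_at_sensitivity(Y_TRUE, SCORES, 0.95))  # 0.5
```

The pairwise AUC loop scales with the number of positive-negative pairs and is meant only to make the definition visible; for real cohorts, a tested library routine such as scikit-learn's `roc_auc_score` would be the usual choice.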