Extended Data Fig. 1: Performance (AUPR) in downstream tasks.
From: A data-efficient strategy for building high-performing medical foundation models

a, Internal evaluation. We fine-tuned the pretrained models on nine public datasets across four downstream tasks: diabetic retinopathy grading, glaucoma diagnosis, age-related macular degeneration (AMD) grading and multi-disease classification. Compared with RETFound, RETFound-DE achieves superior performance on six datasets (P < 0.05) and comparable performance on the other three (P > 0.05). b, External evaluation. Models were fine-tuned on one diabetic retinopathy grading dataset and evaluated on the others. RETFound-DE outperforms RETFound when fine-tuned on APTOS-2019 and evaluated on IDRID, and when fine-tuned on IDRID and evaluated on MESSIDOR-2. The mean AUPR is shown on each bar, and the error bars show 95% confidence intervals. P values were calculated with a two-sided t-test and are listed in the figure.
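
For readers who want to reproduce this kind of summary, the sketch below shows one way the three quantities in the caption (mean AUPR, 95% confidence interval, two-sided t-test P value) could be computed from per-run AUPR scores. It is a minimal illustration under stated assumptions, not the authors' analysis code: the caption does not say how many runs were used, how the confidence intervals were constructed, or which t-test variant was applied. Here we assume five runs per model, a Student's t interval around the mean, and an equal-variance two-sample t-test. The AUPR values and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_mass(t, df, steps=2000):
    """P(-|t| < T < |t|), integrating the density on [0, |t|] with Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 0.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return min(1.0, 2 * total * h / 3)


def t_critical(df, level=0.95):
    """Find c such that P(-c < T < c) = level, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_mass(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, level=0.95):
    """Mean and t-based confidence interval (assumed CI construction)."""
    n = len(values)
    m = mean(values)
    half = t_critical(n - 1, level) * stdev(values) / math.sqrt(n)
    return m, m - half, m + half


def two_sided_t_test(a, b):
    """Equal-variance two-sample t-test (assumed variant); returns (t, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, 1 - t_central_mass(t, df)


# Hypothetical per-run AUPR scores, NOT taken from the paper.
aupr_model_a = [0.70, 0.72, 0.71, 0.73, 0.69]
aupr_model_b = [0.66, 0.68, 0.67, 0.69, 0.65]

for name, scores in [("model A", aupr_model_a), ("model B", aupr_model_b)]:
    m, low, high = mean_ci(scores)
    print(f"{name}: mean AUPR {m:.3f}, 95% CI [{low:.3f}, {high:.3f}]")

t_stat, p_value = two_sided_t_test(aupr_model_a, aupr_model_b)
print(f"t = {t_stat:.2f}, two-sided P = {p_value:.4f}")
```

The t distribution is integrated numerically only to keep the sketch dependency-free. In practice, `scipy.stats.ttest_ind` and `scipy.stats.t.interval` provide the same quantities directly.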