Extended Data Fig. 8: Comparison to supervised deep learning models.

From: A visual–language foundation model for pathology image analysis using medical Twitter

Fine-tuning was conducted on the a, Kather colon dataset training split; b, PanNuke dataset; c, DigestPath dataset; and d, WSSS4LUAD dataset, comparing the PLIP image encoder with ViT-B/32 (pre-trained on ImageNet). In the line plots, mean values and 95% confidence intervals are shown across 10 different random seeds used to subset the training data and run the models. The improvements from PLIP are particularly large when only a small fraction of the training data is used. For instance, with only 1% of the training data, the weighted F1 scores were: (i) Kather training split, 0.952 for the PLIP image encoder versus 0.921 for ViT-B/32; (ii) PanNuke, 0.715 versus 0.637; (iii) DigestPath, 0.933 versus 0.872; and (iv) WSSS4LUAD, 0.816 versus 0.645. With all of the training data, the weighted F1 scores were: (i) Kather training split, 0.994 for the PLIP image encoder versus 0.991 for ViT-B/32; (ii) PanNuke, 0.962 versus 0.938; (iii) DigestPath, 0.977 versus 0.968; and (iv) WSSS4LUAD, 0.958 versus 0.941.
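To make the evaluation protocol concrete, the following is a minimal Python sketch of how such a comparison could be run: a stratified subset of a training split is drawn at a chosen fraction, a classifier is fitted for each of 10 random seeds, and the weighted F1 score on a fixed test split is summarized as a mean with a 95% confidence interval. This is not the authors' pipeline; as a simplification it uses a scikit-learn logistic-regression probe on pre-extracted image embeddings in place of fine-tuning the PLIP or ViT-B/32 encoders, and the placeholder arrays (with a nine-class setup echoing the Kather dataset) are illustrative assumptions.

```python
# Hedged sketch: compare two image encoders by training a classifier on
# stratified fractions of a training split and reporting weighted F1 with a
# 95% confidence interval over 10 random seeds. Placeholder data; a linear
# probe stands in for fine-tuning the encoders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def weighted_f1_over_seeds(X_train, y_train, X_test, y_test,
                           fraction=0.01, n_seeds=10):
    """Fit a classifier on a stratified `fraction` of the training data for
    each seed and return weighted F1 scores on the fixed test split."""
    scores = []
    for seed in range(n_seeds):
        if fraction < 1.0:
            X_sub, _, y_sub, _ = train_test_split(
                X_train, y_train, train_size=fraction,
                stratify=y_train, random_state=seed)
        else:
            X_sub, y_sub = X_train, y_train
        clf = LogisticRegression(max_iter=1000, random_state=seed)
        clf.fit(X_sub, y_sub)
        scores.append(f1_score(y_test, clf.predict(X_test), average="weighted"))
    return np.asarray(scores)


def mean_and_ci95(scores):
    """Mean and normal-approximation 95% confidence interval over seeds."""
    mean = scores.mean()
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    # Placeholder arrays; in practice these would be embeddings extracted from
    # each encoder (PLIP or ViT-B/32) for the dataset being evaluated.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(5000, 512))
    y_train = rng.integers(0, 9, size=5000)   # nine classes, as in Kather colon
    X_test = rng.normal(size=(1000, 512))
    y_test = rng.integers(0, 9, size=1000)

    for frac in (0.01, 0.1, 1.0):
        scores = weighted_f1_over_seeds(X_train, y_train, X_test, y_test,
                                        fraction=frac)
        mean, ci = mean_and_ci95(scores)
        print(f"fraction={frac:.2f}  weighted F1 = {mean:.3f}  "
              f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice, the placeholder arrays would be replaced by embeddings (or a full fine-tuning loop) from each encoder, and the procedure repeated for both encoders on each of the four datasets to reproduce the style of comparison shown in the figure.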
