Fig. 4: HistoGPT accurately predicts diseases in-domain and out-of-domain without human guidance.
From: Generating dermatopathology reports from gigapixel whole slide images with HistoGPT

a In the absence of a human-in-the-loop, HistoGPT predicts the patient’s diagnosis on its own and generates the corresponding pathology report.
b On the Munich test set, HistoGPT was on par with state-of-the-art classification models in predicting over 100 dermatological diseases, even though the model’s output is pure text.
c HistoGPT discriminated malignant from benign conditions with high accuracy on the Munich dataset: basal cell carcinoma (BCC, n = 107) vs. other conditions (n = 621) with an accuracy of 0.98 and a weighted F1 score of 0.98; actinic keratosis (AK, n = 47) vs. squamous cell carcinoma (SCC, n = 33) with an accuracy of 0.88 and a weighted F1 score of 0.87; benign melanocytic nevus (BMN, n = 86) vs. melanoma (n = 21) with an accuracy of 0.89 and a weighted F1 score of 0.89.
d We evaluated HistoGPT in 5 independent external cohorts (Münster-3H, TCGA-SKCM, CPTAC-CM, Queensland, Linköping) covering different countries, scanner types, staining techniques, and biopsy methods.
e HistoGPT performed equal to or better than state-of-the-art MIL on external datasets, especially when using self-prompting (“Classifier Guidance”). The box plots show the quantiles as a black line and the mean as an inner circle obtained from 1000 bootstraps. The minimum and maximum values are shown as white circles at the top and bottom.
f HistoGPT was able to produce highly accurate pathology reports, as indicated by the high keyword and cosine-based similarity scores for Münster-1K. As in Fig. 3C, the lower baseline compares two randomly selected reports.
Source data are provided as a Source Data file.
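The caption reports accuracy, weighted F1, and statistics over 1000 bootstraps, but it does not say how these were computed. As a minimal sketch of what such metrics usually mean, the following assumes the standard definitions: weighted F1 as the support-weighted mean of per-class F1, and bootstrapping as resampling cases with replacement. The function names and the toy labels are hypothetical and are not taken from the study or its code.

```python
import random
import statistics
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        # n >= 1 for every label in `support`, so the denominator is never zero.
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def bootstrap(metric, y_true, y_pred, n_boot=1000, seed=0):
    """Recompute `metric` on n_boot resamples of the cases, drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return scores


# Toy labels for illustration only (assumption: not data from the paper).
y_true = ["BCC"] * 4 + ["other"] * 6
y_pred = ["BCC", "BCC", "BCC", "other"] + ["other"] * 6

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
boot_scores = bootstrap(weighted_f1, y_true, y_pred)

# Summary statistics of the kind a box plot would display.
boot_mean = statistics.mean(boot_scores)
boot_quartiles = statistics.quantiles(boot_scores, n=4)
boot_min, boot_max = min(boot_scores), max(boot_scores)
```

Fixing the random seed keeps the resamples reproducible. With class sizes as imbalanced as those in panel c, the support weighting lets the larger class dominate the weighted F1 score.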