Extended Data Fig. 5: ROC curves of the generalization experiments summarized in Fig. 5.
From: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images

a, Prostate model trained with MIL on MSK in-house slides tested on: (1) an in-house slides test set (n = 1,784) digitized on Aperio scanners; (2) an in-house slides test set digitized on a Philips scanner (n = 1,274); and (3) external slides submitted to MSK for consultation (n = 12,727). b,c, Comparison of the proposed MIL approach with state-of-the-art fully supervised learning for breast metastasis detection in lymph nodes. For b, the breast model was trained on MSK data with our proposed method (MIL-RNN) and tested on the MSK breast data test set (n = 1,473) and on the test set of the CAMELYON16 challenge (n = 129), and achieved AUCs of 0.965 and 0.895, respectively. For c, the fully supervised model was trained on CAMELYON16 data and tested on the CAMELYON16 test set (n = 129), achieving an AUC of 0.930. Its performance dropped to AUC = 0.727 when tested on the MSK test set (n = 1,473).