Extended Data Fig. 2: Additional data for classifier performance.

a, Flowchart of all experiments. The area under the receiver operating characteristic curve (AUC) gives an overall measure of patient-level classifier accuracy, as measured in held-out test sets. Flag symbols are from https://twemoji.twitter.com/ (licensed under CC-BY 4.0). b, Classification performance in virtual biopsies. We predicted MSI status in all patients in the DACHS cohort, varying the number of blocks (tiles) from 3 to 2,054, which was the median number of blocks per whole-slide image. This experiment was repeated five times, each time with a different random selection of blocks. As one block has an edge length of 256 µm, a 1-cm tissue cylinder consisting of 100% tumor tissue from a standard 18G biopsy needle corresponds to 117 blocks, and one from a 16G needle corresponds to 156 blocks. In clinical routine, usually only part of each biopsy core contains tumor, but multiple biopsy cores are collected. With increasing tissue size, performance stabilizes at AUC = 0.84. This suggests that a typical biopsy would provide sufficient tissue for MSI prediction. CI, confidence interval. c, Distribution of the number of blocks per patient in DACHS (n = 378 patients). d, Overall survival of patients with genetic MSS tumors, stratified by high or low predicted MSIness. In this group, patients with high MSIness had shorter survival than patients with low MSIness. The table shows the number of patients at risk. The P value was calculated by a two-sided log-rank test (n = 350 patients).
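
The block counts quoted for panel b can be checked with a short calculation. The sketch below is one plausible reconstruction, not the authors' stated method: it treats a biopsy core as a rectangular longitudinal section and counts the whole 256-µm blocks that fit along its length and across its width. The core widths (about 838 µm for 18G and 1,194 µm for 16G, the nominal inner diameters of regular-wall needles) are assumptions that are not given in the legend, and the function name `blocks_in_core` is hypothetical.

```python
import math

BLOCK_EDGE_UM = 256  # block edge length stated in the legend


def blocks_in_core(core_length_um: float, core_width_um: float,
                   edge_um: float = BLOCK_EDGE_UM) -> int:
    """Count whole blocks fitting in a rectangular section of a biopsy core."""
    blocks_along = math.floor(core_length_um / edge_um)
    blocks_across = math.floor(core_width_um / edge_um)
    return blocks_along * blocks_across


# ASSUMPTION: core width approximated by the nominal needle inner diameter.
# These values are not taken from the legend.
ASSUMED_CORE_WIDTH_UM = {"18G": 838, "16G": 1194}

CORE_LENGTH_UM = 10_000  # 1-cm tissue cylinder, as in the legend

counts = {gauge: blocks_in_core(CORE_LENGTH_UM, width)
          for gauge, width in ASSUMED_CORE_WIDTH_UM.items()}
print(counts)
```

Under these assumptions, 39 blocks fit along the 1-cm length, with 3 blocks across an 18G core and 4 across a 16G core, which reproduces the 117 and 156 blocks quoted in the legend.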