Fig. 4: Distribution of SBS18/SBS36, somatic single nucleotide variants (SNVs), and tumor mutational signature (TMS) reconstruction error across CRCs from training, validation, and test sets.

a CRCs from biallelic MUTYH pathogenic variant carriers cluster together based on high SBS18/SBS36 TMS and low TMS reconstruction error, highlighting the need to include TMS reconstruction error in the classifier. b CRCs with greater than 95% likelihood of arising from biallelic MUTYH pathogenic variants based on TMS. The number of SNVs used in determining TMS (horizontal axis) and the TMS reconstruction error (vertical axis) demonstrate the importance of low reconstruction error (<39%) and sufficient somatic mutation count (≥9) for correctly classifying tumors from biallelic MUTYH pathogenic variant carriers (true positives). Source data are provided as a Source Data file.
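
The thresholds quoted in this caption can be read as a simple filter, sketched below in Python. This is only an illustration under stated assumptions: the class `TumorTMS`, the function `passes_caption_thresholds`, and the choice to combine the three cutoffs with a logical AND are hypothetical and are not taken from the authors' classifier. Only the numeric cutoffs (>95% likelihood, <39% reconstruction error, ≥9 SNVs) come from the caption.

```python
from dataclasses import dataclass

# Cutoffs quoted in the caption. Combining them into one rule is an assumption.
MIN_LIKELIHOOD = 0.95            # likelihood must be greater than 95%
MAX_RECONSTRUCTION_ERROR = 0.39  # reconstruction error must be below 39%
MIN_SNV_COUNT = 9                # at least 9 somatic SNVs


@dataclass
class TumorTMS:
    """Hypothetical per-tumor summary; field names are illustrative only."""
    likelihood: float            # likelihood of biallelic MUTYH origin, 0 to 1
    reconstruction_error: float  # TMS reconstruction error, 0 to 1
    snv_count: int               # number of SNVs used to determine TMS


def passes_caption_thresholds(tumor: TumorTMS) -> bool:
    """Return True if the tumor satisfies all three cutoffs from the caption."""
    return (
        tumor.likelihood > MIN_LIKELIHOOD
        and tumor.reconstruction_error < MAX_RECONSTRUCTION_ERROR
        and tumor.snv_count >= MIN_SNV_COUNT
    )


if __name__ == "__main__":
    # Made-up values for illustration, not data from the study.
    example = TumorTMS(likelihood=0.97, reconstruction_error=0.20, snv_count=15)
    print(passes_caption_thresholds(example))
```

A tumor with a high likelihood but a reconstruction error at or above 39%, or with fewer than 9 SNVs, would be rejected by this sketch, which mirrors the point panel b makes about unreliable signature calls.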