Fig. 1: Dataset overview.
From: A generalizable machine learning framework for classifying DNA repair defects using ctDNA exomes

a Graphical abstract depicting development of a machine learning classifier for identifying DNA damage repair defects in metastatic prostate and bladder cancers. Pre-evaluated clinical ctDNA samples from multiple sources were selected for whole-exome sequencing and used to train interpretable XGBoost models. b Oncoprint showing DNA damage repair labels (assigned from prior deep targeted sequencing) and selected somatic features of the whole-exome sequencing cohort, including signature weights and mutation counts. Log depth ratios (LDR) are the normalised average from targeted sequencing. c–f Comparison of naïve somatic features between samples assigned to each DNA damage repair label, including the number of single nucleotide variants (SNVs) and indels, the proportion of the genome affected by a copy number variant (CNV) relative to base ploidy, and the overall genome ploidy. P-values are from Mann–Whitney U tests. For the box and whisker plots, the box encompasses the interquartile range, the midpoint of the box represents the median, and the whiskers extend to 1.5× the interquartile range beyond the box. †Note that the mismatch repair defective (MMRd) label reflects integrated results from ctDNA gene panel sequencing, whole-exome sequencing, and intron sequencing, as well as immunohistochemical staining of MSH2, MSH6, MLH1, and PMS2 in archival primary tissue.
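
The group comparison and box-plot summary described for panels c–f can be illustrated with a minimal sketch. It is not the authors' code: the SNV counts are invented for illustration, the helper `box_stats` is hypothetical, a two-sided test is assumed because the caption does not state the alternative, and the whiskers are assumed to follow the common Tukey convention (most extreme data point within 1.5× the interquartile range of the box).

```python
import numpy as np
from scipy.stats import mannwhitneyu


def box_stats(values):
    """Median, quartiles and Tukey-style whisker ends for one group (hypothetical helper)."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    # Assumed convention: whiskers stop at the most extreme data points
    # that lie within 1.5 x IQR of the box edges.
    whisker_low = v[v >= q1 - 1.5 * iqr].min()
    whisker_high = v[v <= q3 + 1.5 * iqr].max()
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
    }


# Invented SNV counts per sample; NOT data from the study.
snv_defective = [310, 452, 298, 780, 515, 640]
snv_proficient = [45, 60, 38, 72, 55, 91, 49]

# Two-sided Mann-Whitney U test (the alternative is an assumption).
stat, p_value = mannwhitneyu(snv_defective, snv_proficient, alternative="two-sided")

print("U statistic:", stat)
print("p-value:", p_value)
print("defective group:", box_stats(snv_defective))
print("proficient group:", box_stats(snv_proficient))
```

The Mann–Whitney U test compares ranks rather than raw values, which suits count-like features such as SNV and indel numbers that are often heavily skewed across samples.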