Fig. 6: Machine learning to refine the detection of HRD using SBS mutations.

All models were trained on balanced training sets. Training sets were randomly drawn and included 70% of the HR-deficient (class H1a*) tumors and an equal number of HR-proficient tumors (class H3). The remaining tumors were assigned to the test sets. A Overview of the data sets and the workflow of cross-validation, model training, and model testing. Two series of models were evaluated: one trained in ovarian cancer and another trained across cancer types. B Performance of the models trained in ovarian cancer. C Performance of the models trained in the pan-cancer cohort. TCGA-PANCAN = TCGA pan-cancer cohort including more than 10,000 tumors; TCGA-PANCAN* = TCGA-PANCAN excluding TCGA-BRCA and TCGA-OV; HD-OV = in-house ovarian cancer cohort; Mts = mutation types.
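
The balanced sampling described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `balanced_split`, its parameters, and the fixed seed are hypothetical, and it assumes that only tumors of classes H1a* and H3 enter the split, which the legend does not state explicitly.

```python
import random


def balanced_split(labels, positive="H1a*", negative="H3", frac=0.7, seed=0):
    """Return (train_idx, test_idx) for a balanced random split.

    Hypothetical helper: draws `frac` of the positive-class tumors and an
    equal number of negative-class tumors for training; all remaining
    tumors of the two classes form the test set.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    pos = [i for i, c in enumerate(labels) if c == positive]
    neg = [i for i, c in enumerate(labels) if c == negative]

    n_train = int(round(frac * len(pos)))
    if n_train > len(neg):
        raise ValueError("not enough negative-class tumors for a balanced set")

    # Same number of tumors from each class -> balanced training set.
    train = rng.sample(pos, n_train) + rng.sample(neg, n_train)
    in_train = set(train)

    # Everything not drawn for training goes to the (unbalanced) test set.
    test = [i for i in pos + neg if i not in in_train]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy labels only; these counts are not taken from the study.
    toy_labels = ["H1a*"] * 10 + ["H3"] * 50
    train_idx, test_idx = balanced_split(toy_labels)
    print(len(train_idx), len(test_idx))
```

In such a scheme the test set is typically dominated by HR-proficient tumors, because only the training set is balanced.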