Extended Data Fig. 6: Increasing numbers of patients improves accuracy. | Nature Medicine

From: Genomic copy number predicts esophageal cancer years before transformation

Analysis showing the potential for improvement from training the model with increasing numbers of patients (x-axis) from the discovery cohort (green and orange bars), from the combined discovery and validation cohorts (dark purple bar), and from all sWGS (discovery and validation) data combined with the SNP data from the Seattle BE Study (pink bars). For each model we assessed a, the cross-validation accuracy, b, the number of coefficients selected by the model, and c, the AUC for a leave-one-out analysis. The green bars show models trained on increasing numbers of patients from the discovery cohort (error bars show the mean ± s.e.m. from repeating each training 10 times with randomly selected patients), the orange bar represents the full discovery cohort, the purple bar represents the combined discovery and validation cohorts (n = 164), and the pink bars represent the combined sWGS and SNP patients (n = 413). The combined discovery and validation cohorts (all sWGS data) display consistent improvement in accuracy (0.57 to 0.75) and AUC (0.7 to 0.89) as the number of patients increases. Including the SNP data results in no improvement despite the increased number of patients, indicating that the sWGS data alone provide more accurate prognostic information. d, The per-sample classification rate across all 164 patients in the discovery and validation cohorts when we use a model trained on all samples (n = 986). An overall improvement in accuracy for both high-risk and low-risk patients is observed.
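The repeated-subsampling procedure behind the green bars (train on n randomly selected patients, repeat 10 times, report the mean ± s.e.m.) might be sketched as below. This is an illustrative sketch only, not the authors' code: the names `subsample_curve` and `evaluate` are hypothetical, and the caption does not specify the model or scoring function, so `evaluate` stands in for whatever training-and-scoring step is used.

```python
import random
import statistics
from typing import Callable, Dict, Sequence, Tuple


def subsample_curve(
    patients: Sequence,
    sizes: Sequence[int],
    evaluate: Callable[[list], float],
    repeats: int = 10,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """Return {n: (mean score, s.e.m.)} over repeated random patient subsets.

    `evaluate` is a hypothetical callback: it receives a list of patients,
    trains a model on them, and returns one score (e.g. accuracy or AUC).
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    curve = {}
    for n in sizes:
        # Draw `repeats` independent subsets of n patients, without replacement.
        scores = [evaluate(rng.sample(list(patients), n)) for _ in range(repeats)]
        mean = statistics.mean(scores)
        # s.e.m. = sample standard deviation / sqrt(number of repeats)
        sem = statistics.stdev(scores) / len(scores) ** 0.5 if len(scores) > 1 else 0.0
        curve[n] = (mean, sem)
    return curve


if __name__ == "__main__":
    # Toy stand-in for model training: the "score" is just the subset mean.
    toy_patients = list(range(100))
    toy_curve = subsample_curve(toy_patients, [10, 20, 40], lambda s: sum(s) / len(s))
    for n, (mean, sem) in toy_curve.items():
        print(f"n={n}: mean={mean:.2f}, s.e.m.={sem:.2f}")
```

Passing the scoring step in as a callback keeps the subsampling logic independent of the model, so the same loop could produce the accuracy, coefficient-count, and AUC panels by swapping `evaluate`.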
