Fig. 1: Framework of whole-exome sequencing data-oriented cancer extrachromosomal DNA amplification identification, evaluation, analysis, and application. | Nature Communications

From: Machine learning-based extrachromosomal DNA identification in large-scale cohorts reveals its clinical implications in cancer

a Schematic diagram of the study. b Features of the final XGBOOST model for ecDNA cargo gene prediction and their importance. XGBOOST modeling with 11 features was repeated 1000 times independently to determine the final hyperparameters. c Performance estimation (auPRC, area under the precision-recall curve; data are presented as mean ± SD) for the final ecDNA cargo gene prediction model during training and evaluation with stratified group k-fold cross-validation (k = 10). The dotted line marks the iteration selected by early stopping (performance did not improve in the 10 rounds that followed). Tumor sample size n = 386. d Performance scores for sample-level ecDNA amplification identification: auPRC, auROC (area under the receiver operating characteristic curve), precision, sensitivity, and specificity. Source data are provided as a Source Data file. XGBOOST, eXtreme Gradient Boosting. total_cn, total copy number. minor_cn, copy number of minor allele. cna_burden, copy number alteration burden. pLOH, genome percentage with loss of heterozygosity. AScore, aneuploidy score.
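
As a rough illustration of the scores named in panels c and d, the sketch below computes precision, sensitivity, specificity, and average precision (one common estimator of auPRC) from binary labels. It is not the authors' code: the function names `sample_level_scores` and `average_precision` are hypothetical, the toy labels are invented for the example, and the estimator the study used for auPRC may differ (for instance in how tied scores are handled).

```python
from typing import Dict, Sequence


def sample_level_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, sensitivity and specificity from binary labels (1 = ecDNA-positive).

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    nan = float("nan")  # returned when a denominator is zero
    return {
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive.

    Simplification: tied scores are ranked in input order rather than grouped.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / hits if hits else float("nan")


if __name__ == "__main__":
    # Toy labels and scores, invented for the example.
    labels = [1, 1, 0, 0, 1, 0]
    calls = [1, 0, 0, 1, 1, 0]
    probs = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
    print(sample_level_scores(labels, calls))
    print(average_precision(labels, probs))
```

Average precision depends only on the ranking of the predicted scores, whereas precision, sensitivity, and specificity depend on the threshold used to turn scores into calls.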
