Fig. 5: Application of XGBoost machine learning to identify cancer-specific patterns in cfDNA. | Communications Biology


From: Open chromatin-guided interpretable machine learning reveals cancer-specific chromatin features in cell-free DNA


a Model performance with 2804 selected open chromatin loci. Bar graphs show the accuracy, validation scores, and AUC of models built on breast cancer differential peaks versus randomly selected loci; each score is defined in the Methods section. To better account for potential effects of chromosomal amplification, each locus was expanded to 10 kb. b Performance comparison across multiple models. Genome coverage data from the whole genome (in 10 kb bins) and from 10 kb expanded ATAC-seq peaks were used for XGBoost modeling. Model 1 uses whole-genome coverage at 10 kb resolution; Model 2 uses 10 kb expanded peaks from T47D ATAC-seq; Model 3 uses 10 kb expanded peaks from CD4+ T cell ATAC-seq; Model 4 combines the 10 kb expanded peaks from both T47D and CD4+ T cell ATAC-seq. c Top significant features extracted from the best-performing XGBoost model (Model 4). An example decision tree from the XGBoost ensemble is also shown. d Distribution of significant features relative to the transcription start site (TSS). e Kaplan–Meier survival analysis of patients based on EMP3 expression. f Scatter plot showing the correlation between EMP3 and S100A4 expression levels in breast cancer. The METABRIC cohort (ref. 50) was used.
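The feature construction behind panels a and b (fixed 10 kb genome bins for Model 1, 10 kb windows around ATAC-seq peaks for Models 2 to 4) can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' pipeline. It assumes BED-style 0-based, half-open coordinates, centres each 10 kb window on the peak midpoint, counts a cfDNA fragment toward a window when the fragment midpoint falls inside it, and normalizes to counts per million. All function names are illustrative only.

```python
from bisect import bisect_left

WINDOW = 10_000  # 10 kb, the window/bin size given in the caption


def genome_bins(chrom_sizes, bin_size=WINDOW):
    """Tile each chromosome into consecutive bins (Model 1-style features).

    Assumption: 0-based, half-open coordinates; the last bin is truncated
    at the chromosome end.
    """
    return [
        (chrom, start, min(start + bin_size, size))
        for chrom, size in chrom_sizes.items()
        for start in range(0, size, bin_size)
    ]


def expand_peak(chrom, start, end, window=WINDOW):
    """Expand a peak to a fixed-size window centred on its midpoint.

    Assumption: a window that would start before position 0 is shifted
    right rather than truncated, so every window keeps the full length.
    """
    mid = (start + end) // 2
    w_start = max(0, mid - window // 2)
    return (chrom, w_start, w_start + window)


def peak_windows(*peak_sets):
    """Expand and concatenate peak sets, e.g. T47D plus CD4+ peaks (Model 4-style)."""
    return [expand_peak(*peak) for peaks in peak_sets for peak in peaks]


def window_counts(windows, fragments):
    """Count fragments whose midpoint lies in each [start, end) window."""
    mids = {}
    for chrom, start, end in fragments:
        mids.setdefault(chrom, []).append((start + end) // 2)
    for positions in mids.values():
        positions.sort()
    counts = []
    for chrom, start, end in windows:
        positions = mids.get(chrom, [])
        # Number of sorted midpoints m with start <= m < end.
        counts.append(bisect_left(positions, end) - bisect_left(positions, start))
    return counts


def feature_vector(windows, fragments):
    """Counts-per-million coverage of one cfDNA sample over the given windows."""
    total = len(fragments)
    if total == 0:
        return [0.0] * len(windows)
    return [c * 1e6 / total for c in window_counts(windows, fragments)]


if __name__ == "__main__":
    # Toy inputs; coordinates are made up for illustration only.
    t47d_peaks = [("chr1", 100_000, 100_500)]
    cd4_peaks = [("chr2", 50_000, 50_400)]
    fragments = [
        ("chr1", 99_000, 99_170),
        ("chr1", 104_000, 104_160),
        ("chr2", 50_100, 50_260),
        ("chr3", 1, 170),
    ]
    windows = peak_windows(t47d_peaks, cd4_peaks)
    print(windows)  # [('chr1', 95250, 105250), ('chr2', 45200, 55200)]
    print(feature_vector(windows, fragments))  # [500000.0, 250000.0]
```

Stacking one such vector per cfDNA sample gives a samples × windows matrix; together with cancer/control labels, that is the kind of input an XGBoost classifier (for example `xgboost.XGBClassifier`) would be trained on. Shifting edge windows instead of clipping them is a choice made here so that every peak-derived feature spans the same 10 kb.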
