Abstract
The quantification of circulating tumour DNA (ctDNA) in blood enables non-invasive surveillance of cancer progression. Here we show that a deep-learning model can accurately quantify ctDNA from the density distribution of cell-free DNA-fragment lengths. We validated the model, which we named ‘Fragle’, by using low-pass whole-genome-sequencing data from multiple cancer types and healthy control cohorts. In independent cohorts, Fragle outperformed tumour-naive methods, achieving higher accuracy and lower detection limits. We also show that Fragle is compatible with targeted sequencing data. In plasma samples from patients with colorectal cancer, longitudinal analysis with Fragle revealed strong concordance between ctDNA dynamics and treatment responses. In patients with resected lung cancer, Fragle outperformed a tumour-naive gene panel in the prediction of minimal residual disease for risk stratification. The method’s versatility, speed and accuracy for ctDNA quantification suggest that it may have broad clinical utility.
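As a rough illustration of the input the abstract refers to, the sketch below turns a list of cfDNA fragment lengths into a normalized length-density vector. It is not the Fragle implementation: the function name `fragment_length_density` and the 50–400 bp window are assumptions chosen for the example, and the toy lengths are made up.

```python
from collections import Counter


def fragment_length_density(lengths, min_len=50, max_len=400):
    """Return the fraction of fragments at each length in [min_len, max_len].

    The window bounds are illustrative assumptions, not values from the paper.
    """
    # Count only fragments whose length falls inside the window.
    counts = Counter(n for n in lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments within the requested length range")
    # One entry per integer length, normalized so the vector sums to 1.
    return [counts.get(n, 0) / total for n in range(min_len, max_len + 1)]


# Toy fragment lengths in base pairs (hypothetical data); 30 bp is outside the window.
lengths = [166, 167, 166, 145, 320, 166, 30]
density = fragment_length_density(lengths)
```

In practice the lengths would come from the aligned paired-end reads in a BAM file; a vector like `density` is the kind of per-sample feature a model could take as input.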
Data availability
The published data used in this study and their access codes are provided in Supplementary Dataset 1. Data generated in this study are available from the European Genome-Phenome Archive (EGA; dataset ID EGAD50000000167). The data are available under restricted access and will be released subject to a data-transfer agreement. Source data are provided with this paper.
Code availability
The Fragle software is available at https://github.com/skandlab/FRAGLE. The software can be directly applied to lpWGS/off-target BAM files aligned to hg19/GRCh37/hg38 reference genomes without any preprocessing.
Acknowledgements
This work was supported by the Agency for Science, Technology and Research under its IAF-PP programme (grant ID H1801a0019) and the Singapore Ministry of Health’s National Medical Research Council under its OF-IRG programme (OFIRG21nov-0083). This work makes use of data from the Chinese University of Hong Kong (CUHK) Circulating Nucleic Acids Research Group as reported previously (https://doi.org/10.1073/pnas.1500076112 and https://doi.org/10.1158/2159-8290.CD-19-0622) and data from CRUK_CI, University of Cambridge, Rosenfeld Lab, as reported previously (https://doi.org/10.1126/scitranslmed.aat4921). We gratefully acknowledge D. Lo and his research group at CUHK, as well as N. Rosenfeld and his research group at University of Cambridge, for providing access to cfDNA cohorts.
Author information
Contributions
A.J.S. supervised the project. A.J.S. and G.Z. conceived the project. A.J.S., G.Z. and C.R.R. performed most of the data analysis and model development. V.G., D.O., P.B., H.C., A.J.L., Y.A.G., Z.W.P., N.L.S., A.A., Y.C., L.N.L., D.H., S.T. and L.W. assisted in data analysis. V.G., P.B. and A.A. assisted in model development. P.P., Y.T.L., A.G. and S.N. performed the experiments. S.-L.K., D.Q.C., B.T., T.J.T., Y.S.Y., A.Y.C., M.C.H.N., P.T., D.T., P.M.W. and I.B.T. provided samples and clinical information. A.J.S., G.Z. and C.R.R. wrote the paper. All authors reviewed and approved the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Le Son Tran and Zhihong Zhang for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1
Schematic showing how the predictive models were trained and validated using the healthy control samples, original cancer samples, and in silico cancer spike-ins in the discovery cohort.
Extended Data Fig. 2
Comparison between Fragle and ichorCNA in their predicted ctDNA fractions in unseen test samples from patients with lung, nasopharyngeal, as well as head and neck cancers.
Extended Data Fig. 3
Comparison of the Fragle-predicted ctDNA fractions in the unseen cfDNA WGS samples with 10 million cfDNA fragments and their downsampled counterparts.
Supplementary information
Supplementary Information
Supplementary figures and list of supplementary datasets.
Supplementary Dataset 1
Information on the discovery and unseen cohorts in this study.
Supplementary Dataset 2
ctDNA fractions of cfDNA samples from patients with cancer in the discovery cohort.
Supplementary Dataset 3
The in silico samples of various ctDNA contents in the discovery cohort.
Supplementary Dataset 4
Pearson and Spearman correlations of the ctDNA fractions predicted by Fragle and other methods.
Supplementary Dataset 5
Variant allele frequency estimation of cfDNA samples from patients with CRC in the unseen cohort.
Supplementary Dataset 6
Specificity at different LoD cut-offs across datasets.
Supplementary Dataset 7
Comparison between Fragle and ichorCNA under different LoD cut-offs using unseen healthy and cancer samples.
Supplementary Dataset 8
Mutations missed by the callers for two patients with CRC.
Supplementary Dataset 9
Clinical information for the patients recruited for this study.
Supplementary Dataset 10
Fragment length intervals that differed between cancer samples and healthy individuals.
Supplementary Dataset 11
Summary statistics for the gene panels.
Supplementary Dataset 12
List of gene names in the panels.
Supplementary Dataset 13
Genomic locations corresponding to the gene panels.
Source data
Source Data Figs. 2 and 3 and Extended Data Fig. 2
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, G., Rahman, C.R., Getty, V. et al. A deep-learning model for quantifying circulating tumour DNA from the density distribution of DNA-fragment lengths. Nat. Biomed. Eng 9, 307–319 (2025). https://doi.org/10.1038/s41551-025-01370-3
DOI: https://doi.org/10.1038/s41551-025-01370-3