A deep-learning model for quantifying circulating tumour DNA from the density distribution of DNA-fragment lengths

Abstract

The quantification of circulating tumour DNA (ctDNA) in blood enables non-invasive surveillance of cancer progression. Here we show that a deep-learning model can accurately quantify ctDNA from the density distribution of cell-free DNA-fragment lengths. We validated the model, which we named ‘Fragle’, by using low-pass whole-genome-sequencing data from multiple cancer types and healthy control cohorts. In independent cohorts, Fragle outperformed tumour-naive methods, achieving higher accuracy and lower detection limits. We also show that Fragle is compatible with targeted sequencing data. In plasma samples from patients with colorectal cancer, longitudinal analysis with Fragle revealed strong concordance between ctDNA dynamics and treatment responses. In patients with resected lung cancer, Fragle outperformed a tumour-naive gene panel in the prediction of minimal residual disease for risk stratification. The method’s versatility, speed and accuracy for ctDNA quantification suggest that it may have broad clinical utility.
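The model's input, the density distribution of cfDNA-fragment lengths, can be made concrete with a short sketch. It assumes a 50–400 bp window at 1-bp resolution. These bounds are illustrative choices, not Fragle's documented settings; for the actual feature construction, consult the released code at https://github.com/skandlab/FRAGLE.

```python
from collections import Counter
from typing import Iterable, List

# Illustrative window (bp); Fragle's actual length range is not stated here.
MIN_LEN = 50
MAX_LEN = 400


def length_density(lengths: Iterable[int],
                   min_len: int = MIN_LEN,
                   max_len: int = MAX_LEN) -> List[float]:
    """Return the fraction of fragments at each length in [min_len, max_len].

    Index i of the result corresponds to length min_len + i. Fragments
    outside the window are dropped before normalizing, so a non-empty
    result sums to 1; an all-zero list is returned if nothing falls inside.
    """
    counts = Counter(n for n in lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    n_bins = max_len - min_len + 1
    if total == 0:
        return [0.0] * n_bins
    return [counts[min_len + i] / total for i in range(n_bins)]


# Example: two fragments at the typical ~167 bp cfDNA peak, one shorter, one longer.
density = length_density([167, 167, 145, 320])
print(density[167 - MIN_LEN])  # 0.5
```

Normalizing counts to a density, rather than keeping raw counts, makes samples sequenced to different depths directly comparable.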

Fig. 1: Overview of Fragle.
Fig. 2: ctDNA quantification in validation and unseen cohorts.
Fig. 3: Lower LoD.
Fig. 4: Application of Fragle to targeted sequencing data.
Fig. 5: Monitoring of ctDNA levels and disease progression from targeted sequencing.
Fig. 6: Risk stratification of patients with early-stage lung cancer.

Data availability

The published data used in this study and their access codes are provided in Supplementary Dataset 1. Data generated in this study are available from the European Genome-phenome Archive (EGA; dataset ID EGAD50000000167). The data are available under restricted access and will be released subject to a data-transfer agreement. Source data are provided with this paper.

Code availability

The Fragle software is available at https://github.com/skandlab/FRAGLE. It can be applied directly, without any preprocessing, to BAM files from low-pass whole-genome sequencing (lpWGS) or to BAM files of off-target reads from targeted sequencing, aligned to the hg19/GRCh37 or hg38 reference genome.
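Fragle reads BAM files directly. The following sketch shows how fragment lengths are typically recovered from paired-end alignments, though Fragle's exact filters may differ. It takes the absolute template length (TLEN) of read 1 in each properly paired, non-duplicate, primary alignment. The MAPQ threshold of 30 is an assumption. The function relies only on attribute names from `pysam.AlignedSegment`, so it can be exercised without a BAM file.

```python
from typing import Iterable, Iterator, Protocol


class ReadLike(Protocol):
    """The subset of pysam.AlignedSegment attributes used below."""
    is_proper_pair: bool
    is_read1: bool
    is_duplicate: bool
    is_secondary: bool
    is_supplementary: bool
    mapping_quality: int
    template_length: int


def fragment_lengths(reads: Iterable[ReadLike],
                     min_mapq: int = 30) -> Iterator[int]:
    """Yield one fragment length per properly paired cfDNA fragment.

    Only read 1 of each pair is used, so each fragment is counted once.
    Duplicates, secondary/supplementary alignments and reads below
    min_mapq (an illustrative threshold) are skipped. The fragment
    length is |TLEN|, the absolute template length.
    """
    for read in reads:
        if (not read.is_proper_pair or not read.is_read1
                or read.is_duplicate or read.is_secondary
                or read.is_supplementary
                or read.mapping_quality < min_mapq):
            continue
        tlen = abs(read.template_length)
        if tlen > 0:
            yield tlen
```

With pysam installed, an opened file can be passed directly, for example `fragment_lengths(pysam.AlignmentFile("sample.bam", "rb"))`, where the file name is hypothetical. Iterating over an `AlignmentFile` yields `AlignedSegment` objects in file order.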

References

  1. Lui, Y. Y. et al. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin. Chem. 48, 421–427 (2002).

  2. Pantel, K. & Alix-Panabières, C. Liquid biopsy and minimal residual disease—latest advances and implications for cure. Nat. Rev. Clin. Oncol. 16, 409–424 (2019).

  3. Sanz-Garcia, E., Zhao, E., Bratman, S. V. & Siu, L. L. Monitoring and adapting cancer treatment using circulating tumor DNA kinetics: current research, opportunities, and challenges. Sci. Adv. 8, eabi8618 (2022).

  4. Kilgour, E., Rothwell, D. G., Brady, G. & Dive, C. Liquid biopsy-based biomarkers of treatment response and resistance. Cancer Cell 37, 485–495 (2020).

  5. Vasan, N., Baselga, J. & Hyman, D. M. A view on drug resistance in cancer. Nature 575, 299–309 (2019).

  6. Razavi, P. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat. Med. 25, 1928–1937 (2019).

  7. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324 (2017).

  8. Li, S. et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc. Natl Acad. Sci. USA 120, e2305236120 (2023).

  9. Li, W. et al. CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic Acids Res. 46, e89 (2018).

  10. Zhu, G. et al. Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden. Nat. Commun. 12, 2229 (2021).

  11. Lo, Y. M. D. et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci. Transl. Med. 2, 61ra91 (2010).

  12. Jiang, P. et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc. Natl Acad. Sci. USA 112, E1317–E1325 (2015).

  13. Underhill, H. R. et al. Fragment length of circulating tumor DNA. PLoS Genet. 12, e1006162 (2016).

  14. Mouliere, F. et al. High fragmentation characterizes tumour-derived circulating DNA. PLoS ONE 6, e23418 (2011).

  15. Nguyen, T. H. et al. Multimodal analysis of methylomics and fragmentomics in plasma cell-free DNA for multi-cancer early detection and localization. eLife 12, RP89083 (2023).

  16. Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci. Transl. Med. 10, eaat4921 (2018).

  17. Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019).

  18. Foda, Z. H. et al. Detecting liver cancer using cell-free DNA fragmentomes. Cancer Discov. 13, 616–631 (2023).

  19. Renaud, G. et al. Unsupervised detection of fragment length signatures of circulating tumor DNA using non-negative matrix factorization. eLife 11, e71569 (2022).

  20. Yu, S. C., Choy, L. L. & Lo, Y. M. D. ‘Longing’ for the next generation of liquid biopsy: the diagnostic potential of long cell-free DNA in oncology and prenatal testing. Mol. Diagn. Ther. 27, 563–571 (2023).

  21. Hudecova, I. et al. Characteristics, origin, and potential for cancer diagnostics of ultrashort plasma cell-free DNA. Genome Res. 32, 215–227 (2022).

  22. Mathios, D. et al. Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat. Commun. 12, 5060 (2021).

  23. Esfahani, M. S. et al. Inferring gene expression from cell-free DNA fragmentation profiles. Nat. Biotechnol. 40, 585–597 (2022).

  24. Ptashkin, R. N. et al. Prevalence of clonal hematopoiesis mutations in tumor-only clinical genomic profiling of solid tumors. JAMA Oncol. 4, 1589–1593 (2018).

  25. Woodhouse, R. et al. Clinical and analytical validation of FoundationOne Liquid CDx, a novel 324-Gene cfDNA-based comprehensive genomic profiling assay for cancers of solid tumor origin. PLoS ONE 15, e0237802 (2020).

  26. Audinot, B. et al. ctDNA quantification improves estimation of outcomes in patients with high grade osteosarcoma: a translational study from the OS2006 trial. Ann. Oncol. 35, 559–568 (2024).

  27. Bratman, S. V. et al. Personalized circulating tumor DNA analysis as a predictive biomarker in solid tumor patients treated with pembrolizumab. Nat. Cancer 1, 873–881 (2020).

  28. Chen, K. et al. Individualized tumor-informed circulating tumor DNA analysis for postoperative monitoring of non-small cell lung cancer. Cancer Cell 41, 1749–1762.e6 (2023).

  29. Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).

  30. Zviran, A. et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat. Med. 26, 1114–1124 (2020).

  31. Tsui, D. W. et al. Tumor fraction-guided cell-free DNA profiling in metastatic solid tumor patients. Genome Med. 13, 1–15 (2021).

  32. Jiang, P. et al. Plasma DNA end-motif profiling as a fragmentomic marker in cancer, pregnancy, and transplantation. Cancer Discov. 10, 664–673 (2020).

  33. Yu, P. et al. Multi-dimensional cell-free DNA-based liquid biopsy for sensitive early detection of gastric cancer. Genome Med. 16, 79 (2024).

  34. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  35. Bao, L., Pu, M. & Messer, K. AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next-generation sequencing data. Bioinformatics 30, 1056–1063 (2014).

  36. Ha, G. et al. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 24, 1881–1893 (2014).

  37. Larson, N. B. & Fridley, B. L. PurBayes: estimating tumor cellularity and subclonality in next-generation sequencing data. Bioinformatics 29, 1888–1889 (2013).

  38. Oesper, L., Satas, G. & Raphael, B. J. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics 30, 3532–3540 (2014).

  39. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  40. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning 448–456 (PMLR, 2015).

  41. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition 770–778 (CVPR, 2016).

  42. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  43. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108 (2016).

  44. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 1–14 (2016).

  45. Mansukhani, S. et al. Ultra-sensitive mutation detection and genome-wide DNA copy number reconstruction by error-corrected circulating tumor DNA sequencing. Clin. Chem. 64, 1626–1635 (2018).

  46. Kleftogiannis, D. et al. Detection of genomic alterations in breast cancer with circulating tumour DNA sequencing. Sci. Rep. 10, 16774 (2020).

  47. Chen, K. et al. Individualized dynamic methylation-based analysis of cell-free DNA in postoperative monitoring of lung cancer. BMC Med. 21, 255 (2023).

  48. Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).

Acknowledgements

This work was supported by the Agency for Science, Technology and Research under its IAF-PP programme (grant ID H1801a0019) and the Singapore Ministry of Health’s National Medical Research Council under its OF-IRG programme (OFIRG21nov-0083). This work makes use of data from the Chinese University of Hong Kong (CUHK) Circulating Nucleic Acids Research Group as reported previously (https://doi.org/10.1073/pnas.1500076112 and https://doi.org/10.1158/2159-8290.CD-19-0622) and data from CRUK_CI, University of Cambridge, Rosenfeld Lab, as reported previously (https://doi.org/10.1126/scitranslmed.aat4921). We gratefully acknowledge D. Lo and his research group at CUHK, as well as N. Rosenfeld and his research group at University of Cambridge, for providing access to cfDNA cohorts.

Author information

Contributions

A.J.S. supervised the project. A.J.S. and G.Z. conceived the project. A.J.S., G.Z. and C.R.R. performed most of data analysis and model development. V.G., D.O., P.B., H.C., A.J.L., Y.A.G., Z.W.P., N.L.S., A.A., Y.C., L.N.L., D.H., S.T. and L.W. assisted in data analysis. V.G., P.B. and A.A. assisted in model development. P.P., Y.T.L., A.G. and S.N. performed the experiments. S.-L.K., D.Q.C., B.T., T.J.T., Y.S.Y., A.Y.C., M.C.H.N., P.T., D.T., P.M.W. and I.B.T. provided samples and clinical information. A.J.S., G.Z. and C.R.R. wrote the paper. All authors reviewed and approved the paper.

Corresponding author

Correspondence to Anders Jacobsen Skanderup.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Biomedical Engineering thanks Le Son Tran and Zhihong Zhang for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1

Schematic showing how the predictive models were trained and validated using the healthy control samples, original cancer samples, and in silico cancer spike-ins in the discovery cohort.
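Extended Data Fig. 1 refers to in silico cancer spike-ins. The sketch below shows one way to build such a sample. It assumes that the cancer sample's ctDNA fraction f is known and that healthy-control fragments contain no ctDNA. Under those assumptions, a mixture that draws a share w of its fragments from the cancer sample has an expected ctDNA fraction of w × f, so w = target / f. This is an illustrative construction, not necessarily the procedure described in the paper's Methods.

```python
import random
from typing import List, Sequence


def spike_in(cancer: Sequence[int], healthy: Sequence[int], cancer_tf: float,
             target_tf: float, n_total: int, seed: int = 0) -> List[int]:
    """Sample n_total fragment lengths with expected ctDNA fraction target_tf.

    Assumes every healthy-control fragment is non-tumour, so drawing a
    share w of fragments from the cancer sample gives w * cancer_tf.
    Sampling is without replacement and reproducible via seed.
    """
    if not 0.0 < cancer_tf <= 1.0:
        raise ValueError("cancer_tf must be in (0, 1]")
    if not 0.0 <= target_tf <= cancer_tf:
        raise ValueError("target_tf must lie in [0, cancer_tf]")
    w = target_tf / cancer_tf                 # share of cancer-derived fragments
    n_cancer = round(w * n_total)
    n_healthy = n_total - n_cancer
    if n_cancer > len(cancer) or n_healthy > len(healthy):
        raise ValueError("not enough fragments to sample without replacement")
    rng = random.Random(seed)
    mix = rng.sample(list(cancer), n_cancer) + rng.sample(list(healthy), n_healthy)
    rng.shuffle(mix)                          # interleave the two sources
    return mix
```

A target above the cancer sample's own fraction cannot be reached by dilution, which is why the function rejects it rather than silently clipping.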

Extended Data Fig. 2

Comparison between Fragle and ichorCNA in their predicted ctDNA fractions in unseen test samples from patients with lung, nasopharyngeal, as well as head and neck cancers.

Extended Data Fig. 3

Comparison of the Fragle-predicted ctDNA fractions in the unseen cfDNA WGS samples with 10 million cfDNA fragments and their downsampled counterparts.
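Extended Data Fig. 3 compares predictions on samples with 10 million cfDNA fragments against downsampled versions of the same samples. The sketch below covers reproducible uniform downsampling without replacement. It also computes the total-variation distance between the full and downsampled length distributions, as one illustrative way to check how much the input itself shifts. This distance is an assumption, not a measure reported in the paper.

```python
import random
from collections import Counter
from typing import List, Sequence


def downsample(fragments: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Draw n fragments uniformly without replacement; seed makes it reproducible."""
    if not 0 <= n <= len(fragments):
        raise ValueError("n must be between 0 and the number of fragments")
    return random.Random(seed).sample(list(fragments), n)


def total_variation(a: Sequence[int], b: Sequence[int]) -> float:
    """Total-variation distance between the empirical length distributions of a and b.

    0 means identical distributions; 1 means no overlap at all.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    ca, cb = Counter(a), Counter(b)
    keys = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[k] / len(a) - cb[k] / len(b)) for k in keys)
```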

Supplementary information

Supplementary Information

Supplementary figures and list of supplementary datasets.

Reporting Summary

Peer Review File

Supplementary Dataset 1

Information on the discovery and unseen cohorts in this study.

Supplementary Dataset 2

ctDNA fractions of cfDNA samples from patients with cancer in the discovery cohort.

Supplementary Dataset 3

The in silico samples of various ctDNA contents in the discovery cohort.

Supplementary Dataset 4

Pearson and Spearman correlations of the ctDNA fractions predicted by Fragle and other methods.
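The following is a dependency-free sketch of the two statistics named in Supplementary Dataset 4. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles ties. If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman's coefficient depends only on ordering, so it is insensitive to a monotone but non-linear calibration offset between two methods, whereas Pearson's is not.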

Supplementary Dataset 5

Variant allele frequency estimation of cfDNA samples from patients with CRC in the unseen cohort.
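The link between VAF and ctDNA fraction can be illustrated with a simple approximation. For a clonal, heterozygous somatic mutation in a copy-number-neutral region, VAF ≈ f/2, so the ctDNA fraction is f ≈ 2 × VAF. The sketch aggregates across mutations with the median. The median and the copy-neutral assumption are illustrative choices, not the estimation procedure stated in the paper.

```python
from statistics import median
from typing import Sequence


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: alt-supporting reads over total depth."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth and depth > 0")
    return alt_reads / depth


def tumour_fraction_from_vafs(vafs: Sequence[float]) -> float:
    """Approximate ctDNA fraction as 2 x median VAF, capped at 1.

    Assumes clonal heterozygous mutations in copy-number-neutral regions,
    where VAF is roughly half the tumour fraction. Subclonal mutations,
    copy-number changes or clonal-haematopoiesis variants break this.
    """
    if not vafs:
        raise ValueError("at least one VAF is required")
    return min(1.0, 2 * median(vafs))
```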

Supplementary Dataset 6

Specificity at different LoD cut-offs across datasets.
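Specificity and sensitivity at a given limit-of-detection (LoD) cut-off can be computed with a minimal sketch. It assumes that a sample is called ctDNA-positive when its predicted fraction is strictly greater than the cut-off; this is an illustrative convention. Specificity is then the share of healthy samples not called positive, and sensitivity is the share of cancer samples called positive.

```python
from typing import Dict, Sequence, Tuple


def rates_at_cutoffs(healthy: Sequence[float], cancer: Sequence[float],
                     cutoffs: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """Map each cut-off to (specificity, sensitivity).

    A sample is called ctDNA-positive when its predicted fraction is
    strictly greater than the cut-off (an illustrative convention).
    """
    if not healthy or not cancer:
        raise ValueError("both groups must be non-empty")
    out = {}
    for c in cutoffs:
        specificity = sum(p <= c for p in healthy) / len(healthy)
        sensitivity = sum(p > c for p in cancer) / len(cancer)
        out[c] = (specificity, sensitivity)
    return out
```

Raising the cut-off can only keep or increase specificity and keep or decrease sensitivity, which is the trade-off a table of LoD cut-offs makes visible.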

Supplementary Dataset 7

Comparison between Fragle and ichorCNA under different LoD cut-offs using unseen healthy and cancer samples.

Supplementary Dataset 8

Mutations missed by the callers for two patients with CRC.

Supplementary Dataset 9

Clinical information for the patients recruited for this study.

Supplementary Dataset 10

Fragment length intervals that differed between cancer samples and healthy individuals.

Supplementary Dataset 11

Summary statistics for the gene panels.

Supplementary Dataset 12

List of gene names in the panels.

Supplementary Dataset 13

Genomic locations corresponding to the gene panels.

Source data

Source Data Figs. 2 and 3 and Extended Data Fig. 2

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, G., Rahman, C.R., Getty, V. et al. A deep-learning model for quantifying circulating tumour DNA from the density distribution of DNA-fragment lengths. Nat. Biomed. Eng. 9, 307–319 (2025). https://doi.org/10.1038/s41551-025-01370-3

  • DOI: https://doi.org/10.1038/s41551-025-01370-3
