npj Precision Oncology
A comprehensive foundation model for generalizable cytogenetics in precision oncology with CHROMA
  • Article
  • Open access
  • Published: 27 March 2026

  • Changchun Yang1,2,3,4 na1,
  • Weiqian Dai1 na1,
  • Yilan Zhang2,3,4 na1,
  • Siyuan Chen2,3,4,
  • Jingdong Hu5,
  • Junkai Su5,
  • Yuxuan Chen5,
  • Ao Xu5,
  • Na Li5,
  • Xin Gao2,3,4 &
  • Yongguo Yu1

npj Precision Oncology (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Cancer
  • Computational biology and bioinformatics
  • Engineering
  • Mathematics and computing

Abstract

Automated cytogenomic analysis has long been limited by narrow task scope, high annotation demands, and poor robustness to real-world complexity. Here, we introduce CHROMA, the first single-chromosome foundation model for cytogenomics that enables comprehensive, cell-level detection of a wide spectrum of chromosomal abnormalities—including both common and ultra-rare types—in a single, unified framework. Pre-trained on over 4 million chromosomal images from more than 84,000 specimens using self-supervised learning, CHROMA achieves robust and comprehensive detection of numerical and structural abnormalities across diverse classes, dramatically reducing expert annotation workload by 40% through efficient label utilization. The model maintains state-of-the-art accuracy even under highly imbalanced data and challenging imaging conditions, supporting reliable deployment as a risk-aware screening and triage tool, particularly in settings with limited expert availability. An integrated risk-control strategy further ensures safe application by automatically flagging uncertain or rare cases for expert review. By bridging foundational AI advances with real-world clinical needs, CHROMA paves the way for scalable, accessible, and precise cytogenomic analysis in both advanced and underserved healthcare environments.
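The idea of flagging uncertain cases for expert review can be illustrated with a minimal sketch. It uses plain split conformal prediction, a simpler relative of the conformal risk control cited in ref. 16, and is not the authors' implementation. The function names, the two-class toy probabilities, and the `alpha` values are assumptions made for illustration only.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the scores 1 - p(true class).

    cal_probs: per-sample class probabilities from a held-out calibration set.
    cal_labels: the true class index of each calibration sample.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank (1-indexed): ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: keep every class.
        return 1.0
    return scores[rank - 1]

def prediction_set(probs, qhat):
    """Indices of all classes whose score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]

def needs_expert_review(probs, qhat):
    """Flag a case unless exactly one class survives the threshold."""
    return len(prediction_set(probs, qhat)) != 1

# Hypothetical calibration data: two classes (0 = normal, 1 = abnormal).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
cal_labels = [0, 0, 0, 0]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)

for probs in ([0.95, 0.05], [0.5, 0.5]):
    print(probs, prediction_set(probs, qhat), needs_expert_review(probs, qhat))
```

In this sketch a confident prediction yields a single-class set and passes through, while an ambiguous one yields an empty or multi-class set and is routed to a cytogeneticist. A smaller `alpha` widens the sets and therefore sends more cases to review.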

Data availability

Part of the anonymized data supporting the findings of this study is released publicly together with the trained models. The public datasets used for training, BioImLab (https://www.kaggle.com/datasets/arifmpthesis/bioimlab-chromosome-data-set-for-classification)18, Pki-3 (https://www.fim.uni-passau.de/en/research-and-professorships/former-chairs-professorships/mathematical-stochastics/chromosome-image-data)19, CIR-Net (https://github.com/CloudDataLab/CIR-Net/tree/master/data)20, ChromosomeNet (https://github.com/CloudDataLab/BenchmarkForChromosomeClassification)21, TVG_Hospital (https://www.cellimagelibrary.org/pages/auto_chromosome_detector)22, AutoKary2022 (https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset?tab=readme-ov-file)23, and CRCN-NE (https://zenodo.org/records/3229434)24, are available from their original publications. The curated and anonymized subsets for numerical, stable, and unstable chromosomal abnormalities, as well as the associated code, are made available together with the trained models. The large-scale internal dataset used for pretraining cannot be made publicly available because of patient privacy concerns and institutional ethics restrictions. However, access to it may be granted to researchers upon reasonable request to the corresponding author, subject to approval by the institutional review board and the signing of a Data Use Agreement (DUA). The authors declare that all other data supporting the findings of this study are available within the paper and its supplementary information files.

Code availability

The source code and the trained models for a working version of CHROMA are available at https://github.com/Changchun-Yang/CHROMA.

References

  1. Pich, O. et al. The translational challenges of precision oncology. Cancer Cell 40, 458–478 (2022).

  2. Watkins, T. B. K. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020).

  3. Savage, J. R. Classification and relationships of induced chromosomal structural changes. J. Med. Genet. 13, 103–122 (1976).

  4. Shamsi, Z. et al. Karyotype AI for precision oncology. Preprint at arXiv:2211.14312 (2022).

  5. Mareschal, S. et al. Challenging conventional karyotyping by next-generation karyotyping in 281 intensively treated patients with AML. Blood Adv. 5, 1003–1016 (2021).

  6. Zhang, N. et al. Global burden of hematologic malignancies and evolution patterns over the past 30 years. Blood Cancer J. 13, 82 (2023).

  7. Awan, U. A. et al. Cytogenetic abnormalities in patients with hematological malignancies in Lahore city, Pakistan. Braz. J. Biol. 83, e249911.

  8. Trejo et al. Chromosomal abnormalities in patients with haematologic malignancies in the General Hospital of Mexico. Rev. Médica del. Hospital Gen. de. México 80, 87–91 (2017).

  9. He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (2022).

  10. Uzolas, L. et al. Deep anomaly generation: an image translation approach of synthesizing abnormal banded chromosome images. IEEE Access 10, 59090–59098 (2022).

  11. Kaplan, J. et al. Scaling laws for neural language models. Preprint at arXiv:2001.08361 (2020).

  12. Reis, D. et al. Real-time flying object detection with YOLOv8. Preprint at arXiv:2305.09972 (2023).

  13. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In 34th Conference on Neural Information Processing Systems 6840–6851 (Curran Associates, Inc., 2020).

  14. Mishra, S. et al. A simple, efficient and scalable contrastive masked autoencoder for learning visual representations. Preprint at arXiv:2210.16870 (2022).

  15. Dosovitskiy, A. et al. An image is worth 16 x 16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR, 2021).

  16. Angelopoulos, A. et al. Conformal risk control. In The Twelfth International Conference on Learning Representations (ICLR, 2024).

  17. Angelopoulos, A. N. et al. Prediction-powered inference. Science 382, 669–674 (2023).

  18. Poletti, E., Grisan, E. & Ruggeri, A. Automatic classification of chromosomes in Q-band images. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE, 2008).

  19. Ritter, G. & Le, G. Automatic segmentation of metaphase cells based on global context and variant analysis. Pattern Recognit. 41, 38–55 (2008).

  20. Lin, C. et al. Cir-net: automatic classification of human chromosome based on inception-resnet architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 19, 1285–1293 (2020).

  21. Lin, C. et al. ChromosomeNet: a massive dataset enabling benchmarking and building basedlines of clinical chromosome classification. Comput. Biol. Chem. 100, 107731 (2022).

  22. Tseng, J.-J. et al. An open dataset of annotated metaphase cell images for chromosome identification. Sci. Data 10, 104 (2023).

  23. You, D. et al. AutoKary2022: a large-scale densely annotated dataset for chromosome instance segmentation. In 2023 IEEE International Conference on Multimedia and Expo (ICME) (IEEE, 2023).

  24. Andrade, M. F. S. et al. A study of deep learning approaches for classification and detection chromosomes in metaphase images. Mach. Vis. Appl. 31, 65 (2020).

Author information

Author notes
  1. These authors contributed equally: Changchun Yang, Weiqian Dai, Yilan Zhang.

Authors and Affiliations

  1. Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China

    Changchun Yang, Weiqian Dai & Yongguo Yu

  2. Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia

    Changchun Yang, Yilan Zhang, Siyuan Chen & Xin Gao

  3. Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia

    Changchun Yang, Yilan Zhang, Siyuan Chen & Xin Gao

  4. Center of Excellence on Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia

    Changchun Yang, Yilan Zhang, Siyuan Chen & Xin Gao

  5. Smiltec(Suzhou)Co., Ltd, Suzhou, China

    Jingdong Hu, Junkai Su, Yuxuan Chen, Ao Xu & Na Li

Contributions

C.Y., W.D., Y.Z., and X.G. conceived the study. C.Y., W.D., Y.Z., S.C., and J.H. designed the research. J.S., Y.C., A.X., and N.L. collected and curated the data. C.Y., W.D., and Y.Z. implemented the algorithms and performed data analysis. S.C., J.H., J.S., Y.C., and A.X. contributed to software development, experimental setup, and validation. X.G. and Y.Y. supervised the study. C.Y., W.D., and Y.Z. drafted the manuscript. S.C., J.H., J.S., Y.C., A.X., N.L., X.G., and Y.Y. critically revised the manuscript. All authors discussed the results and approved the final version of the manuscript.

Corresponding authors

Correspondence to Xin Gao or Yongguo Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yang, C., Dai, W., Zhang, Y. et al. A comprehensive foundation model for generalizable cytogenetics in precision oncology with CHROMA. npj Precis. Onc. (2026). https://doi.org/10.1038/s41698-026-01383-4

  • Received: 11 August 2025

  • Accepted: 12 March 2026

  • Published: 27 March 2026

  • DOI: https://doi.org/10.1038/s41698-026-01383-4
