Abstract
Automated cytogenomic analysis has long been limited by narrow task scope, heavy annotation demands, and poor robustness to real-world complexity. Here, we introduce CHROMA, the first single-chromosome foundation model for cytogenomics. It enables comprehensive, cell-level detection of a wide spectrum of chromosomal abnormalities, including both common and ultra-rare types, within a single unified framework. CHROMA is pre-trained with self-supervised learning on over 4 million chromosome images from more than 84,000 specimens. It robustly detects numerical and structural abnormalities across diverse classes and, through efficient label utilization, reduces expert annotation workload by 40%. The model maintains state-of-the-art accuracy under highly imbalanced data and challenging imaging conditions, which supports reliable deployment as a risk-aware screening and triage tool, particularly in settings with limited expert availability. An integrated risk-control strategy further helps ensure safe application by automatically flagging uncertain or rare cases for expert review. By bridging foundational AI advances with real-world clinical needs, CHROMA paves the way for scalable, accessible, and precise cytogenomic analysis in both advanced and underserved healthcare environments.
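The abstract does not say how the risk-control step decides which cases to flag. The reference list does cite conformal risk control (Angelopoulos et al.), so one plausible reading can be sketched in that spirit. This is a hypothetical illustration, not CHROMA's implementation. It assumes a held-out calibration set in which each cell has a scalar model confidence and an expert-verified correct/incorrect label. The function names `calibrate_review_threshold` and `flag_for_review` are hypothetical. A case is auto-accepted when its confidence is at or above a threshold and is otherwise flagged for review. The per-case loss is 1 only for an auto-accepted error, so the loss never increases as the threshold rises. Choosing the smallest threshold whose adjusted risk (errors + 1)/(n + 1) is at most α is intended to keep the expected rate of auto-accepted errors at or below α, provided calibration and incoming cases are exchangeable.

```python
import numpy as np


def calibrate_review_threshold(confidence, correct, alpha):
    """Hypothetical helper: lowest confidence threshold meeting risk level alpha.

    confidence: model confidence per calibration case (higher = surer).
    correct:    True where the model's call matched the expert's call.
    alpha:      tolerated rate of auto-accepted errors, e.g. 0.05.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n = confidence.size
    # Candidate thresholds: every observed confidence (sorted ascending by
    # np.unique), plus +inf, which means "flag every case for review".
    candidates = np.append(np.unique(confidence), np.inf)
    for lam in candidates:
        # Loss is 1 only when a wrong call is auto-accepted (confidence >= lam).
        losses = (confidence >= lam) & ~correct
        # Conformal-risk-control style adjustment for a loss bounded by 1:
        # n/(n+1) * mean(loss) + 1/(n+1) == (sum(loss) + 1) / (n + 1).
        adjusted_risk = (losses.sum() + 1.0) / (n + 1)
        if adjusted_risk <= alpha:
            return float(lam)  # first hit in ascending order = lowest threshold
    # Too few calibration cases for this alpha: route everything to experts.
    return float("inf")


def flag_for_review(confidence, threshold):
    """Return a boolean mask that is True for cases sent to expert review."""
    return np.asarray(confidence, dtype=float) < threshold


if __name__ == "__main__":
    # Synthetic calibration data, for illustration only.
    rng = np.random.default_rng(0)
    cal_conf = rng.uniform(0.5, 1.0, size=2000)
    cal_correct = rng.random(2000) < cal_conf  # surer calls are right more often
    t = calibrate_review_threshold(cal_conf, cal_correct, alpha=0.05)
    print(t, flag_for_review(np.array([0.55, 0.97, 0.99]), t))
```

Lowering α routes more cells to experts. Raising it automates more cells but accepts more errors. Scanning only observed confidences, rather than a fixed grid, keeps the search exact for this piecewise-constant loss.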
Data availability
The anonymized partial data that support the findings of this study are publicly released together with the trained models. The public datasets used in training are available from their original publications: BioImLab [18] (https://www.kaggle.com/datasets/arifmpthesis/bioimlab-chromosome-data-set-for-classification), Pki-3 [19] (https://www.fim.uni-passau.de/en/research-and-professorships/former-chairs-professorships/mathematical-stochastics/chromosome-image-data), CIR-Net [20] (https://github.com/CloudDataLab/CIR-Net/tree/master/data), ChromosomeNet [21] (https://github.com/CloudDataLab/BenchmarkForChromosomeClassification), TVG_Hospital [22] (https://www.cellimagelibrary.org/pages/auto_chromosome_detector), AutoKary2022 [23] (https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset?tab=readme-ov-file), and CRCN-NE [24] (https://zenodo.org/records/3229434). The curated and anonymized subsets for numerical, stable, and unstable chromosomal abnormalities, as well as the associated code, are released together with the trained models. The large-scale internal dataset used for pretraining cannot be made publicly available because of patient privacy concerns and institutional ethics restrictions. However, researchers may be granted access to it upon reasonable request to the corresponding author, subject to approval by the institutional review board and the signing of a Data Use Agreement (DUA). All other data supporting the findings of this study are available within the paper and its supplementary information files.
Code availability
The source code and trained models for a working version of CHROMA are available at https://github.com/Changchun-Yang/CHROMA.
References
1. Pich, O. et al. The translational challenges of precision oncology. Cancer Cell 40, 458–478 (2022).
2. Watkins, T. B. K. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020).
3. Savage, J. R. Classification and relationships of induced chromosomal structural changes. J. Med. Genet. 13, 103–122 (1976).
4. Shamsi, Z. et al. Karyotype AI for precision oncology. Preprint at arXiv:2211.14312 (2022).
5. Mareschal, S. et al. Challenging conventional karyotyping by next-generation karyotyping in 281 intensively treated patients with AML. Blood Adv. 5, 1003–1016 (2021).
6. Zhang, N. et al. Global burden of hematologic malignancies and evolution patterns over the past 30 years. Blood Cancer J. 13, 82 (2023).
7. Awan, U. A. et al. Cytogenetic abnormalities in patients with hematological malignancies in Lahore city, Pakistan. Braz. J. Biol. 83, e249911.
8. Trejo et al. Chromosomal abnormalities in patients with haematologic malignancies in the General Hospital of Mexico. Revista Médica del Hospital General de México 80, 87–91 (2017).
9. He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (2022).
10. Uzolas, L. et al. Deep anomaly generation: an image translation approach of synthesizing abnormal banded chromosome images. IEEE Access 10, 59090–59098 (2022).
11. Kaplan, J. et al. Scaling laws for neural language models. Preprint at arXiv:2001.08361 (2020).
12. Reis, D. et al. Real-time flying object detection with YOLOv8. Preprint at arXiv:2305.09972 (2023).
13. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In 34th Conference on Neural Information Processing Systems 6840–6851 (Curran Associates, Inc., 2020).
14. Mishra, S. et al. A simple, efficient and scalable contrastive masked autoencoder for learning visual representations. Preprint at arXiv:2210.16870 (2022).
15. Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR, 2021).
16. Angelopoulos, A. N. et al. Conformal risk control. In The Twelfth International Conference on Learning Representations (ICLR, 2024).
17. Angelopoulos, A. N. et al. Prediction-powered inference. Science 382, 669–674 (2023).
18. Poletti, E., Grisan, E. & Ruggeri, A. Automatic classification of chromosomes in Q-band images. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE, 2008).
19. Ritter, G. & Gao, L. Automatic segmentation of metaphase cells based on global context and variant analysis. Pattern Recognit. 41, 38–55 (2008).
20. Lin, C. et al. CIR-Net: automatic classification of human chromosome based on Inception-ResNet architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 19, 1285–1293 (2020).
21. Lin, C. et al. ChromosomeNet: a massive dataset enabling benchmarking and building baselines of clinical chromosome classification. Comput. Biol. Chem. 100, 107731 (2022).
22. Tseng, J.-J. et al. An open dataset of annotated metaphase cell images for chromosome identification. Sci. Data 10, 104 (2023).
23. You, D. et al. AutoKary2022: a large-scale densely annotated dataset for chromosome instance segmentation. In 2023 IEEE International Conference on Multimedia and Expo (ICME) (IEEE, 2023).
24. Andrade, M. F. S. et al. A study of deep learning approaches for classification and detection chromosomes in metaphase images. Mach. Vis. Appl. 31, 65 (2020).
Author information
Contributions
C.Y., W.D., Y.Z., and X.G. conceived the study. C.Y., W.D., Y.Z., S.C., and J.H. designed the research. J.S., Y.C., A.X., and N.L. collected and curated the data. C.Y., W.D., and Y.Z. implemented the algorithms and performed data analysis. S.C., J.H., J.S., Y.C., and A.X. contributed to software development, experimental setup, and validation. X.G. and Y.Y. supervised the study. C.Y., W.D., and Y.Z. drafted the manuscript. S.C., J.H., J.S., Y.C., A.X., N.L., X.G., and Y.Y. critically revised the manuscript. All authors discussed the results and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, C., Dai, W., Zhang, Y. et al. A comprehensive foundation model for generalizable cytogenetics in precision oncology with CHROMA. npj Precis. Onc. (2026). https://doi.org/10.1038/s41698-026-01383-4
DOI: https://doi.org/10.1038/s41698-026-01383-4


