Abstract
Diabetic retinopathy (DR), a leading cause of preventable blindness worldwide, underscores the urgent need for robust AI-driven diagnostic tools. Although various deep learning models for retinal imaging have emerged, their evaluation remains constrained by the limited number of publicly available datasets, which lack both large-scale coverage and fine-grained annotations and therefore compromise reliable assessment of model generalizability. To bridge this gap, we introduce a comprehensive multimodal dataset that includes three key retinal imaging modalities: color fundus photography (CFP), optical coherence tomography (OCT), and ultrawide-field fundus imaging (UWF). Unprecedented in scale and modality diversity, the dataset provides detailed lesion-level annotations and severity grades for DR and diabetic macular edema (DME). We benchmark a range of fundus foundation models and large vision-language models on this dataset, revealing critical performance gaps and domain-specific challenges. By unifying large-scale multimodal data with precisely annotated clinical labels, our work establishes a foundational benchmark to drive advances in AI reliability and real-world clinical utility.
Data availability
The MMRDR dataset described in this Data Descriptor is publicly available on figshare (https://figshare.com/articles/dataset/MMRDR/29423747) and is cited as reference 35. The dataset comprises three subsets, one per imaging modality: color fundus photography, optical coherence tomography, and ultrawide-field fundus imaging. Each subset is partitioned into training and testing sets, with image data in JPEG format and corresponding CSV files providing annotations. The annotations include diabetic retinopathy severity grades, multi-label lesion classifications for CFP and UWF, diabetic macular edema grades for OCT, and laterality identifiers.
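As a rough illustration of how such CSV annotations might be read, the following Python sketch parses annotation rows into typed records and tallies the DR grade distribution. The column names (`image`, `dr_grade`, `laterality`, `hemorrhage`, `exudate`), the 0/1 encoding of lesion labels, and the sample rows are all assumptions made for this example; they are not taken from the released files, whose actual headers should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; check the real CSV headers before use.
IMAGE_COL = "image"
GRADE_COL = "dr_grade"
EYE_COL = "laterality"
LESION_COLS = ("hemorrhage", "exudate")  # placeholder multi-label lesion columns


def load_annotations(csv_text):
    """Parse annotation CSV text into a list of typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "image": row[IMAGE_COL],
            "dr_grade": int(row[GRADE_COL]),
            "laterality": row[EYE_COL],
            # Lesion labels are assumed to be 0/1 flags, one column per lesion type.
            "lesions": {name: row[name] == "1" for name in LESION_COLS},
        })
    return records


def grade_distribution(records):
    """Count how many images fall into each DR severity grade."""
    return Counter(r["dr_grade"] for r in records)


# Made-up rows for illustration only; these are not taken from MMRDR.
SAMPLE = """image,dr_grade,laterality,hemorrhage,exudate
a.jpg,0,OD,0,0
b.jpg,2,OS,1,0
c.jpg,2,OD,1,1
"""

records = load_annotations(SAMPLE)
print(grade_distribution(records))
```

To apply the sketch to a downloaded subset, one would pass in the text of the corresponding CSV file and adjust the column constants to match its header.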
Code Availability
The code used to evaluate the foundation models in this paper is available on GitHub (https://github.com/Vladimirovich2019/MMRDR_Evaluation).
References
Bourne, R. et al. Causes of vision loss worldwide, 1990–2010: a systematic analysis. The Lancet Global Health 1, e339–e349 (2013).
Ting, D. S. W., Cheung, G. C. M. & Wong, T. Y. Diabetic retinopathy: global prevalence, major risk factors, screening practices and public health challenges: a review. Clinical & Experimental Ophthalmology 44, 260–277 (2016).
Wong, T. Y. et al. Guidelines on diabetic eye care: the International Council of Ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology 125, 1608–1622 (2018).
Kim, E. J. et al. Treatment of diabetic macular edema. Current Diabetes Reports 19, 1–10 (2019).
Zheng, Y., He, M. & Congdon, N. The worldwide epidemic of diabetic retinopathy. Indian Journal of Ophthalmology 60, 428–431 (2012).
Burton, M. J. et al. The Lancet Global Health Commission on Global Eye Health: vision beyond 2020. The Lancet Global Health 9, e489–e551 (2021).
Rajalakshmi, R., Prathiba, V., Arulmalar, S. & Usha, M. Review of retinal cameras for global coverage of diabetic retinopathy screening. Eye 35, 162–172 (2021).
Silva, P. S. et al. Peripheral lesions identified on ultrawide field imaging predict increased risk of diabetic retinopathy progression over 4 years. Ophthalmology 122, 949–956 (2015).
Price, L. D., Au, S. & Chong, N. V. Optomap ultrawide field imaging identifies additional retinal abnormalities in patients with diabetic retinopathy. Clinical Ophthalmology 527–531 (2015).
Kwan, C. C. & Fawzi, A. A. Imaging and biomarkers in diabetic macular edema and diabetic retinopathy. Current Diabetes Reports 19, 95 (2019).
Dugas, E., Jared, Jorge & Cukierski, W. Diabetic retinopathy detection. Kaggle competition https://kaggle.com/competitions/diabetic-retinopathy-detection Accessed: May 23, 2025 (2015).
Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
Li, T. et al. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Information Sciences 501, 511–522 (2019).
Gelman, R. & Fernandez-Granda, C. Analysis of transfer learning for select retinal disease classification. Retina 42, 174–183 (2022).
Fu, H. et al. Evaluation of retinal image quality assessment networks in different color-spaces. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, 48–56 (Springer, 2019).
Pinto, I. et al. Improving diabetic retinopathy screening using artificial intelligence: design, evaluation and before-and-after study of a custom development. Frontiers in Digital Health 7, 1547045 (2025).
Porwal, P. et al. Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data 3, 25 (2018).
Liu, R. et al. DeepDRiD: Diabetic retinopathy—grading and image quality estimation challenge. Patterns 3 (2022).
CodaLab Competitions. Ultra-widefield fundus imaging for diabetic retinopathy challenge 2024. https://codalab.lisn.upsaclay.fr/competitions/18605 Accessed: May 23, 2025 (2024).
He, S. et al. Open ultrawidefield fundus image dataset with disease diagnosis and clinical image quality assessment. Scientific Data 11, 1251 (2024).
Hu, Y., Wang, C., Song, W., Tiulpin, A. & Liu, Q. A scanning laser ophthalmoscopy image database and trustworthy retinal disease detection method. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 46–56 (Springer, 2024).
Yang, J., Shi, R. & Ni, B. MedMNIST classification decathlon: A lightweight AutoML benchmark for medical image analysis. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 191–195 (IEEE, 2021).
Subramanian, M., Shanmugavadivel, K., Naren, O. S., Premkumar, K. & Rankish, K. Classification of retinal OCT images using deep learning. In 2022 International Conference on Computer Communication and Informatics (ICCCI), 1–7 (IEEE, 2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Dosovitskiy, A. et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).
He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16000–16009 (2022).
Rui, S. et al. Multi-modal vision pre-training for medical image analysis. In Proceedings of the Computer Vision and Pattern Recognition Conference, 5164–5174 (2025).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763 (2021).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Silva-Rodriguez, J., Chakor, H., Kobbi, R., Dolz, J. & Ayed, I. B. A foundation language-image model of the retina (FLAIR): Encoding expert knowledge in text supervision. Medical Image Analysis 99, 103357 (2025).
Wu, R. et al. MM-Retinal: Knowledge-enhanced foundational pretraining with fundus image-text expertise. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 722–732 (Springer, 2024).
Zhu, J. et al. InternVL3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479 (2025).
Chen, J. et al. HuatuoGPT-Vision, towards injecting medical visual knowledge into multimodal LLMs at scale. Preprint at https://arxiv.org/abs/2406.19280 (2024).
Wilkinson, C. P. et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology 110, 1677–1682 (2003).
Tang, Z. et al. MMRDR https://doi.org/10.6084/m9.figshare.29423747.v2, https://figshare.com/articles/dataset/MMRDR/29423747 (2025).
Hu, E. J. et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (2022).
Wang, D. et al. A real-world dataset and benchmark for foundation model adaptation in medical image classification. Scientific Data 10, 574 (2023).
Acknowledgements
This project has received support from the Shandong Provincial Medical and Health Science and Technology Project (202407020141, 202507020749), the Qingdao Natural Science Foundation Original Exploration Project (25-1-1-245-zyyd-jch) and the Qingdao Medical and Health Excellent Young Medical Talent Project.
Author information
Contributions
Z.G., X.W., L.W. and J.L. conceptualized the study and compiled the dataset, while also developing the annotation protocols. Z.T. and L.C. carried out technical validation. Z.T. and L.W. drafted the majority of the manuscript. Q.L., S.X., C.F., L.R., and J.L. contributed to dataset curation and annotation. Z.G., X.W., and J.L. also provided critical scientific insights and contributed to the writing of the manuscript. All authors reviewed the final version and approved its submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tang, Z., Wang, L., Guo, Z. et al. A multimodal retinal image dataset for diabetic retinopathy detection using foundation models. Sci Data (2026). https://doi.org/10.1038/s41597-026-07005-9


