A multimodal retinal image dataset for diabetic retinopathy detection using foundation models

Tang, Zhenyu; Wang, Lilong; Guo, Zhen; Liang, Qianqian; Xue, Shuyue; Feng, Chengcheng; Ran, Lili; Chen, Lingzhi; Wang, Xiaosong; Li, Jun

doi:10.1038/s41597-026-07005-9

Data Descriptor
Open access
Published: 10 March 2026

Zhenyu Tang (ORCID: orcid.org/0009-0008-2319-3067)^1,2,na1,
Lilong Wang^2,na1,
Zhen Guo^3,4,na1,
Qianqian Liang^4,
Shuyue Xue^4,
Chengcheng Feng^4,
Lili Ran^4,
Lingzhi Chen^2,
Xiaosong Wang^2 &
Jun Li^3,4

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Diabetic retinopathy (DR), a leading cause of preventable blindness worldwide, underscores the urgent need for robust AI-driven diagnostic tools. Although various deep learning models for retinal imaging have emerged, their evaluation remains constrained by the limited number of publicly available datasets, which lack both large-scale coverage and fine-grained annotations and therefore compromise reliable assessment of model generalizability. To bridge this gap, we introduce a comprehensive multimodal dataset that includes three key retinal imaging modalities: color fundus photography (CFP), optical coherence tomography (OCT), and ultrawide-field fundus imaging (UWF). Our dataset is unprecedented in scale and modality diversity and provides detailed lesion-level annotations and severity grades for DR and diabetic macular edema (DME). We benchmark a range of fundus foundation models and large vision-language models on this dataset, revealing critical performance gaps and domain-specific challenges. By unifying large-scale multimodal data with precisely annotated clinical labels, our work establishes a foundational benchmark to drive advances in AI reliability and real-world clinical utility.

Data availability

The MMRDR dataset described in this Data Descriptor is publicly available on figshare (https://figshare.com/articles/dataset/MMRDR/29423747) and is cited as reference 35. The dataset comprises three subsets, one per imaging modality: color fundus photography (CFP), optical coherence tomography (OCT), and ultra-widefield fundus imaging (UWF). Each subset is partitioned into training and testing sets, with image data in JPEG format and corresponding CSV files providing annotations. The annotations include diabetic retinopathy severity grades, multi-label lesion classifications for CFP and UWF, diabetic macular edema grades for OCT, and laterality identifiers.
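As a rough illustration of how the CSV annotations might be consumed, the following Python sketch reads one annotation file, tallies severity grades, and checks that each annotated image exists on disk. It is not taken from the authors' code. The column names `image` and `dr_grade` are hypothetical placeholders, as are any file paths, and should be replaced with the names actually used in the figshare release.

```python
import csv
from collections import Counter
from pathlib import Path


def load_annotations(csv_path):
    """Read one annotation CSV into a list of dicts, one per image row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def grade_distribution(rows, grade_column="dr_grade"):
    """Count images per severity grade.

    The default column name is an assumption, not the dataset's real header.
    Grades are kept as the strings found in the CSV.
    """
    return Counter(row[grade_column] for row in rows)


def missing_images(rows, image_dir, image_column="image"):
    """Return annotated file names that have no matching file in image_dir.

    The default column name is again a hypothetical placeholder.
    """
    image_dir = Path(image_dir)
    return [
        row[image_column]
        for row in rows
        if not (image_dir / row[image_column]).is_file()
    ]
```

With a hypothetical layout such as `CFP/train.csv` next to a `CFP/train/` image folder, one would call `rows = load_annotations("CFP/train.csv")`, then `grade_distribution(rows)` to inspect class balance and `missing_images(rows, "CFP/train")` to confirm that the download is complete before training or evaluation.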

Code availability

The code used to evaluate the foundation models in this paper is available on GitHub (https://github.com/Vladimirovich2019/MMRDR_Evaluation).

References

1. Bourne, R. et al. Causes of vision loss worldwide, 1990–2010: a systematic analysis. The Lancet Global Health 1, e339–e349 (2013).
2. Ting, D. S. W., Cheung, G. C. M. & Wong, T. Y. Diabetic retinopathy: global prevalence, major risk factors, screening practices and public health challenges: a review. Clinical & experimental ophthalmology 44, 260–277 (2016).
3. Wong, T. Y. et al. Guidelines on diabetic eye care: the international council of ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology 125, 1608–1622 (2018).
4. Kim, E. J. et al. Treatment of diabetic macular edema. Current diabetes reports 19, 1–10 (2019).
5. Zheng, Y., He, M. & Congdon, N. The worldwide epidemic of diabetic retinopathy. Indian journal of ophthalmology 60, 428–431 (2012).
6. Burton, M. J. et al. The lancet global health commission on global eye health: vision beyond 2020. The Lancet Global Health 9, e489–e551 (2021).
7. Rajalakshmi, R., Prathiba, V., Arulmalar, S. & Usha, M. Review of retinal cameras for global coverage of diabetic retinopathy screening. Eye 35, 162–172 (2021).
8. Silva, P. S. et al. Peripheral lesions identified on ultrawide field imaging predict increased risk of diabetic retinopathy progression over 4 years. Ophthalmology 122, 949–956 (2015).
9. Price, L. D., Au, S. & Chong, N. V. Optomap ultrawide field imaging identifies additional retinal abnormalities in patients with diabetic retinopathy. Clinical Ophthalmology 527–531 (2015).
10. Kwan, C. C. & Fawzi, A. A. Imaging and biomarkers in diabetic macular edema and diabetic retinopathy. Current diabetes reports 19, 95 (2019).
11. Dugas, E., Jared, Jorge & Cukierski, W. Diabetic retinopathy detection. Kaggle competition https://kaggle.com/competitions/diabetic-retinopathy-detection Accessed: May 23, 2025 (2015).
12. Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
13. Li, T. et al. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Information Sciences 501, 511–522 (2019).
14. Gelman, R. & Fernandez-Granda, C. Analysis of transfer learning for select retinal disease classification. Retina 42, 174–183 (2022).
15. Fu, H. et al. Evaluation of retinal image quality assessment networks in different color-spaces. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, 48–56 (Springer, 2019).
16. Pinto, I. et al. Improving diabetic retinopathy screening using artificial intelligence: design, evaluation and before-and-after study of a custom development. Frontiers in Digital Health 7, 1547045 (2025).
17. Porwal, P. et al. Indian diabetic retinopathy image dataset (idrid): a database for diabetic retinopathy screening research. Data 3, 25 (2018).
18. Liu, R. et al. Deepdrid: Diabetic retinopathy—grading and image quality estimation challenge. Patterns 3 (2022).
19. CodaLab Competitions. Ultra-widefield fundus imaging for diabetic retinopathy challenge 2024. https://codalab.lisn.upsaclay.fr/competitions/18605 Accessed: May 23, 2025 (2024).
20. He, S. et al. Open ultrawidefield fundus image dataset with disease diagnosis and clinical image quality assessment. Scientific Data 11, 1251 (2024).
21. Hu, Y., Wang, C., Song, W., Tiulpin, A. & Liu, Q. A scanning laser ophthalmoscopy image database and trustworthy retinal disease detection method. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 46–56 (Springer, 2024).
22. Yang, J., Shi, R. & Ni, B. Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 191–195 (IEEE, 2021).
23. Subramanian, M., Shanmugavadivel, K., Naren, O. S., Premkumar, K. & Rankish, K. Classification of retinal oct images using deep learning. In 2022 international conference on computer communication and informatics (ICCCI), 1–7 (IEEE, 2022).
24. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016).
25. Dosovitskiy, A. et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).
26. He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 16000–16009 (2022).
27. Rui, S. et al. Multi-modal vision pre-training for medical image analysis. In Proceedings of the Computer Vision and Pattern Recognition Conference, 5164–5174 (2025).
28. Radford, A. et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748–8763 (2021).
29. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
30. Silva-Rodriguez, J., Chakor, H., Kobbi, R., Dolz, J. & Ayed, I. B. A foundation language-image model of the retina (flair): Encoding expert knowledge in text supervision. Medical Image Analysis 99, 103357 (2025).
31. Wu, R. et al. Mm-retinal: Knowledge-enhanced foundational pretraining with fundus image-text expertise. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 722–732 (Springer, 2024).
32. Zhu, J. et al. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479 (2025).
33. Chen, J. et al. Huatuogpt-vision, towards injecting medical visual knowledge into multimodal llms at scale. https://arxiv.org/abs/2406.19280 (2024).
34. Wilkinson, C. P. et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology 110, 1677–1682 (2003).
35. Tang, Z. et al. MMRDR. https://doi.org/10.6084/m9.figshare.29423747.v2, https://figshare.com/articles/dataset/MMRDR/29423747 (2025).
36. Hu, E. J. et al. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations (2022).
37. Wang, D. et al. A real-world dataset and benchmark for foundation model adaptation in medical image classification. Scientific Data 10, 574 (2023).

Acknowledgements

This project has received support from the Shandong Provincial Medical and Health Science and Technology Project (202407020141, 202507020749), the Qingdao Natural Science Foundation Original Exploration Project (25-1-1-245-zyyd-jch) and the Qingdao Medical and Health Excellent Young Medical Talent Project.

Author information

These authors contributed equally: Zhenyu Tang, Lilong Wang, Zhen Guo.

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Zhenyu Tang
Shanghai AI Laboratory, Shanghai, China
Zhenyu Tang, Lilong Wang, Lingzhi Chen & Xiaosong Wang
Eye Institute of Shandong First Medical University, Shandong, China
Zhen Guo & Jun Li
Qingdao Eye Hospital of Shandong First Medical University, Shandong, China
Zhen Guo, Qianqian Liang, Shuyue Xue, Chengcheng Feng, Lili Ran & Jun Li

Contributions

Z.G., X.W., L.W. and J.L. conceptualized the study and compiled the dataset, while also developing the annotation protocols. Z.T. and L.C. carried out technical validation. Z.T. and L.W. drafted the majority of the manuscript. Q.L., S.X., C.F., L.R., and J.L. contributed to dataset curation and annotation. Z.G., X.W., and J.L. also provided critical scientific insights and contributed to the writing of the manuscript. All authors reviewed the final version and approved its submission.

Corresponding authors

Correspondence to Xiaosong Wang or Jun Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tang, Z., Wang, L., Guo, Z. et al. A multimodal retinal image dataset for diabetic retinopathy detection using foundation models. Sci Data (2026). https://doi.org/10.1038/s41597-026-07005-9

Received: 21 July 2025
Accepted: 27 February 2026
Published: 10 March 2026
DOI: https://doi.org/10.1038/s41597-026-07005-9