Abstract
Gastric adenocarcinoma is a leading cause of cancer-related mortality worldwide, and histopathologic examination of endoscopic biopsy samples remains essential for its diagnosis and grading. In this study, we propose a novel AI-based caption generation model, termed MIAC (Multi-instance Attention Captioning), designed to produce descriptive diagnostic reports from digital pathology images. The model leverages a multi-instance learning framework with permutation-invariant self-attention to aggregate features from multiple histopathology image patches into a unified representation, effectively capturing whole-slide characteristics. Using the publicly available PatchGastricADC22 dataset for training and validation, and an external test dataset from Gil Hospital of Gachon University for clinical testing, the model demonstrated strong performance across standard natural language generation metrics (BLEU@4, ROUGE-L, METEOR, CIDEr). Notably, MIAC maintained high captioning accuracy on previously unseen data, particularly after color normalization with the Macenko method. These results underscore the model’s robustness, generalizability, and potential for integration into routine digital pathology workflows to assist pathologists in generating structured diagnostic reports.
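The aggregation step described above, attention-weighted pooling of patch features into a single slide-level vector that is invariant to patch order, can be sketched as follows. This is an illustrative toy example under assumed dimensions, not the authors' implementation; the scoring vector `w` stands in for learned parameters.

```python
import math
import random

def attention_pool(patch_feats, w):
    """Permutation-invariant attention pooling over a bag of patch features.

    patch_feats: list of equal-length feature vectors, one per image patch.
    w: scoring vector (hypothetical stand-in for a learned parameter).
    Returns one slide-level vector of the same dimensionality.
    """
    # Score each patch independently, then softmax over the bag.
    scores = [math.tanh(sum(wi * fi for wi, fi in zip(w, f))) for f in patch_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]  # attention weights, sum to 1
    # Weighted sum of patch features -> bag (slide-level) embedding.
    dim = len(patch_feats[0])
    return [sum(a * f[j] for a, f in zip(alphas, patch_feats)) for j in range(dim)]

random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 6 patches, dim 4
w = [random.gauss(0, 1) for _ in range(4)]
bag = attention_pool(feats, w)
# Reordering the patches leaves the slide-level embedding unchanged:
assert all(abs(a - b) < 1e-9 for a, b in zip(bag, attention_pool(feats[::-1], w)))
```

Because every patch is scored independently and the result is a weighted sum, shuffling the patches changes nothing, which is what makes this kind of pooling suitable for bags of unordered whole-slide patches.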
Data availability
The publicly available PatchGastricADC22 dataset used in this study can be accessed at: https://www.kaggle.com/datasets/sanikapadegaonkar/patchgastricadc22. The code used for model training and evaluation is available at: https://github.com/Leeyoungsup/histopathology_captioning. The clinical test dataset collected from Gachon University Gil Medical Center is not publicly available due to patient privacy and institutional data protection policies. However, data may be available from the corresponding author upon reasonable request and with appropriate institutional approvals.
References
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Pasechnikov, V., Chukov, S., Fedorov, E., Kikuste, I. & Leja, M. Gastric cancer: prevention, screening and early diagnosis. World J. Gastroenterol. 20, 13842–13862 (2014).
Ajani, J. A. et al. Gastric cancer, version 2.2022, NCCN clinical practice guidelines in oncology. J. Natl. Compr. Cancer Netw. 20, 167–192 (2022).
Hirasawa, T. et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21, 653–660 (2018).
Luo, H. et al. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol. 20, 1645–1654 (2019).
Zhu, Y. et al. Application of convolutional neural network in the diagnosis of the invasion depth of gastric cancer based on conventional endoscopy. Gastrointest. Endosc. 89, 806–815 (2019).
Jiang, Y. et al. Radiomics signature on computed tomography imaging: association with lymph node metastasis in patients with gastric cancer. Front. Oncol. 9, 340 (2019).
Wang, Y. et al. Prediction of the depth of tumor invasion in gastric cancer: potential role of CT radiomics. Acad. Radiol. 27, 1077–1084 (2020).
Giganti, F. et al. Pre-treatment MDCT-based texture analysis for therapy response prediction in gastric cancer: comparison with tumour regression grade at final histology. Eur. J. Radiol. 90, 129–137 (2017).
Jiang, Y. et al. Noninvasive imaging evaluation of tumor immune microenvironment to predict outcomes in gastric cancer. Ann. Oncol. 31, 760–768 (2020).
Zhang, W. et al. Development and validation of a CT-based radiomic nomogram for preoperative prediction of early recurrence in advanced gastric cancer. Radiother. Oncol. 145, 13–20 (2020).
Niazi, M. K. K., Parwani, A. V. & Gurcan, M. N. Digital pathology and artificial intelligence. Lancet Oncol. 20, e253–e261 (2019).
Ferreira, R. et al. The virtual microscope. Proc. AMIA Annu. Fall Symp. 449–453 (1997).
Mukhopadhyay, S. et al. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am. J. Surg. Pathol. 42, 39–52 (2018).
Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J. Pathol. 249, 286–294 (2019).
Pantanowitz, L. et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch. Pathol. Lab. Med. 137, 1710–1722 (2013).
Zarella, M. D. et al. A practical guide to whole slide imaging: a white paper from the Digital Pathology Association. Arch. Pathol. Lab. Med. 143, 222–234 (2019).
Chen, S. et al. Applications of artificial intelligence in digital pathology for gastric cancer. Front. Oncol. 14, 1437252 (2024).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012).
Srinidhi, C. L., Ciga, O. & Martel, A. L. Deep neural network models for computational histopathology: a survey. Med. Image Anal. 67, 101813 (2021).
Liang, Q. et al. Weakly supervised biomedical image segmentation by reiterative learning. IEEE J. Biomed. Health Inform. 23, 1205–1214 (2019).
Qu, J. et al. Gastric pathology image classification using stepwise fine-tuning for deep neural networks. J. Healthc. Eng. 2018, 8961781 (2018).
Abe, H. et al. Development and multi-institutional validation of an artificial intelligence-based diagnostic system for gastric biopsy. Cancer Sci. 113, 3608–3617 (2022).
Sharma, H. et al. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput. Med. Imaging Graph. 61, 2–13 (2017).
Iizuka, O. et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci. Rep. 10, 1504 (2020).
Park, J. et al. A prospective validation and observer performance study of a deep learning algorithm for pathologic diagnosis of gastric tumors in endoscopic biopsies. Clin. Cancer Res. 27, 719–728 (2021).
Yoshida, H. et al. Automated histological classification of whole-slide images of gastric biopsy specimens. Gastric Cancer 21, 249–257 (2018).
Ba, W. et al. Assessment of deep learning assistance for the pathological diagnosis of gastric cancer. Mod. Pathol. 35, 1262–1268 (2022).
Lan, J. et al. Using less annotation workload to establish a pathological auxiliary diagnosis system for gastric cancer. Cell Rep. Med. 4, 101004 (2023).
Tung, C.-L. et al. Identifying pathological slices of gastric cancer via deep learning. J. Formos. Med. Assoc. 121, 2457–2464 (2022).
Marletta, S., Treanor, D., Eccher, A. & Pantanowitz, L. Whole-slide imaging in cytopathology: state of the art and future directions. Diagn. Histopathol. 27, 425–430 (2021).
Marletta, S. et al. Application of digital imaging and artificial intelligence to pathology of the placenta. Pediatr. Dev. Pathol. 26, 5–12 (2023).
Fu, B. et al. StoHisNet: a hybrid multi-classification model with CNN and transformer for gastric pathology images. Comput. Methods Programs Biomed. 221, 106924 (2022).
Kanavati, F. & Tsuneki, M. A deep learning model for gastric diffuse-type adenocarcinoma classification in whole slide images. Sci. Rep. 11, 20486 (2021).
Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. Proc. Mach. Learn. Res. 172, 1235–1250 (2022).
Qin, W. et al. What a whole slide image can tell? Subtype-guided masked transformer for pathological image captioning. arXiv:2310.20607 (2023).
Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. In Konukoglu, E. et al. (eds.) Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, vol. 172 of Proceedings of Machine Learning Research, 1235–1250 (PMLR, 2022).
Cong, G. & Fung, V. Improving materials property predictions for graph neural networks with minimal feature engineering. Mach. Learn. Sci. Technol. 4, 035030 (2023).
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (IEEE, 2009).
Lee, H., Lee, K., Lee, K., Lee, H. & Shin, J. Improving transferability of representations via augmentation-aware self-supervision. Adv. Neural Inf. Process. Syst. 34, 17710–17722 (2021).
Griffis, D., Shivade, C., Fosler-Lussier, E. & Lai, A. M. A quantitative and qualitative evaluation of sentence boundary detection for the clinical domain. AMIA Summits Transl. Sci. Proc. 2016, 88 (2016).
Liang, M. et al. CAF-AHGCN: context-aware attention fusion adaptive hypergraph convolutional network for human-interpretable prediction of gigapixel whole-slide image. Vis. Comput. 40, 8747–8765 (2024).
Cai, C. et al. Pathologist-level diagnosis of ulcerative colitis inflammatory activity level using an automated histological grading method. Int. J. Med. Inform. 192, 105648 (2024).
Wang, Z. & Nirjon, S. Characterizing disparity between edge models and high-accuracy base models for vision tasks. arXiv:2407.10016 (2024).
Seo, M., Lee, H.-J. & Nguyen, X. T. ViT-P3DE*: vision transformer based multi-camera instance. In IJCAI, 1340–1350 (2023).
Fu, L. et al. Rethinking patch dependence for masked autoencoders. arXiv:2401.14391 (2024).
Guo, Z. et al. HistGen: histopathology report generation via local-global feature encoding and cross-modal context interaction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 189–199 (Springer, 2024).
Das, K., Conjeti, S., Chatterjee, J. & Sheet, D. Detection of breast cancer from whole slide histopathological images using deep multiple instance CNN. IEEE Access 8, 213502–213511 (2020).
He, T., Zhang, J., Zhou, Z. & Glass, J. Quantifying exposure bias for neural language generation (2019).
Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318 (2002).
Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out, 74–81 (2004).
Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72 (2005).
Vedantam, R., Lawrence Zitnick, C. & Parikh, D. CIDEr: consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4566–4575 (2015).
Japanese Gastric Cancer Association. Japanese gastric cancer treatment guidelines 2018. Gastric Cancer 24, 1–21 (2021).
Google Cloud. Evaluating models – understanding the BLEU score – interpretation. https://cloud.google.com/translate/automl/docs/evaluate#interpretation (accessed 06 Feb 2022).
WHO Classification of Tumours Editorial Board. Digestive System Tumours, 5th edn (International Agency for Research on Cancer, Lyon, 2019).
Lauren, P. The two histological main types of gastric carcinoma: diffuse and so-called intestinal-type carcinoma: an attempt at a histo-clinical classification. Acta Pathol. Microbiol. Scand. 64, 31–49 (1965).
Japanese Gastric Cancer Association. Japanese classification of gastric carcinoma: 3rd English edition. Gastric Cancer 14, 101–112 (2011).
Funding
This work was supported by the Digital Medical Products Development Based on Medical Data Synthesis and AI Technologies Program (RS-2025-02305698, Development of On-Device AI Digital Medical Products Utilizing Synthetic Technology and Synthetic Data for Atypical Medical Data) funded by the Ministry of Trade, Industry & Energy (MOTIE) of Korea. This work was also supported by the Gachon University research fund of 2024 (GCU-2024-202410530001).
Author information
Authors and Affiliations
Contributions
Youngseop Lee conceptualized the study, developed the methodology, conducted the experiments, and analyzed the results. Kyungah Bai contributed to data collection, validation of experimental results, and manuscript editing. Youngjae Kim provided technical support for model implementation and contributed to the preprocessing pipeline. Jisup Kim and Kwanggi Kim supervised the study, served as corresponding authors, and provided pathological and experimental guidance. All authors reviewed and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study was approved by the Institutional Review Board (IRB) of Gil Hospital of Gachon University (Approval Number: GBIRB2024-121). All experimental protocols were conducted in accordance with relevant guidelines and regulations, strictly adhering to the ethical principles outlined in the Declaration of Helsinki.
Informed consent
The requirement for informed consent was waived due to the retrospective nature of the study design.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lee, Y., Bai, K., Kim, Y. et al. AI caption generation model for digital pathology of adenocarcinoma in endoscopic histopathology using multi-instance attention mechanisms. Sci Rep (2026). https://doi.org/10.1038/s41598-026-37455-5