Abstract
Gastric adenocarcinoma is a leading cause of cancer-related mortality worldwide, and histopathologic examination of endoscopic biopsy samples remains essential for its diagnosis and grading. In this study, we propose a novel AI-based caption generation model, termed MIAC (Multi-instance Attention Captioning), designed to produce descriptive diagnostic reports from digital pathology images. The model leverages a multi-instance learning framework with permutation-invariant self-attention to aggregate features from multiple histopathology image patches into a unified representation, effectively capturing whole-slide characteristics. Using the publicly available PatchGastricADC22 dataset for training and validation, and an external test dataset from Gil Hospital of Gachon University for clinical testing, the model demonstrated strong performance across standard natural language generation metrics (BLEU@4, ROUGE-L, METEOR, CIDEr). Notably, MIAC maintained high captioning accuracy on previously unseen data, particularly after color normalization with the Macenko method. These results underscore the model’s robustness, generalizability, and potential for integration into routine digital pathology workflows to assist pathologists in generating structured diagnostic reports.
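The aggregation step described above, attention-weighted pooling of patch features into a single slide-level vector that is invariant to patch order, can be sketched as follows. This is an illustrative toy example under assumed dimensions, not the authors' implementation; the scoring vector `w` stands in for learned parameters.

```python
import math
import random

def attention_pool(patch_feats, w):
    """Permutation-invariant attention pooling over a bag of patch features.

    patch_feats: list of equal-length feature vectors, one per image patch.
    w: scoring vector (hypothetical stand-in for a learned parameter).
    Returns one slide-level vector of the same dimensionality.
    """
    # Score each patch independently, then softmax over the bag.
    scores = [math.tanh(sum(wi * fi for wi, fi in zip(w, f))) for f in patch_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]  # attention weights, sum to 1
    # Weighted sum of patch features -> bag (slide-level) embedding.
    dim = len(patch_feats[0])
    return [sum(a * f[j] for a, f in zip(alphas, patch_feats)) for j in range(dim)]

random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 6 patches, dim 4
w = [random.gauss(0, 1) for _ in range(4)]
bag = attention_pool(feats, w)
# Reordering the patches leaves the slide-level embedding unchanged:
assert all(abs(a - b) < 1e-9 for a, b in zip(bag, attention_pool(feats[::-1], w)))
```

Because every patch is scored independently and the result is a weighted sum, shuffling the patches changes nothing, which is what makes this kind of pooling suitable for bags of unordered whole-slide patches.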
Data availability
The publicly available PatchGastricADC22 dataset used in this study can be accessed at: https://www.kaggle.com/datasets/sanikapadegaonkar/patchgastricadc22. The code used for model training and evaluation is available at: https://github.com/Leeyoungsup/histopathology_captioning. The clinical test dataset collected from Gachon University Gil Medical Center is not publicly available due to patient privacy and institutional data protection policies. However, data may be available from the corresponding author upon reasonable request and with appropriate institutional approvals.
References
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Pasechnikov, V., Chukov, S., Fedorov, E., Kikuste, I. & Leja, M. Gastric cancer: prevention, screening and early diagnosis. World J. Gastroenterol. 20, 13842–13862 (2014).
Ajani, J. A. et al. Gastric cancer, version 2.2022, NCCN clinical practice guidelines in oncology. J. Natl. Compr. Cancer Netw. 20, 167–192 (2022).
Hirasawa, T. et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21, 653–660 (2018).
Luo, H. et al. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol. 20, 1645–1654 (2019).
Zhu, Y. et al. Application of convolutional neural network in the diagnosis of the invasion depth of gastric cancer based on conventional endoscopy. Gastrointest. Endosc. 89, 806–815 (2019).
Jiang, Y. et al. Radiomics signature on computed tomography imaging: association with lymph node metastasis in patients with gastric cancer. Front. Oncol. 9, 340 (2019).
Wang, Y. et al. Prediction of the depth of tumor invasion in gastric cancer: potential role of CT radiomics. Acad. Radiol. 27, 1077–1084 (2020).
Giganti, F. et al. Pre-treatment MDCT-based texture analysis for therapy response prediction in gastric cancer: comparison with tumour regression grade at final histology. Eur. J. Radiol. 90, 129–137 (2017).
Jiang, Y. et al. Noninvasive imaging evaluation of tumor immune microenvironment to predict outcomes in gastric cancer. Ann. Oncol. 31, 760–768 (2020).
Zhang, W. et al. Development and validation of a CT-based radiomic nomogram for preoperative prediction of early recurrence in advanced gastric cancer. Radiother. Oncol. 145, 13–20 (2020).
Niazi, M. K. K., Parwani, A. V. & Gurcan, M. N. Digital pathology and artificial intelligence. Lancet Oncol. 20, e253–e261 (2019).
Ferreira, R. et al. The virtual microscope. Proc. AMIA Annu. Fall Symp. 449–453 (1997).
Mukhopadhyay, S. et al. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am. J. Surg. Pathol. 42, 39–52 (2018).
Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J. Pathol. 249, 286–294 (2019).
Pantanowitz, L. et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch. Pathol. Lab. Med. 137, 1710–1722 (2013).
Zarella, M. D. et al. A practical guide to whole slide imaging: a white paper from the Digital Pathology Association. Arch. Pathol. Lab. Med. 143, 222–234 (2019).
Chen, S. et al. Applications of artificial intelligence in digital pathology for gastric cancer. Front. Oncol. 14, 1437252 (2024).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012).
Srinidhi, C. L., Ciga, O. & Martel, A. L. Deep neural network models for computational histopathology: a survey. Med. Image Anal. 67, 101813 (2021).
Liang, Q. et al. Weakly supervised biomedical image segmentation by reiterative learning. IEEE J. Biomed. Health Inform. 23, 1205–1214 (2019).
Qu, J. et al. Gastric pathology image classification using stepwise fine-tuning for deep neural networks. J. Healthc. Eng. 2018, 8961781 (2018).
Abe, H. et al. Development and multi-institutional validation of an artificial intelligence-based diagnostic system for gastric biopsy. Cancer Sci. 113, 3608–3617 (2022).
Sharma, H. et al. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput. Med. Imaging Graph. 61, 2–13 (2017).
Iizuka, O. et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci. Rep. 10, 1504 (2020).
Park, J. et al. A prospective validation and observer performance study of a deep learning algorithm for pathologic diagnosis of gastric tumors in endoscopic biopsies. Clin. Cancer Res. 27, 719–728 (2021).
Yoshida, H. et al. Automated histological classification of whole-slide images of gastric biopsy specimens. Gastric Cancer 21, 249–257 (2018).
Ba, W. et al. Assessment of deep learning assistance for the pathological diagnosis of gastric cancer. Mod. Pathol. 35, 1262–1268 (2022).
Lan, J. et al. Using less annotation workload to establish a pathological auxiliary diagnosis system for gastric cancer. Cell Rep. Med. 4, 101004 (2023).
Tung, C.-L. et al. Identifying pathological slices of gastric cancer via deep learning. J. Formos. Med. Assoc. 121, 2457–2464 (2022).
Marletta, S., Treanor, D., Eccher, A. & Pantanowitz, L. Whole-slide imaging in cytopathology: state of the art and future directions. Diagn. Histopathol. 27, 425–430 (2021).
Marletta, S. et al. Application of digital imaging and artificial intelligence to pathology of the placenta. Pediatr. Dev. Pathol. 26, 5–12 (2023).
Fu, B. et al. StoHisNet: a hybrid multi-classification model with CNN and transformer for gastric pathology images. Comput. Methods Programs Biomed. 221, 106924 (2022).
Kanavati, F. & Tsuneki, M. A deep learning model for gastric diffuse-type adenocarcinoma classification in whole slide images. Sci. Rep. 11, 20486 (2021).
Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. Proc. Mach. Learn. Res. 172, 1235–1250 (2022).
Qin, W. et al. What a whole slide image can tell? Subtype-guided masked transformer for pathological image captioning. arXiv:2310.20607 (2023).
Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. In Konukoglu, E. et al. (eds.) Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, vol. 172 of Proceedings of Machine Learning Research, 1235–1250 (PMLR, 2022).
Cong, G. & Fung, V. Improving materials property predictions for graph neural networks with minimal feature engineering. Mach. Learn. Sci. Technol. 4, 035030 (2023).
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (IEEE, 2009).
Lee, H., Lee, K., Lee, K., Lee, H. & Shin, J. Improving transferability of representations via augmentation-aware self-supervision. Adv. Neural Inf. Process. Syst. 34, 17710–17722 (2021).
Griffis, D., Shivade, C., Fosler-Lussier, E. & Lai, A. M. A quantitative and qualitative evaluation of sentence boundary detection for the clinical domain. AMIA Summits Transl. Sci. Proc. 2016, 88 (2016).
Liang, M. et al. CAF-AHGCN: context-aware attention fusion adaptive hypergraph convolutional network for human-interpretable prediction of gigapixel whole-slide image. Vis. Comput. 40, 8747–8765 (2024).
Cai, C. et al. Pathologist-level diagnosis of ulcerative colitis inflammatory activity level using an automated histological grading method. Int. J. Med. Inform. 192, 105648 (2024).
Wang, Z. & Nirjon, S. Characterizing disparity between edge models and high-accuracy base models for vision tasks. arXiv:2407.10016 (2024).
Seo, M., Lee, H.-J. & Nguyen, X. T. ViT-P3DE*: vision transformer based multi-camera instance. In IJCAI, 1340–1350 (2023).
Fu, L. et al. Rethinking patch dependence for masked autoencoders. arXiv:2401.14391 (2024).
Guo, Z. et al. HistGen: histopathology report generation via local-global feature encoding and cross-modal context interaction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 189–199 (Springer, 2024).
Das, K., Conjeti, S., Chatterjee, J. & Sheet, D. Detection of breast cancer from whole slide histopathological images using deep multiple instance CNN. IEEE Access 8, 213502–213511 (2020).
He, T., Zhang, J., Zhou, Z. & Glass, J. Quantifying exposure bias for neural language generation (2019).
Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318 (2002).
Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out, 74–81 (2004).
Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72 (2005).
Vedantam, R., Lawrence Zitnick, C. & Parikh, D. CIDEr: consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4566–4575 (2015).
Japanese Gastric Cancer Association. Japanese gastric cancer treatment guidelines 2018. Gastric Cancer 24, 1–21 (2021).
Google Cloud. Evaluating models – understanding the BLEU score – interpretation. https://cloud.google.com/translate/automl/docs/evaluate#interpretation (accessed 06 Feb 2022).
WHO Classification of Tumours Editorial Board. Digestive System Tumours, 5th edn (International Agency for Research on Cancer, Lyon, 2019).
Lauren, P. The two histological main types of gastric carcinoma: diffuse and so-called intestinal-type carcinoma: an attempt at a histo-clinical classification. Acta Pathol. Microbiol. Scand. 64, 31–49 (1965).
Japanese Gastric Cancer Association. Japanese classification of gastric carcinoma: 3rd English edition. Gastric Cancer 14, 101–112 (2011).
Funding
This work was supported by the Digital Medical Products Development Based on Medical Data Synthesis and AI Technologies Program (RS-2025-02305698, Development of On-Device AI Digital Medical Products Utilizing Synthetic Technology and Synthetic Data for Atypical Medical Data) funded by the Ministry of Trade, Industry & Energy (MOTIE) of Korea. This work was also supported by the Gachon University research fund of 2024 (GCU-2024-202410530001).
Author information
Authors and Affiliations
Contributions
Youngseop Lee conceptualized the study, developed the methodology, conducted the experiments, and analyzed the results. Kyungah Bai contributed to data collection, validation of experimental results, and manuscript editing. Youngjae Kim provided technical support for model implementation and contributed to the preprocessing pipeline. Jisup Kim and Kwanggi Kim supervised the study, served as corresponding authors, and provided pathological and experimental guidance. All authors reviewed and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study was approved by the Institutional Review Board (IRB) of Gil Hospital of Gachon University (Approval Number: GBIRB2024-121). All experimental protocols were conducted in accordance with relevant guidelines and regulations, strictly adhering to the ethical principles outlined in the Declaration of Helsinki.
Informed consent
The requirement for informed consent was waived due to the retrospective nature of the study design.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lee, Y., Bai, K., Kim, Y. et al. AI caption generation model for digital pathology of adenocarcinoma in endoscopic histopathology using multi-instance attention mechanisms. Sci Rep (2026). https://doi.org/10.1038/s41598-026-37455-5