Abstract
Lung cancer care involves coupled tasks such as precise nodule detection, patient-level survival risk estimation, and nodule count quantification, typically handled by separate systems despite clear interdependence. We present VITALIS, a multimodal vision-language framework that fuses CT and PET/CT imaging with structured radiology text using a graph-aware Transformer: Laplacian diffusion enriches token features on an image-text graph, while structural and prior-guided attention focus computation on anatomically and clinically related contexts, followed by bidirectional image-text conditioning to form a fused patient representation. This representation parameterizes a continuous-time latent risk process governed by a context-modulated Neural ODE, enabling individualized continuous-time modeling of time-to-event risk. Task-specific heads decode the latent trajectory into nodule detection, nodule malignancy classification, survival risk estimation, and nodule count prediction. Evaluated on three public cohorts, the framework delivers accurate delineations, low-false-positive localization, calibrated survival risk estimates, and consistent nodule counts across tasks. These findings indicate that coupling graph-aware multimodal encoding with continuous-time latent dynamics provides a coherent basis for integrated diagnostic and prognostic modeling in lung cancer.
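To make the abstract's "context-modulated Neural ODE" concrete: the latent risk state evolves in continuous time under dynamics that depend on the fused patient representation. The sketch below is illustrative only — the dynamics function, the fixed-step Euler integrator, and all weight matrices are assumptions for exposition, not the VITALIS implementation (which is linked under Code availability).

```python
import numpy as np

def latent_risk_trajectory(z0, context, t_grid, W_z, W_c):
    """Euler integration of a context-modulated latent ODE:
        dz/dt = tanh(W_z @ z + W_c @ context)
    z0: initial latent risk state; context: fused patient representation
    (held fixed along the trajectory, modulating the dynamics)."""
    z = np.asarray(z0, dtype=float)
    traj = [z.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dz = np.tanh(W_z @ z + W_c @ context)  # context-modulated vector field
        z = z + (t1 - t0) * dz                 # explicit Euler step
        traj.append(z.copy())
    return np.stack(traj)

# Toy example: 2-D latent state modulated by a 3-D context vector.
rng = np.random.default_rng(0)
z0 = np.zeros(2)
context = rng.normal(size=3)
W_z = 0.1 * rng.normal(size=(2, 2))
W_c = 0.1 * rng.normal(size=(2, 3))
t_grid = np.linspace(0.0, 1.0, 11)
traj = latent_risk_trajectory(z0, context, t_grid, W_z, W_c)
print(traj.shape)  # (11, 2)
```

In the full model, task-specific heads would read out detection, malignancy, survival risk, and nodule count predictions from points along this latent trajectory; a production implementation would use an adaptive solver (e.g. Dormand-Prince, ref. 31) rather than fixed-step Euler.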
Data availability
All datasets used in this study are publicly accessible: LIDC-IDRI: https://www.cancerimagingarchive.net/collection/lidc-idri/; LUNG-PET-CT-DX: https://www.cancerimagingarchive.net/collection/lung-pet-ct-dx/; ACRIN-NSCLC-FDG-PET (ACRIN 6668): https://www.cancerimagingarchive.net/collection/acrin-nsclc-fdg-pet/; NLST New-lesion LongCT: https://www.cancerimagingarchive.net/analysis-result/nlst-new-lesion-longct/.
Code availability
The source code supporting the findings of this study, including the VITALIS model architecture, training protocols, and inference scripts, is available for review at https://anonymous.4open.science/r/VITALIS-5E6A. The repository contains the complete implementation of the graph-aware multimodal fusion, neural ODE dynamics, and the multi-task learning framework described in the Methods section.
References
Zhou, L., Wu, C., Chen, Y. & Zhang, Z. Multitask connected U-Net: automatic lung cancer segmentation from CT images using pet knowledge guidance. Front. Artif. Intell. 7, 1423535 (2024).
Elkefi, S. et al. Systematic review on the technology’s role in supporting lung cancer patients in the treatment journey. npj Digit. Med. 8, 516 (2025).
Niu, C. et al. Medical multimodal multitask foundation model for lung cancer screening. Nat. Commun. 16, 1523 (2025).
Cai, G. et al. MSDet: receptive field enhanced multiscale detection for tiny pulmonary nodule. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.14028 (2024).
Tang, C., Zhou, F., Sun, J. & Zhang, Y. Circle-YOLO: an anchor-free lung nodule detection algorithm using bounding circle representation. Pattern Recognit. 161, 111294 (2025).
Mikhael, P. G. et al. Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography. J. Clin. Oncol. 41, 2191–2200 (2023).
Kanakarajan, H. et al. Predicting overall survival of NSCLC patients with clinical, radiomics and deep learning features. Preprint at medRxiv https://doi.org/10.1101/2025.06.13.25329594 (2025).
Zhang, Y. et al. Histopathology images-based deep learning prediction of prognosis and therapeutic response in small cell lung cancer. npj Digit. Med. 7, 15 (2024).
Salmanpour, M. R. et al. Enhanced lung cancer survival prediction using semi-supervised pseudo-labeling and learning from diverse PET/CT datasets. Cancers 17, 285 (2025).
Liao, C.-Y. et al. Personalized prediction of immunotherapy response in lung cancer patients using advanced radiomics and deep learning. Cancer Imaging 24, 129 (2024).
Amaro, M., Oliveira, H. P. & Pereira, T. CNN-based methods for survival prediction using CT images for lung cancer patients. In Proc. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS) 290–296 (IEEE, 2024).
Aggarwal, R. et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. npj Digit. Med. 4, 65 (2021).
Sui, M. et al. Deep learning-based channel squeeze U-structure for lung nodule detection and segmentation. In Proc. 2024 5th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE) 634–638 (IEEE, 2024).
Rani, S. R. & Gunasundari, R. Enhanced transformer-based deep kernel fused self attention model for lung nodule segmentation and classification. Arch. Tech. Sci. 31, 175–191 (2024).
Carles, M. et al. Development and evaluation of two open-source nnU-Net models for automatic segmentation of lung tumors on PET and CT images with and without respiratory motion compensation. Eur. Radiol. 34, 6701–6711 (2024).
Gong, A., Daly, M., Goldin, J., Brown, M., McNitt-Gray, M. & Ruchalski, K. New Lung Lesions in Low-dose CT: a newly annotated longitudinal dataset derived from the National Lung Screening Trial Dataset (NLST-New-lesion-LongCT) Version 1. The Cancer Imaging Archive https://doi.org/10.7937/eyvh-ag54 (2025).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinary differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.1806.07366 (2019).
Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. Preprint at arXiv https://doi.org/10.48550/arXiv.2103.14030 (2021).
Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
Liu, C. et al. ImageFlowNet: forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. In Proc. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (IEEE, 2025).
Abi Nader, C. et al. Simulating the outcome of amyloid treatments in Alzheimer’s disease from imaging and clinical data. Brain Commun. 3, fcab091 (2021).
Rubanova, Y., Chen, R. T. Q. & Duvenaud, D. K. Latent ODEs for irregularly-sampled time series. Preprint at arXiv https://doi.org/10.48550/arXiv.1907.03907 (2019).
Dormand, J. R. & Prince, P. J. Runge-Kutta triples. Comput. Math. Appl. 12, 1007–1017 (1986).
Armato III, S. et al. Data from LIDC-IDRI. The Cancer Imaging Archive (2015).
Li, P. et al. A large-scale CT and PET/CT dataset for lung cancer diagnosis (Lung-PET-CT-Dx). The Cancer Imaging Archive https://doi.org/10.7937/TCIA.2020.NNC2-0461 (2020).
Kinahan, P., Muzi, M., Bialecki, B., Herman, B. & Coombs, L. Data from the ACRIN 6668 trial NSCLC-FDG-PET (Version 2). The Cancer Imaging Archive (2019).
Zhou, D., Xu, H., Liu, W. & Liu, F. LN-DETR: cross-scale feature fusion and re-weighting for lung nodule detection. Sci. Rep. 15, 15543 (2025).
Santone, A., Mercaldo, F. & Brunese, L. A method for real-time lung nodule instance segmentation using deep learning. Life 14, 1192 (2024).
Kuppusamy, P., Kosalendra, E., Krishnamoorthi, K., Diwakaran, S. & Vijayakumari, P. Detection of lung nodule using novel deep learning algorithm based on computed tomographic images. In Proc. 2023 Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM) 1–7 (IEEE, 2023).
Li, Y. & Fan, Y. DeepSEED: 3D squeeze-and-excitation encoder-decoder convolutional neural networks for pulmonary nodule detection. In Proc. 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) 1866–1869 (IEEE, 2020).
Wu, Y. et al. S3TU-NET: structured convolution and superpixel transformer for lung nodule segmentation. Preprint at arXiv https://doi.org/10.48550/arXiv.2411.12547 (2024).
Zhang, J. et al. Detection-guided deep learning-based model with spatial regularization for lung nodule segmentation. Quant. Imaging Med. Surg. 15, 4204 (2025).
Li, A.-H. A. et al. Lung nodule analysis in CT images: deep learning for segmentation and measurement. In Proc. 2024 8th International Conference on Medical and Health Informatics 13–17 (Association for Computing Machinery, 2024).
Mao, J. et al. MS-TFGNet: a dual-branch transformer for guided learning in lung cancer lesion segmentation using PET/CT images. In Proc. 2024 5th International Conference on Machine Learning and Computer Application (ICMLCA) 290–295 (IEEE, 2024).
Ramezani, H., Aleman, D. & Létourneau, D. Lung-DETR: deformable detection transformer for sparse lung nodule anomaly detection. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.05200 (2024).
Li, C., Deng, J., Ye, C., Yan, Y. & Wen, G. Real-time detection transformer nodule: an improved real-time detection transformer algorithm for lung nodule detection in computed tomography images. J. Electron. Imaging 34, 033025 (2025).
Liu, W. & He, J. A parallel fusion of CNN and transformer for CT lung nodule detection. In Proc. 2024 4th International Conference on Electronic Information Engineering and Computer Science (EIECS) 329–333 (IEEE, 2024).
Cui, F., Li, Y., Luo, H., Zhang, C. & Du, H. SF2T: leveraging swin transformer and two-stream networks for lung nodule detection. Biomed. Signal Process. Control 95, 106389 (2024).
Ma, L., Li, G., Feng, X., Fan, Q. & Liu, L. TiCNet: transformer in convolutional neural network for pulmonary nodule detection on CT images. J. Imaging Inform. Med. 37, 196–208 (2024).
Li, T., Nie, Y. & Yan, H. MSF-YOLO: an improved YOLOv10 network for object detection on lung nodule. In Proc. 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD) 322–327 (IEEE, 2025).
Li, T., Nie, Y. & Li, H. MRFNet: pulmonary nodule detection enhanced by multi-strategy YOLO-based approach. In Proc. 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD) 1356–1361 (IEEE, 2025).
Bappi, I., Richter, D. J., Kolekar, S. S. & Kim, K. HCLmNet: a unified hybrid continual learning strategy multimodal network for lung cancer survival prediction. Preprint at medRxiv https://doi.org/10.1101/2024.12.14.24319041 (2024).
Leung, K. H. et al. Deep semisupervised transfer learning for fully automated whole-body tumor quantification and prognosis of cancer on PET/CT. J. Nucl. Med. 65, 643–650 (2024).
She, Y. et al. Development and validation of a deep learning model for non-small cell lung cancer survival. JAMA Netw. Open 3, e205842–e205842 (2020).
Hsu, J. C. et al. Development and validation of novel deep-learning models using multiple data types for lung cancer survival. Cancers 14, 5562 (2022).
Esha, J. F. et al. Multi-view soft attention-based model for the classification of lung cancer-associated disabilities. Diagnostics 14, 2282 (2024).
Chang, H.-H., Wu, C.-Z. & Gallogly, A. Pulmonary nodule classification using a multiview residual selective kernel network. J. Imaging Inform. Med. 37, 347–362 (2024).
Ma, Y. et al. Benign-malignant classification of pulmonary nodules in CT images based on fractal spectrum analysis. Preprint at medRxiv https://doi.org/10.1101/2025.08.24.25334331 (2025).
Faizi, M. K. et al. Deep learning-based lung cancer classification of CT images. BMC Cancer 25, 1056 (2025).
Tang, H., Zhang, C. & Xie, X. NoduleNet: decoupled false positive reduction for pulmonary nodule detection and segmentation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention 266–274 (Springer, 2019).
Acknowledgements
This work was supported by the Yulin City Science and Technology Plan Project (2024-SF-024).
Author information
Authors and Affiliations
Contributions
D.Z. and J.X. conceptualized the study, designed the methodology, and participated in securing research funding (conceptualization, methodology, funding acquisition). X.G., J.C. and Y.Z. carried out data acquisition, curation, and investigation (investigation, data curation) and provided key resources, instruments, and technical support (resources, software). Z.X., Y.X., and L.L. drafted the initial manuscript and generated visualizations (writing—original draft, visualization). Y.Z., Q.S., and S.L. supervised the project, coordinated collaborations, and ensured administrative support (supervision, project administration). All authors contributed to reviewing and revising the manuscript critically for important intellectual content (writing—review and editing) and approved the final version for submission.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Consent to publish
Not applicable. This work uses only de-identified datasets from public repositories.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, D., Xi, J., Guo, X. et al. Graphicalized vision-language modeling for comprehensive lung nodule analysis and risk stratification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02602-9


