Abstract
Lung cancer care involves coupled tasks such as precise nodule detection, patient-level survival risk estimation, and nodule count quantification, typically handled by separate systems despite clear interdependence. We present VITALIS, a multimodal vision-language framework that fuses CT and PET/CT imaging with structured radiology text using a graph-aware Transformer: Laplacian diffusion enriches token features on an image-text graph, while structural and prior-guided attention focus computation on anatomically and clinically related contexts, followed by bidirectional image-text conditioning to form a fused patient representation. This representation parameterizes a continuous-time latent risk process governed by a context-modulated Neural ODE, enabling individualized continuous-time modeling of time-to-event risk. Task-specific heads decode the latent trajectory into nodule detection, nodule malignancy classification, survival risk estimation, and nodule count prediction. Evaluated on three public cohorts, the framework delivers accurate delineations, low-false-positive localization, calibrated survival risk estimates, and consistent nodule counts across tasks. These findings indicate that coupling graph-aware multimodal encoding with continuous-time latent dynamics provides a coherent basis for integrated diagnostic and prognostic modeling in lung cancer.
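To make the abstract's "context-modulated Neural ODE" concrete: the latent risk state evolves in continuous time under dynamics that depend on the fused patient representation. The sketch below is illustrative only — the dynamics function, the fixed-step Euler integrator, and all weight matrices are assumptions for exposition, not the VITALIS implementation (which is linked under Code availability).

```python
import numpy as np

def latent_risk_trajectory(z0, context, t_grid, W_z, W_c):
    """Euler integration of a context-modulated latent ODE:
        dz/dt = tanh(W_z @ z + W_c @ context)
    z0: initial latent risk state; context: fused patient representation
    (held fixed along the trajectory, modulating the dynamics)."""
    z = np.asarray(z0, dtype=float)
    traj = [z.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dz = np.tanh(W_z @ z + W_c @ context)  # context-modulated vector field
        z = z + (t1 - t0) * dz                 # explicit Euler step
        traj.append(z.copy())
    return np.stack(traj)

# Toy example: 2-D latent state modulated by a 3-D context vector.
rng = np.random.default_rng(0)
z0 = np.zeros(2)
context = rng.normal(size=3)
W_z = 0.1 * rng.normal(size=(2, 2))
W_c = 0.1 * rng.normal(size=(2, 3))
t_grid = np.linspace(0.0, 1.0, 11)
traj = latent_risk_trajectory(z0, context, t_grid, W_z, W_c)
print(traj.shape)  # (11, 2)
```

In the full model, task-specific heads would read out detection, malignancy, survival risk, and nodule count predictions from points along this latent trajectory; a production implementation would use an adaptive solver (e.g. Dormand-Prince, ref. 31) rather than fixed-step Euler.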
Data availability
All datasets used in this study are publicly accessible: LIDC-IDRI: https://www.cancerimagingarchive.net/collection/lidc-idri/; LUNG-PET-CT-DX: https://www.cancerimagingarchive.net/collection/lung-pet-ct-dx/; ACRIN-NSCLC-FDG-PET (ACRIN 6668): https://www.cancerimagingarchive.net/collection/acrin-nsclc-fdg-pet/; NLST New-lesion LongCT: https://www.cancerimagingarchive.net/analysis-result/nlst-new-lesion-longct/.
Code availability
The source code supporting the findings of this study, including the VITALIS model architecture, training protocols, and inference scripts, is available for review at https://anonymous.4open.science/r/VITALIS-5E6A. The repository contains the complete implementation of the graph-aware multimodal fusion, neural ODE dynamics, and the multi-task learning framework described in the Methods section.
References
Zhou, L., Wu, C., Chen, Y. & Zhang, Z. Multitask connected U-Net: automatic lung cancer segmentation from CT images using pet knowledge guidance. Front. Artif. Intell. 7, 1423535 (2024).
Elkefi, S. et al. Systematic review on the technology’s role in supporting lung cancer patients in the treatment journey. npj Digit. Med. 8, 516 (2025).
Niu, C. et al. Medical multimodal multitask foundation model for lung cancer screening. Nat. Commun. 16, 1523 (2025).
Cai, G. et al. MSDet: receptive field enhanced multiscale detection for tiny pulmonary nodule. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.14028 (2024).
Tang, C., Zhou, F., Sun, J. & Zhang, Y. Circle-YOLO: an anchor-free lung nodule detection algorithm using bounding circle representation. Pattern Recognit. 161, 111294 (2025).
Mikhael, P. G. et al. Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography. J. Clin. Oncol. 41, 2191–2200 (2023).
Kanakarajan, H. et al. Predicting overall survival of NSCLC patients with clinical, radiomics and deep learning features. Preprint at medRxiv https://doi.org/10.1101/2025.06.13.25329594 (2025).
Zhang, Y. et al. Histopathology images-based deep learning prediction of prognosis and therapeutic response in small cell lung cancer. npj Digit. Med. 7, 15 (2024).
Salmanpour, M. R. et al. Enhanced lung cancer survival prediction using semi-supervised pseudo-labeling and learning from diverse PET/CT datasets. Cancers 17, 285 (2025).
Liao, C.-Y. et al. Personalized prediction of immunotherapy response in lung cancer patients using advanced radiomics and deep learning. Cancer Imaging 24, 129 (2024).
Amaro, M., Oliveira, H. P. & Pereira, T. CNN-based methods for survival prediction using CT images for lung cancer patients. In Proc. 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS) 290–296 (IEEE, 2024).
Aggarwal, R. et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. npj Digit. Med. 4, 65 (2021).
Sui, M. et al. Deep learning-based channel squeeze U-structure for lung nodule detection and segmentation. In Proc. 2024 5th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE) 634–638 (IEEE, 2024).
Rani, S. R. & Gunasundari, R. Enhanced transformer-based deep kernel fused self attention model for lung nodule segmentation and classification. Arch. Tech. Sci. 31, 175–191 (2024).
Carles, M. et al. Development and evaluation of two open-source nnU-Net models for automatic segmentation of lung tumors on PET and CT images with and without respiratory motion compensation. Eur. Radiol. 34, 6701–6711 (2024).
Gong, A., Daly, M., Goldin, J., Brown, M., McNitt-Gray, M. & Ruchalski, K. New Lung Lesions in Low-dose CT: a newly annotated longitudinal dataset derived from the National Lung Screening Trial Dataset (NLST-New-lesion-LongCT) Version 1. The Cancer Imaging Archive https://doi.org/10.7937/eyvh-ag54 (2025).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinary differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.1806.07366 (2019).
Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. Preprint at arXiv https://doi.org/10.48550/arXiv.2103.14030 (2021).
Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
Liu, C. et al. ImageFlowNet: forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. In Proc. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (IEEE, 2025).
Abi Nader, C. et al. Simulating the outcome of amyloid treatments in Alzheimer’s disease from imaging and clinical data. Brain Commun. 3, fcab091 (2021).
Rubanova, Y., Chen, R. T. Q. & Duvenaud, D. K. Latent ODEs for irregularly-sampled time series. Preprint at arXiv https://doi.org/10.48550/arXiv.1907.03907 (2019).
Dormand, J. R. & Prince, P. J. Runge-Kutta triples. Comput. Math. Appl. 12, 1007–1017 (1986).
Armato III, S. et al. Data from LIDC-IDRI. The Cancer Imaging Archive (2015).
Li, P. et al. A large-scale CT and PET/CT dataset for lung cancer diagnosis (Lung-PET-CT-Dx). The Cancer Imaging Archive https://doi.org/10.7937/TCIA.2020.NNC2-0461 (2020).
Kinahan, P., Muzi, M., Bialecki, B., Herman, B. & Coombs, L. Data from the ACRIN 6668 trial NSCLC-FDG-PET (Version 2). The Cancer Imaging Archive (2019).
Zhou, D., Xu, H., Liu, W. & Liu, F. LN-DETR: cross-scale feature fusion and re-weighting for lung nodule detection. Sci. Rep. 15, 15543 (2025).
Santone, A., Mercaldo, F. & Brunese, L. A method for real-time lung nodule instance segmentation using deep learning. Life 14, 1192 (2024).
Kuppusamy, P., Kosalendra, E., Krishnamoorthi, K., Diwakaran, S. & Vijayakumari, P. Detection of lung nodule using novel deep learning algorithm based on computed tomographic images. In Proc. 2023 Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM) 1–7 (IEEE, 2023).
Li, Y. & Fan, Y. DeepSEED: 3D squeeze-and-excitation encoder-decoder convolutional neural networks for pulmonary nodule detection. In Proc. 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) 1866–1869 (IEEE, 2020).
Wu, Y. et al. S3TU-NET: structured convolution and superpixel transformer for lung nodule segmentation. Preprint at arXiv https://doi.org/10.48550/arXiv.2411.12547 (2024).
Zhang, J. et al. Detection-guided deep learning-based model with spatial regularization for lung nodule segmentation. Quant. Imaging Med. Surg. 15, 4204 (2025).
Li, A.-H. A. et al. Lung nodule analysis in CT images: deep learning for segmentation and measurement. In Proc. 2024 8th International Conference on Medical and Health Informatics 13–17 (Association for Computing Machinery, 2024).
Mao, J. et al. MS-TFGNet: a dual-branch transformer for guided learning in lung cancer lesion segmentation using PET/CT images. In Proc. 2024 5th International Conference on Machine Learning and Computer Application (ICMLCA) 290–295 (IEEE, 2024).
Ramezani, H., Aleman, D. & Létourneau, D. Lung-DETR: deformable detection transformer for sparse lung nodule anomaly detection. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.05200 (2024).
Li, C., Deng, J., Ye, C., Yan, Y. & Wen, G. Real-time detection transformer nodule: an improved real-time detection transformer algorithm for lung nodule detection in computed tomography images. J. Electron. Imaging 34, 033025 (2025).
Liu, W. & He, J. A parallel fusion of CNN and transformer for CT lung nodule detection. In Proc. 2024 4th International Conference on Electronic Information Engineering and Computer Science (EIECS) 329–333 (IEEE, 2024).
Cui, F., Li, Y., Luo, H., Zhang, C. & Du, H. SF2T: leveraging swin transformer and two-stream networks for lung nodule detection. Biomed. Signal Process. Control 95, 106389 (2024).
Ma, L., Li, G., Feng, X., Fan, Q. & Liu, L. TiCNet: transformer in convolutional neural network for pulmonary nodule detection on CT images. J. Imaging Inform. Med. 37, 196–208 (2024).
Li, T., Nie, Y. & Yan, H. MSF-YOLO: an improved YOLOv10 network for object detection on lung nodule. In Proc. 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD) 322–327 (IEEE, 2025).
Li, T., Nie, Y. & Li, H. MRFNet: pulmonary nodule detection enhanced by multi-strategy YOLO-based approach. In Proc. 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD) 1356–1361 (IEEE, 2025).
Bappi, I., Richter, D. J., Kolekar, S. S. & Kim, K. HCLmNet: a unified hybrid continual learning strategy multimodal network for lung cancer survival prediction. Preprint at medRxiv https://doi.org/10.1101/2024.12.14.24319041 (2024).
Leung, K. H. et al. Deep semisupervised transfer learning for fully automated whole-body tumor quantification and prognosis of cancer on PET/CT. J. Nucl. Med. 65, 643–650 (2024).
She, Y. et al. Development and validation of a deep learning model for non-small cell lung cancer survival. JAMA Netw. Open 3, e205842–e205842 (2020).
Hsu, J. C. et al. Development and validation of novel deep-learning models using multiple data types for lung cancer survival. Cancers 14, 5562 (2022).
Esha, J. F. et al. Multi-view soft attention-based model for the classification of lung cancer-associated disabilities. Diagnostics 14, 2282 (2024).
Chang, H.-H., Wu, C.-Z. & Gallogly, A. Pulmonary nodule classification using a multiview residual selective kernel network. J. Imaging Inform. Med. 37, 347–362 (2024).
Ma, Y. et al. Benign-malignant classification of pulmonary nodules in CT images based on fractal spectrum analysis. Preprint at medRxiv https://doi.org/10.1101/2025.08.24.25334331 (2025).
Faizi, M. K. et al. Deep learning-based lung cancer classification of CT images. BMC Cancer 25, 1056 (2025).
Tang, H., Zhang, C. & Xie, X. NoduleNet: decoupled false positive reduction for pulmonary nodule detection and segmentation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention 266–274 (Springer, 2019).
Acknowledgements
This work was supported by the Yulin City Science and Technology Plan Project (2024-SF-024).
Author information
Authors and Affiliations
Contributions
D.Z. and J.X. conceptualized the study, designed the methodology, and participated in securing research funding (conceptualization, methodology, funding acquisition). X.G., J.C. and Y.Z. carried out data acquisition, curation, and investigation (investigation, data curation) and provided key resources, instruments, and technical support (resources, software). Z.X., Y.X., and L.L. drafted the initial manuscript and generated visualizations (writing—original draft, visualization). Y.Z., Q.S., and S.L. supervised the project, coordinated collaborations, and ensured administrative support (supervision, project administration). All authors contributed to reviewing and revising the manuscript critically for important intellectual content (writing—review and editing) and approved the final version for submission.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Consent to publish
Not applicable. This work uses only de-identified datasets from public repositories.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, D., Xi, J., Guo, X. et al. Graphicalized vision-language modeling for comprehensive lung nodule analysis and risk stratification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02602-9


