Text-image alignment for ILD imaging: linking CXR evidence to CT quantification

  • Jiani Gao,
  • Yijiu Ren,
  • Fengjing Yang,
  • Xuefei Hu,
  • Changbo Sun,
  • Sihua Wang &
  • Chang Chen

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Diseases
  • Health care
  • Mathematics and computing
  • Medical research

Abstract

Assessment of Interstitial Lung Disease (ILD) relies on chest radiographs (CXR) for screening and computed tomography (CT) for definitive quantification. However, current AI pipelines typically treat these modalities in isolation, leading to report hallucinations and cross-modal inconsistencies. To address this fragmentation, we propose a framework (ARCTIC-ILD) that aligns CXR-derived textual evidence with CT-level segmentation and quantification. The system first employs a calibrated CXR evidence extractor to map radiographs to ILD-specific terminology, producing structured findings. These findings condition a terminology-to-mask module that utilizes lightweight cross-attention adapters to generate lobe-aware CT masks and burden estimates. Crucially, an explicit vision-language audit enforces consistency between the generated text and quantitative data. Evaluations on paired CXR-CT cohorts demonstrate that the framework significantly reduces text hallucination and improves phrase-to-mask alignment without incurring additional inference latency. By coupling reporting with quantification under an auditable protocol, this approach aligns with clinical workflows, serving as a robust assistant for triage, structured reporting, and longitudinal follow-up.
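To make the terminology-to-mask conditioning concrete, the following is a minimal sketch of a lightweight cross-attention adapter in PyTorch. The class name, tensor shapes, projection layer, and residual fusion are illustrative assumptions rather than the released ARCTIC-ILD implementation; the sketch only shows how CT feature tokens could attend to embeddings of CXR-derived ILD terms before a lobe-aware mask head.

# Minimal, hypothetical sketch of a terminology-conditioned cross-attention adapter.
# Class name, dimensions, and the residual fusion are illustrative assumptions,
# not the released ARCTIC-ILD code.
import torch
import torch.nn as nn

class TermCrossAttentionAdapter(nn.Module):
    """Let CT feature tokens attend to embeddings of CXR-derived ILD terms."""
    def __init__(self, feat_dim: int = 256, term_dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.proj_terms = nn.Linear(term_dim, feat_dim)  # map terminology embeddings into feature space
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, ct_tokens: torch.Tensor, term_emb: torch.Tensor) -> torch.Tensor:
        # ct_tokens: (B, N_tokens, feat_dim) flattened CT features
        # term_emb:  (B, N_terms, term_dim) embeddings of structured CXR findings
        kv = self.proj_terms(term_emb)
        attended, _ = self.attn(query=ct_tokens, key=kv, value=kv)
        return self.norm(ct_tokens + attended)  # residual fusion keeps the backbone output intact

# Toy usage: eight structured findings conditioning 1,024 CT feature tokens.
adapter = TermCrossAttentionAdapter()
fused = adapter(torch.randn(2, 1024, 256), torch.randn(2, 8, 256))  # (2, 1024, 256)

Because only a small adapter is added on top of an existing CT backbone, a design of this kind keeps the number of trainable parameters and the extra inference cost modest, which is the intuition behind describing such adapters as lightweight.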

Data availability

All datasets used in this study are publicly accessible from the following official sources:

  • MIMIC-CXR: https://physionet.org/content/mimic-cxr/2.0.0/
  • MIMIC-CXR-JPG (processed JPG version with standard splits): https://physionet.org/content/mimic-cxr-jpg/2.0.0/
  • HUG-ILD (HRCT with 3D annotations for interstitial lung disease): https://www.uhbs.ch/en/research/research-infrastructures/hug-ild-database
  • ReXGroundingCT (3D chest CT with text-finding-to-voxel-mask alignments): https://arxiv.org/abs/2507.22030

Code availability

All experiments were implemented in Python 3.10 using PyTorch (v2.3) with CUDA 12.1 and cuDNN 9, and were executed on four NVIDIA A100 GPUs (80 GB each) under Linux. Medical image input/output and sliding-window inference follow MONAI (v1.5.1), and evaluation metrics are computed with TorchMetrics using synchronized reduction on a single device. Mixed-precision training relies on the torch.amp autocast and GradScaler utilities, and all optimization, augmentation, and calibration settings are exactly as specified in the Training details section to ensure reproducibility. The full training and inference code, together with configuration files and the random seeds used for all reported runs, will be publicly released on GitHub after formal publication of the paper.
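As an illustration of the setup described above, here is a minimal sketch of a mixed-precision training step with torch.amp autocast plus a gradient scaler, and patch-wise CT inference with MONAI's sliding_window_inference. The model, loss function, batch keys, ROI size, and overlap are placeholder assumptions and do not reproduce the exact configuration reported in the Training details section.

# Minimal sketch: AMP training step plus MONAI sliding-window inference.
# Model, loss, batch keys, ROI size, and overlap are placeholders, not the released config.
import torch
from monai.inferers import sliding_window_inference

scaler = torch.cuda.amp.GradScaler()  # gradient scaler for mixed-precision training

def train_step(model, batch, optimizer, loss_fn, device="cuda"):
    images, masks = batch["image"].to(device), batch["label"].to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.amp.autocast(device_type="cuda"):  # forward pass in mixed precision
        loss = loss_fn(model(images), masks)
    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

@torch.no_grad()
def infer_ct_volume(model, volume, device="cuda"):
    # Patch-wise inference over a full CT volume with overlapping windows.
    model.eval()
    return sliding_window_inference(
        inputs=volume.to(device),
        roi_size=(96, 96, 96),
        sw_batch_size=4,
        predictor=model,
        overlap=0.25,
    )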


Acknowledgements

This study was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (2024ZD0529006).

Author information

Author notes
  1. These authors contributed equally: Jiani Gao, Yijiu Ren, Fengjing Yang.

Authors and Affiliations

  1. Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, PR China

    Jiani Gao, Yijiu Ren, Xuefei Hu, Changbo Sun & Chang Chen

  2. Department of Thoracic Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, PR China

    Fengjing Yang & Sihua Wang

  3. Clinical Center for Thoracic Surgery Research, Tongji University, Shanghai, PR China

    Chang Chen


Contributions

J.G., Y.R., and F.Y. contributed equally to this work, having full access to all study data and assuming responsibility for the integrity and accuracy of the analyses (validation, formal analysis). J.G. conceptualized the study, designed the methodology, and participated in securing research funding (conceptualization, methodology, funding acquisition). Y.R. carried out data acquisition, curation, and investigation (investigation, data curation) and provided key resources, instruments, and technical support (resources, software). F.Y. drafted the initial manuscript and generated visualizations (writing—original draft, visualization). C.S., S.W., X.H., and C.C. supervised the project, coordinated collaborations, and ensured administrative support (supervision, project administration). All authors contributed to reviewing and revising the manuscript critically for important intellectual content (writing—review and editing) and approved the final version for submission.

Corresponding authors

Correspondence to Xuefei Hu, Changbo Sun, Sihua Wang or Chang Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Consent for publication

Not applicable. This work exclusively utilizes de-identified datasets available from public repositories.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Gao, J., Ren, Y., Yang, F. et al. Text-image alignment for ILD imaging: linking CXR evidence to CT quantification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-025-02292-9


  • Received: 02 November 2025

  • Accepted: 16 December 2025

  • Published: 04 February 2026

  • DOI: https://doi.org/10.1038/s41746-025-02292-9


Associated content

Collection

Emerging Applications of Machine Learning and AI for Predictive Modeling in Precision Medicine
