Scientific Data
CLWD: a Chinese histopathology dataset for lung adenocarcinoma subtype classification
  • Data Descriptor
  • Open access
  • Published: 05 March 2026

  • Yang Chen¹ (ORCID: 0000-0002-3547-5773; equal contribution),
  • Haoyun Zhao² (equal contribution),
  • Li Wang¹ (equal contribution),
  • Li Li³ (equal contribution),
  • Rongsheng Liu³,
  • Yinghan Jiang¹,
  • Peiren Tang¹,
  • Ying Li¹,
  • Jun Ni²,
  • Dapeng Tao²,
  • Jie Li⁴ &
  • Jun Peng³

Scientific Data, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Data publication and archiving
  • Non-small-cell lung cancer

Abstract

Effective diagnosis and treatment of lung adenocarcinoma depend on accurate typing, subtyping, and grading. Herein, we present the CLWD dataset, a resource for the lung cancer pathology community comprising 408 whole-slide images (WSIs) from 210 patients, curated specifically for the study of lung adenocarcinoma subtypes. Scanned at 80× magnification, it is one of the largest such datasets from Asia and focuses on Chinese patients. The dataset also includes clinical information such as age, sex, and diagnosis, supporting a broad range of research questions. It is publicly accessible and suited to applications including machine learning model development and validation. An initial evaluation of lung adenocarcinoma subtype classification using a multiple-instance learning framework demonstrated the dataset's utility, indicating its potential to advance global research and improve the accuracy of subtype diagnosis.
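
In multiple-instance learning (MIL), each WSI is treated as a "bag" of tile-level feature vectors, and only a slide-level label (here, a subtype) is available for training. The sketch below illustrates one common MIL aggregator, attention-based pooling of the kind used by methods such as CLAM¹⁷. It is not the authors' benchmark model: it is a plain-Python forward pass with fixed toy weights, and the feature dimension, the three output classes, the weight values, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(tiles: List[Vector], V: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Attention pooling: a_k = softmax_k(w . tanh(V h_k)), z = sum_k a_k h_k.

    tiles: one feature vector h_k per tile (the bag for a single slide).
    V, w:  attention parameters (hypothetical here; normally learned).
    Returns (slide_embedding, attention_weights).
    """
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in tiles]
    attn = softmax(scores)
    dim = len(tiles[0])
    slide = [sum(a * h[d] for a, h in zip(attn, tiles)) for d in range(dim)]
    return slide, attn


def classify(slide: Vector, W_cls: Matrix, b_cls: Vector) -> Vector:
    """Linear slide-level classifier followed by a softmax over classes."""
    logits = [s + b for s, b in zip(matvec(W_cls, slide), b_cls)]
    return softmax(logits)


# Toy bag: three tiles with 2-D features (real pipelines use CNN/foundation-model embeddings).
tiles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
slide, attn = attention_mil_pool(tiles, V, w)
# Three hypothetical classes; weights are illustrative only.
probs = classify(slide, W_cls=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], b_cls=[0.0, 0.0, 0.0])
```

The attention weights sum to one, so the slide embedding is a weighted average of tile features; inspecting `attn` is what lets attention-based MIL highlight the tiles that drove a slide-level prediction.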

Data availability

The dataset is publicly available via Figshare²⁴ and through our Pathology Image Repository (https://leelab.kmmu.edu.cn/PathologyRepository). A JPG version of the dataset is additionally available in the Hugging Face repository (https://huggingface.co/datasets/kmmuleelab/Lung_Pathology_Image_JPG).
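
Because the dataset contains 408 WSIs from 210 patients, at least some patients contribute more than one slide, so a random slide-level split could place slides from the same patient in both the training and test sets. A minimal sketch of a patient-grouped split follows, assuming each slide record carries a patient identifier; the record layout and the `slide_id`/`patient_id` field names are hypothetical, so consult the dataset's own metadata for the actual columns.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def patient_level_split(
    slides: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Split slide records so that every patient falls entirely in one partition.

    Each record is assumed (hypothetically) to have 'slide_id' and 'patient_id' keys.
    """
    by_patient: Dict[str, List[Record]] = defaultdict(list)
    for rec in slides:
        by_patient[rec["patient_id"]].append(rec)

    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for p in patients if p not in test_patients for r in by_patient[p]]
    test = [r for p in patients if p in test_patients for r in by_patient[p]]
    return train, test


# Toy example: 5 patients, some with two slides each.
records = [
    {"slide_id": "s1", "patient_id": "p1"},
    {"slide_id": "s2", "patient_id": "p1"},
    {"slide_id": "s3", "patient_id": "p2"},
    {"slide_id": "s4", "patient_id": "p3"},
    {"slide_id": "s5", "patient_id": "p3"},
    {"slide_id": "s6", "patient_id": "p4"},
    {"slide_id": "s7", "patient_id": "p5"},
]
train, test = patient_level_split(records, test_fraction=0.4, seed=42)
```

The fraction is applied to patients rather than slides, so the number of test slides varies with how many slides the selected patients have; libraries such as scikit-learn offer grouped splitters for the same purpose.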

Code availability

The code for preprocessing and deep learning models is publicly available on GitHub: https://github.com/DrNeilChen/CLWD.
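
The released preprocessing code is not reproduced here. As a hedged illustration of a step that MIL pipelines typically perform before feature extraction, the sketch below enumerates tile coordinates over a slide region and keeps only tiles whose tissue fraction passes a threshold. The 256-pixel tile size, the 0.5 threshold, and the caller-supplied `tissue_fraction` function are assumptions for illustration, not parameters taken from the CLWD repository.

```python
from typing import Callable, List, Optional, Tuple

Coord = Tuple[int, int]


def tile_grid(width: int, height: int, tile: int, stride: Optional[int] = None) -> List[Coord]:
    """Return top-left (x, y) coordinates of tiles that fit entirely inside the region."""
    step = stride or tile  # non-overlapping tiles by default
    return [
        (x, y)
        for y in range(0, height - tile + 1, step)
        for x in range(0, width - tile + 1, step)
    ]


def select_tissue_tiles(
    coords: List[Coord],
    tissue_fraction: Callable[[int, int], float],
    min_fraction: float = 0.5,
) -> List[Coord]:
    """Keep tiles whose tissue fraction (computed by the caller) meets the threshold."""
    return [(x, y) for x, y in coords if tissue_fraction(x, y) >= min_fraction]


# Toy example: a 1000 x 600 pixel region tiled at 256 px.
coords = tile_grid(1000, 600, 256)
# Hypothetical tissue mask: pretend only the leftmost 500 px contain tissue.
kept = select_tissue_tiles(coords, lambda x, y: 1.0 if x + 256 <= 500 else 0.0)
```

In a real pipeline, `tissue_fraction` would usually be derived from a low-resolution tissue mask (for example, after Otsu thresholding), and the kept coordinates would then be read at the working magnification with a WSI reader such as OpenSlide.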

References

  1. Lortet-Tieulent, J. et al. International trends in lung cancer incidence by histological subtype: adenocarcinoma stabilizing in men but still increasing in women. Lung Cancer 84, 13–22 (2014).

  2. Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 71, 209–249 (2021).

  3. Travis, W. D. et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances Since the 2004 Classification. J Thorac Oncol 10, 1243–1260 (2015).

  4. Travis, W. D. et al. International association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 6, 244–285 (2011).

  5. Xiang, C. et al. Distinct mutational features across preinvasive and invasive subtypes identified through comprehensive profiling of surgically resected lung adenocarcinoma. Mod Pathol 35, 1181–1192 (2022).

  6. Caso, R. et al. The Underlying Tumor Genomics of Predominant Histologic Subtypes in Lung Adenocarcinoma. J Thorac Oncol 15, 1844–1856 (2020).

  7. Zhang, Y. et al. Excellent Prognosis of Patients With Invasive Lung Adenocarcinomas During Surgery Misdiagnosed as Atypical Adenomatous Hyperplasia, Adenocarcinoma In Situ, or Minimally Invasive Adenocarcinoma by Frozen Section. Chest 159, 1265–1272 (2021).

  8. Zhai, W. et al. Prognostic Nomograms Based on Ground Glass Opacity and Subtype of Lung Adenocarcinoma for Patients with Pathological Stage IA Lung Adenocarcinoma. Front Cell Dev Biol 9, 769881 (2021).

  9. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).

  10. Chen, F. et al. Moving pan-cancer studies from basic research toward the clinic. Nat Cancer 2, 879–890 (2021).

  11. Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat Cancer 3, 1026–1038 (2022).

  12. Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J Pathol Inform 7, 29 (2016).

  13. Ozkan, T. A. et al. Interobserver variability in Gleason histological grading of prostate cancer. Scand J Urol 50, 420–424 (2016).

  14. Elmore, J. G. et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313, 1122–1132 (2015).

  15. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 25, 1301–1309 (2019).

  16. Gehrung, M. et al. Triage-driven diagnosis of Barrett’s esophagus for early detection of esophageal adenocarcinoma using deep learning. Nat Med 27, 833–841 (2021).

  17. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021).

  18. Yang, H. et al. Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study. BMC Med 19, 80 (2021).

  19. Gertych, A. et al. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci Rep 9, 1483 (2019).

  20. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  21. Wei, J. W. et al. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep 9, 3358 (2019).

  22. Shao, Z. et al. TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification. in Advances in Neural Information Processing Systems 34 (2021).

  23. Zheng, Y. et al. A Graph-Transformer for Whole Slide Image Classification. IEEE Trans Med Imaging 41, 3003–3015 (2022).

  24. Chen, Y. CLWD: a Chinese histopathology dataset for lung adenocarcinoma subtype classification. figshare https://doi.org/10.6084/m9.figshare.29035847 (2025).

  25. Chen, Y. et al. Lung_Pathology_Image_JPG (Revision 312c831). Hugging Face https://doi.org/10.57967/hf/7794 (2026).

Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 82560572, No. 82404091, and No. 62302429), the Yunnan Province Applied Basic Research Program Kunming Medical University Joint Project (202401AY070001-120), the Health Commission Foundation of Yunnan Province (2023-KHRCBZ-B15), the Kunming University of Science and Technology Joint Medical Project (KUST-KH2023013Y and KUST-KH2022018Y), the Yunnan Fundamental Research Projects (202501CF070023), the Major Science and Technology Projects of Yunnan Province (202402AA310016), the Basic Research Science and Technology Foundation of Yunnan Province (202201AS070009), and the Xing Dian Foundation of Yunnan Province (XDYC-MY-2022-0029).

Author information

Author notes
  1. These authors contributed equally: Yang Chen, Haoyun Zhao, Li Wang, Li Li.

Authors and Affiliations

  1. Department of Pathology, The First People’s Hospital of Yunnan Province/The Affiliated Hospital of Kunming University of Science and Technology, Kunming, 650032, Yunnan, China

    Yang Chen, Li Wang, Yinghan Jiang, Peiren Tang & Ying Li

  2. Department of Information Science & Engineering, Yunnan University, Kunming, 650032, Yunnan, China

    Haoyun Zhao, Jun Ni & Dapeng Tao

  3. Department of Surgery, The First People’s Hospital of Yunnan Province/The Affiliated Hospital of Kunming University of Science and Technology, Kunming, 650032, Yunnan, P.R. China

    Li Li, Rongsheng Liu & Jun Peng

  4. Academy of Biomedical Engineering, Kunming Medical University, Kunming, 650500, Yunnan, China

    Jie Li

Contributions

Conceptualization: J.P., J.L., D.P.T. and J.N.; Methodology and formal analysis: Y.C., H.Y.Z. and J.L.; Investigation: Y.C., H.Y.Z. and L.W.; Data curation: L.W., L.L., R.S.L., Y.H.J., P.R.T. and Y.L.; Writing-Original Draft: Y.C. and H.Y.Z.; Writing-Review & Editing: L.L., J.P., J.L., D.P.T. and J.N.; Supervision: J.P., J.L., D.P.T. and J.N. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Jun Ni, Dapeng Tao, Jie Li or Jun Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figure S1 (DOCX)

Supplementary Table S1 (XLSX)

Supplementary Table S2 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, Y., Zhao, H., Wang, L. et al. CLWD: a Chinese histopathology dataset for lung adenocarcinoma subtype classification. Sci Data (2026). https://doi.org/10.1038/s41597-026-06906-z

  • Received: 16 May 2025

  • Accepted: 12 February 2026

  • Published: 05 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-06906-z

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
