Scientific Data
A high-precision catalogue of landslide events in China based on news text mining with large language model
  • Data Descriptor
  • Open access
  • Published: 20 March 2026

  • Binru Zhao (1,2,3),
  • Lulu Zhang (2),
  • Zhenxia Liu (1,2,3,4),
  • Wenchao Ma (1,2,3),
  • Jian Wang (2),
  • Qiang Sun (5),
  • Wen Luo (1,2,3),
  • Zhaoyuan Yu (1,2,3; ORCID: orcid.org/0000-0003-4225-9435) &
  • Linwang Yuan (1,2,3)

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Geography
  • Natural hazards

Abstract

Landslides are a major geological hazard that causes significant casualties and economic losses. Reliable risk assessment requires high-quality spatiotemporal event data, yet no publicly available landslide catalogue with fine-grained spatiotemporal precision exists for China. To address this gap, we developed a landslide event catalogue for mainland China covering 2008 to 2024, based on news reports. The dataset was generated through large-scale web crawling, information extraction using an open-source large language model (LLM), event deduplication, geocoding, and multi-stage validation. It contains 1,582 events with detailed spatiotemporal attributes; some have minute-level temporal precision, and spatial resolution reaches the county, village, or specific reported site. Evaluation shows that the LLM reliably captures key attributes such as time, location, and triggering factors, although casualty-related information is extracted less accurately. This demonstrates the feasibility of using LLMs to extract critical landslide data from news reports. Compared with existing catalogues, our dataset offers more events and improved spatiotemporal accuracy, providing a valuable resource for landslide hazard assessment, early warning model development, and disaster risk management in China.

Data availability

The landslide event catalogue is available on figshare at https://doi.org/10.6084/m9.figshare.29603420.

Code availability

The code used in this study is implemented in Python and publicly available at https://doi.org/10.6084/m9.figshare.31298212. It includes scripts for extracting landslide-related information from news reports using large language models, identifying and removing duplicate landslide event records, and performing geocoding to assign spatial coordinates to landslide events.
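To make the deduplication step concrete, here is a minimal sketch of one possible approach. It is not the authors' released code. It assumes that two news-derived records describe the same event when they share a date and a normalised county name, and it keeps the record with more filled-in attributes. The `LandslideRecord` fields, the matching key, and the `completeness` heuristic are all hypothetical; the published scripts may use different fields and matching rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LandslideRecord:
    # Hypothetical fields for illustration; the catalogue's schema may differ.
    event_date: date
    county: str
    village: Optional[str] = None
    trigger: Optional[str] = None
    source_url: str = ""


def completeness(rec: LandslideRecord) -> int:
    """Count how many optional attributes are filled in (assumed heuristic)."""
    return sum(value is not None for value in (rec.village, rec.trigger))


def deduplicate(records: list[LandslideRecord]) -> list[LandslideRecord]:
    """Keep one record per (date, normalised county), preferring the most complete."""
    best: dict[tuple[date, str], LandslideRecord] = {}
    for rec in records:
        # Assumed matching key: same day and same county after trimming/lowercasing.
        key = (rec.event_date, rec.county.strip().lower())
        kept = best.get(key)
        if kept is None or completeness(rec) > completeness(kept):
            best[key] = rec
    # Return in chronological order so the output is deterministic.
    return sorted(best.values(), key=lambda r: (r.event_date, r.county))
```

A real pipeline would likely need fuzzier matching, because different reports of the same landslide can disagree on the date or name the location at different administrative levels.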

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 42571090) and the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (Grant No. 25KJB170011).

Author information

Authors and Affiliations

  1. State Key Laboratory of Climate System Prediction and Risk Management, Nanjing Normal University, Nanjing, 210023, China

    Binru Zhao, Zhenxia Liu, Wenchao Ma, Wen Luo, Zhaoyuan Yu & Linwang Yuan

  2. School of Geography, Nanjing Normal University, Nanjing, 210023, China

    Binru Zhao, Lulu Zhang, Zhenxia Liu, Wenchao Ma, Jian Wang, Wen Luo, Zhaoyuan Yu & Linwang Yuan

  3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China

    Binru Zhao, Zhenxia Liu, Wenchao Ma, Wen Luo, Zhaoyuan Yu & Linwang Yuan

  4. School of Environment, Nanjing Normal University, Nanjing, 210023, China

    Zhenxia Liu

  5. Nanjing Center, China Geological Survey, Nanjing, 210016, China

    Qiang Sun

Contributions

Binru Zhao led the design of the catalogue structure, implemented the data processing workflow, and reviewed and revised the manuscript prior to final submission. Zhenxia Liu contributed to large language model–based information extraction, record processing and revisions. Lulu Zhang contributed to data collection, data analysis and the initial drafting of the manuscript. Wenchao Ma and Jian Wang contributed to data curation through manual review and revision of the extracted landslide records prior to catalogue finalization. Qiang Sun provided independent validation data for quality assessment of the catalogue. Wen Luo, Zhaoyuan Yu, and Linwang Yuan provided overall supervision and guidance on dataset design.

Corresponding author

Correspondence to Zhenxia Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (download PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhao, B., Zhang, L., Liu, Z. et al. A high-precision catalogue of landslide events in China based on news text mining with large language model. Sci Data (2026). https://doi.org/10.1038/s41597-026-07066-w

  • Received: 22 July 2025

  • Accepted: 10 March 2026

  • Published: 20 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07066-w
