Disaster Storylines and Knowledge Graphs from Global News with Large Language Models and Retrieval-Augmented Generation
  • Data Descriptor
  • Open access
  • Published: 17 March 2026

  • Michele Ronco [1] (ORCID: orcid.org/0000-0002-2160-2452),
  • Luca Bandelli [2],
  • Lorenzo Bertolini [1],
  • Sergio Consoli [1] (ORCID: orcid.org/0000-0001-7357-5858),
  • Damien Delforge [3] (ORCID: orcid.org/0000-0002-3552-9444),
  • Alessio Spadaro [1],
  • Marco Verile [1] &
  • Christina Corbane [1]

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Environmental health
  • Infectious diseases
  • Natural hazards

Abstract

We present a dataset of over 3,000 global disaster events from 2014 to 2024, derived from the Emergency Events Database (EM-DAT). Events are extracted from news using a pipeline combining Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) for semantic extraction. The corpus is the Europe Media Monitor (EMM), aggregating content from millions of news outlets. For each event, structured storylines are automatically generated, summarizing hazard characteristics, drivers, impacts, and responses, and transformed into knowledge graphs. This enables analysis of relationships, inter-hazard dynamics, and human-environment interactions often missed in traditional records. A small subset of knowledge graphs was evaluated by domain experts in a workshop, while a larger sample of extracted triplets was independently assessed to quantify precision and inter-annotator agreement. The dataset supports retrospective analysis and multi-hazard risk assessment, complementing resources like the Hazard Information Profiles (HIPs). All data, code, and workflows are openly available, with an interactive dashboard for exploration. This resource advances data-driven approaches to disaster scenario modeling, impact analysis, and decision support in disaster risk management.

Data availability

The dataset containing both storylines and KGs is available in CSV format within the Joint Research Centre Data Catalogue at https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/ETOHA/storylines/DisasterStory.csv (ref. 21). To maximize access and visibility, we also make all data and code available on Zenodo in a single repository, which can be downloaded at https://doi.org/10.5281/zenodo.18598183.
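A first look at the CSV file could be taken as sketched below, using only the Python standard library. The column names are not listed on this page, so the sketch reads the header instead of assuming any; the UTF-8 encoding and the helper names `summarize_csv` and `summarize_remote_csv` are assumptions made for illustration.

```python
import csv
import io
import urllib.request

# URL taken from the data availability statement.
CSV_URL = (
    "https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/"
    "ETOHA/storylines/DisasterStory.csv"
)


def summarize_csv(text_stream):
    """Return (column_names, row_count) for an open text stream of CSV data."""
    reader = csv.reader(text_stream)
    header = next(reader, [])           # empty list if the stream has no rows
    row_count = sum(1 for _ in reader)  # remaining rows are data records
    return header, row_count


def summarize_remote_csv(url=CSV_URL, encoding="utf-8"):
    """Stream the CSV over HTTP and summarize it (encoding is an assumption)."""
    with urllib.request.urlopen(url) as response:
        text_stream = io.TextIOWrapper(response, encoding=encoding, newline="")
        return summarize_csv(text_stream)


# Example call (requires network access):
# columns, n_rows = summarize_remote_csv()
```

Reading the header first keeps the sketch independent of the actual schema; once the column names are known, the same stream can be passed to `csv.DictReader` for record-level access.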

Code availability

All Python code used for data processing, RAG pipeline, and knowledge graph extraction is available at https://github.com/jrcf7/crisesStorylinesRAG. The code for the Gradio application can be found at https://huggingface.co/spaces/roncmic/crisesStorylinesRAG. All data and code are also made available on Zenodo in a single repository, which can be downloaded at https://doi.org/10.5281/zenodo.18598183.
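The repository above holds the authors' implementation. Purely to illustrate the data structure involved, and not as a reproduction of that code, the sketch below shows one way (subject, relation, object) triplets could be held as a directed graph and traversed to follow a chain of effects. The function names and the example triplets are invented for this illustration.

```python
from collections import defaultdict, deque


def build_graph(triplets):
    """Group (subject, relation, object) triplets into an adjacency mapping."""
    graph = defaultdict(list)
    for subject, relation, obj in triplets:
        graph[subject].append((relation, obj))
    return dict(graph)


def downstream(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Invented triplets, not taken from the dataset.
EXAMPLE_TRIPLETS = [
    ("heavy rainfall", "causes", "flood"),
    ("flood", "causes", "displacement"),
    ("flood", "causes", "crop loss"),
]

EXAMPLE_GRAPH = build_graph(EXAMPLE_TRIPLETS)
```

Calling `downstream(EXAMPLE_GRAPH, "heavy rainfall")` walks from the hazard to its direct and indirect consequences, which is the kind of relationship query a triplet-based graph makes straightforward.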

References

  1. Jacot des Combes, H. et al. Hazard definition and classification review: Technical report (2025). United Nations Office for Disaster Risk Reduction https://doi.org/10.24948/2025.05 (2025).

  2. De Angeli, S. et al. A multi-hazard framework for spatial-temporal impact analysis. International Journal of Disaster Risk Reduction 73, 102829, https://doi.org/10.1016/j.ijdrr.2022.102829 (2022).

  3. Gill, J. C. & Malamud, B. D. Reviewing and visualizing the interactions of natural hazards. Reviews of Geophysics 52, 680–722, https://doi.org/10.1002/2013RG000445 (2014).

  4. Šakić Trogrlić, R. et al. Challenges in assessing and managing multi-hazard risks: A European stakeholders perspective. Environmental Science & Policy 157, 103774, https://doi.org/10.1016/j.envsci.2024.103774 (2024).

  5. Thomas, D. S. K., Jang, S. & Scandlyn, J. The CHASMS conceptual model of cascading disasters and social vulnerability: The COVID-19 case example. International Journal of Disaster Risk Reduction 51, 101828, https://doi.org/10.1016/j.ijdrr.2020.101828 (2020).

  6. Tilloy, A., Malamud, B., Winter, H. & Joly-Laugel, A. A review of quantification methodologies for multi-hazard interrelationships. Earth-Science Reviews 196, 102881, https://doi.org/10.1016/j.earscirev.2019.102881 (2019).

  7. Gallina, V. et al. A review of multi-risk methodologies for natural hazards: Consequences and challenges for a climate change impact assessment. Journal of Environmental Management 168, 123–132, https://doi.org/10.1016/j.jenvman.2015.11.011 (2016).

  8. Rokhideh, M., Fearnley, C. & Budimir, M. Multi-hazard early warning systems in the Sendai Framework for Disaster Risk Reduction: Achievements, gaps, and future directions. International Journal of Disaster Risk Science 16, 103–116, https://doi.org/10.1007/s13753-025-00622-9 (2025).

  9. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5999–6009 (2017).

  10. Brown, T. B. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, vol. 2020-December (2020).

  11. Lei, Z. et al. Harnessing large language models for disaster management: A survey. Findings of the Association for Computational Linguistics: ACL https://doi.org/10.18653/v1/2025.findings-acl.750 (2025).

  12. Xu, F., Ma, J., Li, N. & Cheng, J. C. P. Large language model applications in disaster management: An interdisciplinary review. International Journal of Disaster Risk Reduction 127, 105642, https://doi.org/10.1016/j.ijdrr.2025.105642 (2025).

  13. Jeba, S. M., Aurpa, T. T. & Adib, M. R. S. From Facebook posts to news headlines: Using transformer models to predict post-disaster impact on mass media content. Social Network Analysis and Mining 14, 200 (2024).

  14. Hou, J. & Xu, S. Near-real-time seismic human fatality information retrieval from social media with few-shot large-language models. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 1141–1147 (2022).

  15. Balashankar, A. et al. Predicting food crises using news streams. Science Advances 9, eabm3449, https://doi.org/10.1126/sciadv.abm3449 (2023).

  16. Delforge, D. et al. EM-DAT: The Emergency Events Database. International Journal of Disaster Risk Reduction 124, 105509, https://doi.org/10.1016/j.ijdrr.2025.105509 (2025).

  17. Sodoge, J., Kuhlicke, C., Mahecha, M. D. & de Brito, M. M. Text mining uncovers the unique dynamics of socio-economic impacts of the 2018–2022 multi-year drought in Germany. Natural Hazards and Earth System Sciences 24, 1757–1777, https://doi.org/10.5194/nhess-24-1757-2024 (2024).

  18. Alencar, P. H. L., Sodoge, J., Paton, E. N. & de Brito, M. M. Flash droughts and their impacts – using newspaper articles to assess the perceived consequences of rapidly emerging droughts. Environmental Research Letters 19, 074048, https://doi.org/10.1088/1748-9326/ad58fa (2024).

  19. Firmansyah, H. B. et al. Enhancing disaster response with automated text information extraction from social media images. In 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService), 71–78, https://doi.org/10.1109/BigDataService58306.2023.00017 (2023).

  20. Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020).

  21. Ronco, M. et al. crisesStorylinesRAG [Data set]. Zenodo, https://doi.org/10.5281/zenodo.18598183 (2026).

  22. Steinberger, R. et al. EMM: Supporting the analyst by turning multilingual text into structured data. In Transparenz aus Verantwortung: neue Herausforderungen für die digitale Datenanalyse (Erich Schmidt Verlag, 2017).

  23. Ji, S., Pan, S., Cambria, E., Marttinen, P. & Yu, P. S. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Transactions on Neural Networks and Learning Systems 33, https://doi.org/10.1109/TNNLS.2021.3070843 (2022).

  24. Auer, S. et al. Towards a Knowledge Graph for Science. In Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, WIMS ’18 (Association for Computing Machinery, New York, NY, USA, 2018).

  25. Hogan, A. et al. Knowledge graphs. ACM Computing Surveys 54, https://doi.org/10.1145/3447772 (2021).

  26. Tiwari, S., Ortíz-Rodriguez, F., Abbés, S. B., Usip, P. U. & Hantach, R. Semantic AI in Knowledge Graphs (Taylor & Francis, Boca Raton, US, 2023).

  27. Heath, T. & Bizer, C. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology 1, 1–121 (2011).

  28. Steinberger, R., Pouliquen, B. & van der Goot, E. An introduction to the Europe Media Monitor family of applications (2013).

  29. Touvron, H. et al. LLaMA: Open and Efficient Foundation Language Models (2023).

  30. Dubey, A. et al. The Llama 3 Herd of Models (2024).

  31. Yang, J., Han, S. C. & Poon, J. A survey on extraction of causal relations from natural language text. Knowl. Inf. Syst. 64, 1161–1186 (2022).

  32. Yerkhassym, A., Pak, A. A., Akhmetov, I., Yelenov, A. & Gelbukh, A. On causality problem in natural language processing field. Computación y Sistemas 26, 1549–1556 (2022).

  33. Coletta, V. R. et al. Causal loop diagrams for supporting nature based solutions participatory design and performance assessment. Journal of Environmental Management 280, 111668 (2021).

  34. Inam, A., Adamowski, J., Halbe, J. & Prasher, S. Using causal loop diagrams for the initialization of stakeholder engagement in soil salinity management in agricultural watersheds in developing countries: A case study in the Rechna Doab watershed, Pakistan. Journal of Environmental Management 152, 251–267 (2015).

  35. Dong, Q. et al. A survey on in-context learning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing https://doi.org/10.18653/v1/2024.emnlp-main.64 (2024).

  36. Peng, C., Xia, F., Naseriparsa, M. et al. Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review 56, 13071–13102 (2023).

  37. Consoli, S., Coletti, P., Markov, P. V. et al. An epidemiological knowledge graph extracted from the World Health Organization’s Disease Outbreak News. Scientific Data 12, 970 (2025).

  38. Bertolini, L., Hulsman, R., Consoli, S., Puertas Gallardo, A. & Ceresa, M. On constructing biomedical text-to-graph systems with large language models. In CEUR Workshop Proceedings, vol. 3747 (2024).

  39. Yang, R., Zhu, J., Man, J., Fang, L. & Zhou, Y. Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement. Knowl. Based Syst. 300, 112155 (2023).

  40. Antonucci, A., Piqué, G. & Zaffalon, M. Zero-shot causal graph extrapolation from text via LLMs. arXiv preprint (2023).

  41. Yang, R. et al. Graphusion: A RAG Framework for Scientific Knowledge Graph Construction with a Global Perspective. In WWW ’25: The ACM Web Conference, https://dl.acm.org/doi/10.1145/3701716.3717821 (2025).

  42. Long, S., Schuster, T. & Piché, A. Can large language models build causal graphs? arXiv preprint (2023).

  43. Samarajeewa, C., De Silva, D., Osipov, E., Alahakoon, D. & Manic, M. Causal reasoning in large language models using causal graph retrieval augmented generation. In 2024 16th International Conference on Human System Interaction (HSI), 1–6, https://doi.org/10.1109/HSI61632.2024.10613566 (2024).

  44. Jiralerspong, T., Chen, X., More, Y., Shah, V. & Bengio, Y. Efficient causal graph discovery using large language models. arXiv preprint (2024).

  45. Krippendorff, K. Content analysis: An introduction to its methodology (1980).

  46. Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46 (1960).

Author information

Authors and Affiliations

  1. European Commission, Joint Research Centre (JRC), Ispra, Italy

    Michele Ronco, Lorenzo Bertolini, Sergio Consoli, Alessio Spadaro, Marco Verile & Christina Corbane

  2. Engineering Ingegneria Informatica, Roma, Italy

    Luca Bandelli

  3. Institute of Health and Society (IRSS), University of Louvain (UCLouvain), Brussels, Belgium

    Damien Delforge

Contributions

Michele Ronco: Conceptualization, data curation, formal analysis, software development, writing - original draft preparation. Luca Bandelli: Conceptualization, data curation, software development, writing. Lorenzo Bertolini: Conceptualization, data curation, supervision - review and editing. Sergio Consoli: Conceptualization, data curation, software development, writing. Damien Delforge: Writing - review. Alessio Spadaro: Conceptualization, review. Marco Verile: Conceptualization, supervision. Christina Corbane: Conceptualization, supervision, writing - review and editing.

Corresponding author

Correspondence to Michele Ronco.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ronco, M., Bandelli, L., Bertolini, L. et al. Disaster Storylines and Knowledge Graphs from Global News with Large Language Models and Retrieval-Augmented Generation. Sci Data (2026). https://doi.org/10.1038/s41597-026-07036-2

  • Received: 08 September 2025

  • Accepted: 05 March 2026

  • Published: 17 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07036-2

Associated content

Collection

Datasets for language sciences
