Disaster Storylines and Knowledge Graphs from Global News with Large Language Models and Retrieval-Augmented Generation

Ronco, Michele; Bandelli, Luca; Bertolini, Lorenzo; Consoli, Sergio; Delforge, Damien; Spadaro, Alessio; Verile, Marco; Corbane, Christina

doi:10.1038/s41597-026-07036-2

Data Descriptor
Open access
Published: 17 March 2026

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

We present a dataset of over 3,000 global disaster events from 2014 to 2024, derived from the Emergency Events Database (EM-DAT). Events are extracted from news using a pipeline combining Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) for semantic extraction. The corpus is the Europe Media Monitor (EMM), aggregating content from millions of news outlets. For each event, structured storylines are automatically generated, summarizing hazard characteristics, drivers, impacts, and responses, and transformed into knowledge graphs. This enables analysis of relationships, inter-hazard dynamics, and human-environment interactions often missed in traditional records. A small subset of knowledge graphs was evaluated by domain experts in a workshop, while a larger sample of extracted triplets was independently assessed to quantify precision and inter-annotator agreement. The dataset supports retrospective analysis and multi-hazard risk assessment, complementing resources like the Hazard Information Profiles (HIPs). All data, code, and workflows are openly available, with an interactive dashboard for exploration. This resource advances data-driven approaches to disaster scenario modeling, impact analysis, and decision support in disaster risk management.
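The triplet evaluation mentioned above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' evaluation code: it assumes binary correctness judgements (1 = correct, 0 = incorrect) from two annotators and uses Cohen's kappa as the agreement statistic because that measure appears in the reference list; the statistic actually reported may differ. The function names and the ten judgements are invented for illustration.

```python
from collections import Counter


def precision(labels):
    """Share of extracted triplets judged correct (1 = correct, 0 = incorrect)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements for ten triplets, invented for illustration only.
annotator_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

print(f"precision (annotator 1): {precision(annotator_1):.2f}")
print(f"Cohen's kappa: {cohen_kappa(annotator_1, annotator_2):.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most triplets are judged correct and two annotators would often agree even if labelling at random.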

Data availability

The dataset containing both storylines and KGs is available in CSV format within the Joint Research Centre Data Catalogue at https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/ETOHA/storylines/DisasterStory.csv²¹. To maximize access and visibility, we also make all data and code available on Zenodo in a single repository, which can be downloaded at https://doi.org/10.5281/zenodo.18598183.
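A first look at the CSV file can be taken without knowing its schema. The sketch below is a hedged example using only the Python standard library: it reports the column names and row count of any CSV stream. The helper name `summarize_csv`, the sample columns `event_id` and `storyline`, and the UTF-8 encoding in the commented download snippet are assumptions made for illustration; the real column names should be read from the file header.

```python
import csv
import io

# URL as given in the data availability statement.
DATA_URL = (
    "https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/ETOHA/"
    "storylines/DisasterStory.csv"
)


def summarize_csv(text_stream):
    """Return (column_names, row_count) for a CSV text stream.

    No schema is assumed: column names come from the header row.
    """
    reader = csv.DictReader(text_stream)
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count


# Stand-in sample with hypothetical columns, used so the sketch runs offline.
sample = io.StringIO("event_id,storyline\nA,text one\nB,text two\n")
columns, n_rows = summarize_csv(sample)
print(columns, n_rows)

# Reading the published file instead (needs network access; the UTF-8
# encoding is an assumption):
# import urllib.request
# with urllib.request.urlopen(DATA_URL) as resp:
#     stream = io.TextIOWrapper(resp, encoding="utf-8", newline="")
#     columns, n_rows = summarize_csv(stream)
```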

Code availability

All Python code used for data processing, RAG pipeline, and knowledge graph extraction is available at https://github.com/jrcf7/crisesStorylinesRAG. The code for the Gradio application can be found at https://huggingface.co/spaces/roncmic/crisesStorylinesRAG. All data and code are also made available on Zenodo in a single repository, which can be downloaded at https://doi.org/10.5281/zenodo.18598183.
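To give a sense of what a knowledge graph built from extracted triplets looks like in code, here is a minimal sketch. It does not reproduce the repository's implementation: the triplets, the single `causes` relation, and the function names `build_graph` and `downstream` are hypothetical, chosen only to show how (subject, relation, object) triplets form a directed graph that can be traversed to follow cascading effects.

```python
from collections import defaultdict, deque


def build_graph(triplets):
    """Map each subject to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triplets:
        graph[subject].append((relation, obj))
    return dict(graph)


def downstream(graph, start):
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical triplets, invented for illustration only.
example_triplets = [
    ("heavy rainfall", "causes", "flooding"),
    ("flooding", "causes", "crop losses"),
    ("flooding", "causes", "displacement"),
    ("crop losses", "causes", "food insecurity"),
]

kg = build_graph(example_triplets)
print(downstream(kg, "heavy rainfall"))
```

A plain dictionary keeps the example dependency-free; a graph library would be a natural choice for larger analyses.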

References

Jacot des Combes, H. et al. Hazard definition and classification review: Technical report (2025). United Nations Office for Disaster Risk Reduction https://doi.org/10.24948/2025.05 (2025).
De Angeli, S. et al. A multi-hazard framework for spatial-temporal impact analysis. International Journal of Disaster Risk Reduction 73, 102829, https://doi.org/10.1016/j.ijdrr.2022.102829 (2022).
Gill, J. C. & Malamud, B. D. Reviewing and visualizing the interactions of natural hazards. Reviews of Geophysics 52, 680–722, https://doi.org/10.1002/2013RG000445 (2014).
Šakić Trogrlić, R. et al. Challenges in assessing and managing multi-hazard risks: A European stakeholders perspective. Environmental Science & Policy 157, 103774, https://doi.org/10.1016/j.envsci.2024.103774 (2024).
Thomas, D. S. K., Jang, S. & Scandlyn, J. The CHASMS conceptual model of cascading disasters and social vulnerability: The COVID-19 case example. International Journal of Disaster Risk Reduction 51, 101828, https://doi.org/10.1016/j.ijdrr.2020.101828 (2020).
Tilloy, A., Malamud, B., Winter, H. & Joly-Laugel, A. A review of quantification methodologies for multi-hazard interrelationships. Earth-Science Reviews 196, 102881, https://doi.org/10.1016/j.earscirev.2019.102881 (2019).
Gallina, V. et al. A review of multi-risk methodologies for natural hazards: Consequences and challenges for a climate change impact assessment. Journal of Environmental Management 168, 123–132, https://doi.org/10.1016/j.jenvman.2015.11.011 (2016).
Rokhideh, M., Fearnley, C. & Budimir, M. Multi-hazard early warning systems in the Sendai Framework for Disaster Risk Reduction: Achievements, gaps, and future directions. International Journal of Disaster Risk Science 16, 103–116, https://doi.org/10.1007/s13753-025-00622-9 (2025).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5999–6009 (2017).
Brown, T. B. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, vol. 2020-December (2020).
Lei, Z. et al. Harnessing large language models for disaster management: A survey. Findings of the Association for Computational Linguistics: ACL https://doi.org/10.18653/v1/2025.findings-acl.750 (2025).
Xu, F., Ma, J., Li, N. & Cheng, J. C. P. Large language model applications in disaster management: An interdisciplinary review. International Journal of Disaster Risk Reduction 127, 105642, https://doi.org/10.1016/j.ijdrr.2025.105642 (2025).
Jeba, S. M., Aurpa, T. T. & Adib, M. R. S. From Facebook posts to news headlines: Using transformer models to predict post-disaster impact on mass media content. Social Network Analysis and Mining 14, 200 (2024).
Hou, J. & Xu, S. Near-real-time seismic human fatality information retrieval from social media with few-shot large-language models. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 1141–1147 (2022).
Balashankar, A. et al. Predicting food crises using news streams. Science Advances 9, eabm3449, https://doi.org/10.1126/sciadv.abm3449 (2023).
Delforge, D. et al. EM-DAT: The Emergency Events Database. International Journal of Disaster Risk Reduction 124, 105509, https://doi.org/10.1016/j.ijdrr.2025.105509 (2025).
Sodoge, J., Kuhlicke, C., Mahecha, M. D. & de Brito, M. M. Text mining uncovers the unique dynamics of socio-economic impacts of the 2018–2022 multi-year drought in Germany. Natural Hazards and Earth System Sciences 24, 1757–1777, https://doi.org/10.5194/nhess-24-1757-2024 (2024).
Alencar, P. H. L., Sodoge, J., Paton, E. N. & de Brito, M. M. Flash droughts and their impacts-using newspaper articles to assess the perceived consequences of rapidly emerging droughts. Environmental Research Letters 19, 074048, https://doi.org/10.1088/1748-9326/ad58fa (2024).
Firmansyah, H. B. et al. Enhancing disaster response with automated text information extraction from social media images. In 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService), 71–78, https://doi.org/10.1109/BigDataService58306.2023.00017 (2023).
Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020).
Ronco, M. et al. crisesStorylinesRAG [Data set]. Zenodo, https://doi.org/10.5281/zenodo.18598183 (2026).
Steinberger, R. et al. EMM: Supporting the analyst by turning multilingual text into structured data. In Transparenz aus Verantwortung: neue Herausforderungen für die digitale Datenanalyse (Erich Schmidt Verlag, 2017).
Ji, S., Pan, S., Cambria, E., Marttinen, P. & Yu, P. S. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Transactions on Neural Networks and Learning Systems 33, https://doi.org/10.1109/TNNLS.2021.3070843 (2022).
Auer, S. et al. Towards a Knowledge Graph for Science. In Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, WIMS ’18 (Association for Computing Machinery, New York, NY, USA, 2018).
Hogan, A. et al. Knowledge graphs. ACM Computing Surveys 54, https://doi.org/10.1145/3447772 (2021).
Tiwari, S., Ortíz-Rodriguez, F., Abbés, S. B., Usip, P. U. & Hantach, R. Semantic AI in Knowledge Graphs (Taylor & Francis, Boca Raton, US, 2023).
Heath, T. & Bizer, C. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology 1, 1–121 (2011).
Steinberger, R., Pouliquen, B. & van der Goot, E. An introduction to the Europe Media Monitor family of applications (2013).
Touvron, H. et al. LLaMA: Open and Efficient Foundation Language Models (2023).
Dubey, A. et al. The Llama 3 Herd of Models (2024).
Yang, J., Han, S. C. & Poon, J. A survey on extraction of causal relations from natural language text. Knowl. Inf. Syst. 64, 1161–1186 (2022).
Yerkhassym, A., Pak, A. A., Akhmetov, I., Yelenov, A. & Gelbukh, A. On causality problem in natural language processing field. Computacion y Sistemas 26, 1549–1556 (2022).
Coletta, V. R. et al. Causal loop diagrams for supporting nature based solutions participatory design and performance assessment. Journal of Environmental Management 280, 111668 (2021).
Inam, A., Adamowski, J., Halbe, J. & Prasher, S. Using causal loop diagrams for the initialization of stakeholder engagement in soil salinity management in agricultural watersheds in developing countries: A case study in the Rechna Doab watershed, Pakistan. Journal of Environmental Management 152, 251–267 (2015).
Dong, Q. et al. A survey on in-context learning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing https://doi.org/10.18653/v1/2024.emnlp-main.64 (2024).
Peng, C., Xia, F. & Naseriparsa, M. et al. Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review 56, 13071–13102 (2023).
Consoli, S., Coletti, P. & Markov, P. V. et al. An epidemiological knowledge graph extracted from the World Health Organization’s Disease Outbreak News. Scientific Data 12, 970 (2025).
Bertolini, L., Hulsman, R., Consoli, S., Puertas Gallardo, A. & Ceresa, M. On constructing biomedical text-to-graph systems with large language models. In CEUR Workshop Proceedings, vol. 3747 (2024).
Yang, R., Zhu, J., Man, J., Fang, L. & Zhou, Y. Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement. Knowl. Based Syst. 300, 112155 (2023).
Antonucci, A., Piqué, G. & Zaffalon, M. Zero-shot causal graph extrapolation from text via LLMs. arXiv preprint (2023).
Yang, R. et al. Graphusion: A RAG Framework for Scientific Knowledge Graph Construction with a Global Perspective. WWW ‘25: The ACM Web Conference. https://dl.acm.org/doi/10.1145/3701716.3717821 (2025).
Long, S., Schuster, T. & Piché, A. Can large language models build causal graphs? arXiv preprint (2023).
Samarajeewa, C., De Silva, D., Osipov, E., Alahakoon, D. & Manic, M. Causal reasoning in large language models using causal graph retrieval augmented generation. In 2024 16th International Conference on Human System Interaction (HSI), 1–6, https://doi.org/10.1109/HSI61632.2024.10613566 (2024).
Jiralerspong, T., Chen, X., More, Y., Shah, V. & Bengio, Y. Efficient causal graph discovery using large language models. arXiv preprint (2024).
Krippendorff, K. Content analysis: An introduction to its methodology (1980).
Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46 (1960).

Author information

Authors and Affiliations

European Commission, Joint Research Centre (JRC), Ispra, Italy
Michele Ronco, Lorenzo Bertolini, Sergio Consoli, Alessio Spadaro, Marco Verile & Christina Corbane
Engineering Ingegneria Informatica, Roma, Italy
Luca Bandelli
Institute of Health and Society (IRSS), University of Louvain (UCLouvain), Brussels, Belgium
Damien Delforge

Authors

Michele Ronco
Luca Bandelli
Lorenzo Bertolini
Sergio Consoli
Damien Delforge
Alessio Spadaro
Marco Verile
Christina Corbane

Contributions

Michele Ronco: Conceptualization, data curation, formal analysis, software development, writing - original draft preparation. Luca Bandelli: Conceptualization, data curation, software development, writing. Lorenzo Bertolini: Conceptualization, data curation, supervision - review and editing. Sergio Consoli: Conceptualization, data curation, software development, writing. Damien Delforge: Writing - review. Alessio Spadaro: Conceptualization, review. Marco Verile: Conceptualization, supervision. Christina Corbane: Conceptualization, supervision, writing - review and editing.

Corresponding author

Correspondence to Michele Ronco.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ronco, M., Bandelli, L., Bertolini, L. et al. Disaster Storylines and Knowledge Graphs from Global News with Large Language Models and Retrieval-Augmented Generation. Sci Data (2026). https://doi.org/10.1038/s41597-026-07036-2

Received: 08 September 2025
Accepted: 05 March 2026
Published: 17 March 2026
DOI: https://doi.org/10.1038/s41597-026-07036-2