Abstract
As large language models (LLMs) are integrated into critical decision-making, it remains unclear how well they align with human values in high-stakes scenarios. This study systematically investigates the behavioral consistency of LLMs under simulated emergencies, focusing on cooperative intent, resource distribution, and moral reasoning. We applied established psychological scales in two crisis scenarios: resource allocation in a natural disaster and response to crowd panic. We use “catalyst” metaphorically: crisis framings serve as an observational stress test that amplifies and reveals latent behavioral trade-offs in LLMs rather than improving the models. Using a standardized API framework, we evaluated three primary LLMs (gpt-4o, DeepSeek-V3, and DeepSeek-R1) across repeated trials, analyzing both their quantitative decisions and their qualitative justifications. The results show that although LLMs reproduce broad human-like preferences (e.g., cooperation over competition), they exhibit systematic variations in ethical trade-offs and “flattened” decision distributions. The models differed significantly in cooperative framing and were less sensitive than humans to social variables (e.g., expectations of future interaction). These findings contribute to computational crisis management and AI ethics by demonstrating context-dependent risks of value misalignment. We propose a novel framework for evaluating the behavioral consistency of silicon-based agents during crises, offering methodological and ethical guidance for deploying LLMs in socially complex, high-stakes environments.
Data availability
The datasets generated and analyzed during the current study are openly available in the Figshare repository at: https://doi.org/10.6084/m9.figshare.29722715. The code supporting this project is openly available on GitHub at: https://github.com/keeno-morning-haze/Crisis_as_Catalyst.
Author information
Contributions
Yang-Ao Xiang and Zuo-Ming Yang: conducted data analysis and drafted the original manuscript; Rui Peng, Chen Gao, and Zong-Chao Peng: reviewed, edited, and contributed to project oversight and administrative tasks. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Ethical approval was not required for this study as it did not involve human participants, animal subjects, or sensitive personal data. The research consisted solely of computational simulations using publicly available large language models.
Informed consent
Not applicable, as no human participants were recruited or involved in this study.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
YANG, A., ZUO, M., PENG, R. et al. Crisis as catalyst: evaluating ethical consistency and cooperation in LLMs under high-stakes scenarios. Humanit Soc Sci Commun (2026). https://doi.org/10.1057/s41599-026-07194-z
DOI: https://doi.org/10.1057/s41599-026-07194-z


