Crisis as catalyst: evaluating ethical consistency and cooperation in LLMs under high-stakes scenarios

YANG, Aoxiang; ZUO, Mingyang; PENG, Rui; GAO, Chen; PENG, Zongchao

doi:10.1057/s41599-026-07194-z

Article
Open access
Published: 15 April 2026

Aoxiang YANG^1,2,
Mingyang ZUO^3,
Rui PENG^2,4,
Chen GAO^5 &
Zongchao PENG^1,2

Humanities and Social Sciences Communications, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

As large language models (LLMs) become integrated into critical decision-making, their alignment with human values in high-stakes scenarios remains unclear. This study systematically investigates the behavioral consistency of LLMs under simulated emergencies, focusing on cooperative intent, resource distribution, and moral reasoning. We administered established psychological scales within two crisis scenarios: natural disaster resource allocation and crowd panic response. We use “catalyst” metaphorically: crisis framings serve as an observational stress test that amplifies and reveals latent behavioral trade-offs in LLMs rather than improving the models. Using a standardized API framework, we evaluated three primary LLMs (GPT-4o, DeepSeek-V3, and DeepSeek-R1) across repeated trials, analyzing both their quantitative decisions and their qualitative justifications. Results reveal that although LLMs reproduce broad human-like preferences (e.g., cooperation over competition), they exhibit systematic variations in ethical trade-offs and “flattened” decision distributions. Models differed significantly in cooperative framing and showed attenuated sensitivity to social variables (e.g., expectations of future interaction) compared with humans. These findings advance computational crisis management and AI ethics by highlighting context-dependent risks of value misalignment. We propose a novel framework for evaluating behavioral consistency in silicon-based agents during crises, offering critical methodological and ethical guidance for deploying LLMs in socially complex, high-stakes environments.

Data availability

The datasets generated and analyzed during the current study are openly available in the Figshare repository at: https://doi.org/10.6084/m9.figshare.29722715. The code supporting this project is openly available on GitHub at: https://github.com/keeno-morning-haze/Crisis_as_Catalyst.

Author information

Authors and Affiliations

School of Public Policy & Management, Tsinghua University, Beijing, China
Aoxiang YANG & Zongchao PENG
Center for Crisis Management Research, Tsinghua University, Beijing, China
Aoxiang YANG, Rui PENG & Zongchao PENG
Weiyang College, Tsinghua University, Beijing, China
Mingyang ZUO
School of Government and Public Affairs, Communication University of China, Beijing, China
Rui PENG
BNRist, Tsinghua University, Beijing, China
Chen GAO

Contributions

Aoxiang Yang and Mingyang Zuo conducted the data analysis and drafted the original manuscript; Rui Peng, Chen Gao, and Zongchao Peng reviewed and edited the manuscript and contributed to project oversight and administration. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Rui PENG, Chen GAO or Zongchao PENG.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required for this study as it did not involve human participants, animal subjects, or sensitive personal data. The research consisted solely of computational simulations using publicly available large language models.

Informed consent

Not applicable, as no human participants were recruited or involved in this study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Appendix (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

YANG, A., ZUO, M., PENG, R. et al. Crisis as catalyst: evaluating ethical consistency and cooperation in LLMs under high-stakes scenarios. Humanit Soc Sci Commun (2026). https://doi.org/10.1057/s41599-026-07194-z

Received: 29 June 2025
Accepted: 27 March 2026
Published: 15 April 2026
DOI: https://doi.org/10.1057/s41599-026-07194-z