Abstract
Human–machine partnerships are increasingly used to address grand societal challenges, yet knowledge of the comparative strengths of humans and machines in innovation remains nascent. Here we compare the ability of humans (N = 9,198) and large language models (LLMs, N = 215,542 observations) to generate novel ideas in an established creativity task. We present three key results. First, human creativity is, on average, slightly higher than that of LLMs. Second, creativity differences are pronounced at the extremes of the distribution, with humans exhibiting greater variability and higher levels of creativity in the right-hand tail. Third, attempts to increase the creativity of LLMs by instructing them to take on genius personas or different demographic roles lifted performance up to a threshold, beyond which the output ran counter to real-life patterns, whereas strategic prompt-engineering efforts yielded mixed to negative results. We discuss the implications of our findings for human–machine collaboration and problem solving.
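To make the distributional comparison in the abstract concrete, the following is a minimal, purely illustrative sketch of how mean, variability and right-tail differences between two sets of creativity scores could be examined, together with a simple percentile bootstrap for the mean difference. The simulated scores and every parameter value are assumptions for illustration only; they are not the study's data or analysis code, which are available from the OSF repository listed under Data availability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical originality scores standing in for the human and LLM samples
# described in the abstract (all values are illustrative assumptions).
human_scores = rng.normal(loc=0.52, scale=0.12, size=9_198)
llm_scores = rng.normal(loc=0.50, scale=0.08, size=215_542)

def summarize(scores):
    """Mean, standard deviation and right-tail (95th/99th percentile) summary."""
    return {
        "mean": float(np.mean(scores)),
        "sd": float(np.std(scores, ddof=1)),
        "p95": float(np.percentile(scores, 95)),
        "p99": float(np.percentile(scores, 99)),
    }

def bootstrap_mean_diff(a, b, n_boot=2000):
    """Percentile bootstrap confidence interval for the difference in means (a minus b)."""
    diffs = [
        rng.choice(a, size=a.size, replace=True).mean()
        - rng.choice(b, size=b.size, replace=True).mean()
        for _ in range(n_boot)
    ]
    return np.percentile(diffs, [2.5, 97.5])

print("Humans:", summarize(human_scores))
print("LLMs:  ", summarize(llm_scores))
print("95% CI, mean difference (human - LLM):", bootstrap_mean_diff(human_scores, llm_scores))
```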
Data availability
Data for all analyses in the main manuscript and Supplementary Information are publicly available in the Open Science Framework (https://osf.io/a9v2t).
Code availability
Code for all analyses in the main manuscript and Supplementary Information is publicly available in the Open Science Framework (https://osf.io/a9v2t).
Acknowledgements
We thank Z. Dai, K. Savani and Z. Shen for their contributions to this work. D.W. was supported by the Seed Fund for Basic Research from the University of Hong Kong (grant no. 2201101303). D.H. was supported by the National Natural Science Foundation of China (grant nos. 72503232, 72574227 and T2293771). H.S. was supported by the Theme-based Research Fund provided by HKU Education Consulting (Shenzhen) Co., Ltd (grant SZRI2023-TBRF-03), the Research Grants Council of the Hong Kong Special Administrative Region, China (grant CRF-C7162-20G), and Strategic allocation 2018/19 (2c): Capacity Building for Development of ‘Business Analytics and Big Data’. B.U. was supported by the National Science Foundation through the NSF National Synthesis Center for Emergence in the Molecular and Cellular Sciences (grant MCB-2335029), Northwestern University’s Kellogg School of Management, Northwestern Institute on Complex Systems, and the Ryan Institute on Complexity. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Author information
Authors and Affiliations
Contributions
D.W., D.H., H.S. and B.U. designed the research. D.W. performed the research. D.W. and H.S. analysed the data. D.W., D.H. and B.U. wrote the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Tuhin Chakrabarty, Liuqing Chen and Ken Gilhooly for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–5, Figs. 1–17, Tables 1–38 and references.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, D., Huang, D., Shen, H. et al. A large-scale comparison of divergent creativity in humans and large language models. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02331-1


