A large-scale comparison of divergent creativity in humans and large language models

Abstract

Human–machine partnerships are increasingly used to address grand societal challenges, yet knowledge of the comparative innovative strengths of humans and machines remains nascent. Here we compare the ability of humans (N = 9,198) and large language models (LLMs, N = 215,542 observations) to generate novel ideas in an established creativity task. We present three key results. First, human creativity on average is slightly higher than that of LLMs. Second, creativity differences are pronounced at the extremes of the distribution, with humans exhibiting greater variability and higher levels of creativity in the right-hand tail. Third, attempts to increase LLM creativity by instructing the models to take on genius personas or different demographic roles raised performance up to a threshold, beyond which the output ran counter to real-life patterns, whereas strategic prompt-engineering efforts yielded mixed to negative results. We discuss the implications of our findings for human–machine collaboration and problem solving.
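The manipulations summarized in the figures below (sampling-temperature sweeps and persona, celebrity and demographic prompts) can be illustrated with a minimal sketch. The model name, persona wording and task prompt here are illustrative assumptions, not the authors' exact materials; the paper's actual prompts and scoring pipeline are deposited on OSF (https://osf.io/a9v2t).

```python
# Minimal sketch of a persona + temperature sweep for a divergent-thinking prompt.
# Assumptions: the OpenAI Python client, a placeholder model name and illustrative
# persona wording.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TASK = "Name ten nouns that are as unrelated to one another as possible."
PERSONAS = [
    "",                                             # baseline: no persona
    "You are a wildly creative genius inventor.",   # illustrative 'genius' persona
    "You are a 35-year-old teacher from Brazil.",   # illustrative demographic role
]

def generate(persona: str, temperature: float) -> str:
    """Return one model response for a given persona and sampling temperature."""
    messages = []
    if persona:
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": TASK})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for persona in PERSONAS:
        for temp in (0.0, 0.7, 1.4):   # example temperature sweep
            print(persona or "(no persona)", temp, "->", generate(persona, temp)[:80])
```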


Fig. 1: Comparison of the divergent creativity scores between humans and LLMs.
Fig. 2: Comparison of divergent creativity scores across different temperature values for LLMs.
Fig. 3: Comparison of divergent creativity scores across different perspective prompts for LLMs.
Fig. 4: Comparison of divergent creativity scores across different celebrity prompts for LLMs.
Fig. 5: Comparison of divergent creativity scores across different demographic prompts for LLMs.


Data availability

Data for all analyses in the main manuscript and Supplementary Information are publicly available in the Open Science Framework (https://osf.io/a9v2t).

Code availability

Code for all analyses in the main manuscript and Supplementary Information is publicly available in the Open Science Framework (https://osf.io/a9v2t).
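For readers who want to retrieve the deposited materials programmatically, one option is the public OSF JSON API. A minimal sketch follows, assuming the files sit under the project's default osfstorage provider; the actual folder layout is whatever the authors uploaded, so nested folders would need an extra recursion step.

```python
# Minimal sketch: list files deposited in OSF project a9v2t via the public
# OSF v2 JSON API. Assumes the default 'osfstorage' provider.
import requests

API = "https://api.osf.io/v2/nodes/a9v2t/files/osfstorage/"

def list_files(url: str):
    """Yield (name, download_url) pairs, following API pagination."""
    while url:
        page = requests.get(url, timeout=30).json()
        for item in page.get("data", []):
            attrs = item.get("attributes", {})
            links = item.get("links", {})
            if attrs.get("kind") == "file":
                yield attrs.get("name"), links.get("download")
        url = page.get("links", {}).get("next")  # next page, if any

if __name__ == "__main__":
    for name, download_url in list_files(API):
        print(name, download_url)
```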


Acknowledgements

We thank Z. Dai, K. Savani and Z. Shen for their contributions to this work. D.W. was supported by the Seed Fund for Basic Research from the University of Hong Kong (grant no. 2201101303). D.H. was supported by the National Natural Science Foundation of China (grant nos. 72503232, 72574227 and T2293771). H.S. was supported by the Theme-based Research Fund provided by HKU Education Consulting (Shenzhen) Co., Ltd (grant SZRI2023-TBRF-03), the Research Grants Council of the Hong Kong Special Administrative Region, China (grant CRF-C7162-20G), and Strategic allocation 2018/19 (2c): Capacity Building for Development of ‘Business Analytics and Big Data’. B.U. was supported by the National Science Foundation through the NSF National Synthesis Center for Emergence in the Molecular and Cellular Sciences (grant MCB-2335029), Northwestern University’s Kellogg School of Management, Northwestern Institute on Complex Systems, and the Ryan Institute on Complexity. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information


Contributions

D.W., D.H., H.S. and B.U. designed the research. D.W. performed the research. D.W. and H.S. analysed the data. D.W., D.H. and B.U. wrote the paper.

Corresponding authors

Correspondence to Dawei Wang or Brian Uzzi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Tuhin Chakrabarty, Liuqing Chen and Ken Gilhooly for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–5, Figs. 1–17, Tables 1–38 and references.

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, D., Huang, D., Shen, H. et al. A large-scale comparison of divergent creativity in humans and large language models. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02331-1

