Quantifying large language model usage in scientific papers

Abstract

Scientific publishing is the primary means of disseminating research findings. There has been speculation about how extensively large language models (LLMs) are being used in academic writing. Here we conduct a systematic analysis across 1,121,912 preprints and published papers from January 2020 to September 2024 on arXiv, bioRxiv and Nature portfolio journals, using a population-level framework based on word frequency shifts to estimate the prevalence of LLM-modified content over time. Our findings suggest a steady increase in LLM usage, with the largest and fastest growth estimated for computer science papers (up to 22%). By comparison, mathematics papers and the Nature portfolio showed lower evidence of LLM modification (up to 9%). LLM modification estimates were higher among papers from first authors who post preprints more frequently, papers in more crowded research areas and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
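The abstract describes the estimation framework only at a high level (population-level, based on word frequency shifts). As a rough illustration of how such an estimate could work, the sketch below assumes a two-component mixture: each sentence is drawn either from a human-written distribution P or an LLM-modified distribution Q, and the fraction alpha of LLM-modified sentences is chosen to maximise the corpus likelihood under (1 - alpha) P + alpha Q. The independence assumption across words, the word list, the probability values and all function names are hypothetical and are not taken from the article.

```python
import math
import random

# Hypothetical per-sentence word occurrence probabilities (illustrative values only,
# not estimates from the article).
P_HUMAN = {"delve": 0.01, "pivotal": 0.02, "intricate": 0.02, "method": 0.30, "result": 0.30}
Q_LLM = {"delve": 0.15, "pivotal": 0.20, "intricate": 0.15, "method": 0.30, "result": 0.30}


def sentence_log_prob(words, occ_prob):
    """Log-probability of a sentence, assuming each vocabulary word occurs independently."""
    present = set(words)
    return sum(math.log(p if w in present else 1.0 - p) for w, p in occ_prob.items())


def estimate_alpha(sentences, p_human, q_llm, grid_size=201):
    """Grid-search maximum-likelihood estimate of alpha in (1 - alpha) * P + alpha * Q."""
    log_p = [sentence_log_prob(s, p_human) for s in sentences]
    log_q = [sentence_log_prob(s, q_llm) for s in sentences]
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        alpha = i / (grid_size - 1)
        ll = 0.0
        for lp, lq in zip(log_p, log_q):
            m = max(lp, lq)  # factor out the larger term for numerical stability
            mix = (1.0 - alpha) * math.exp(lp - m) + alpha * math.exp(lq - m)
            ll += m + math.log(max(mix, 1e-300))
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


def simulate_sentences(n, true_alpha, p_human, q_llm, seed=0):
    """Draw n synthetic sentences, a fraction true_alpha of them from the LLM distribution."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n):
        source = q_llm if rng.random() < true_alpha else p_human
        sentences.append([w for w, p in source.items() if rng.random() < p])
    return sentences


if __name__ == "__main__":
    corpus = simulate_sentences(4000, 0.2, P_HUMAN, Q_LLM)
    print(f"estimated alpha: {estimate_alpha(corpus, P_HUMAN, Q_LLM):.3f}")
```

The point of the sketch is that no individual sentence is classified: only the corpus-level mixing weight is estimated, which is why an approach of this kind can remain informative even when per-document detection is unreliable.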

Fig. 1: Estimated fraction of LLM-modified sentences across research paper venues over time.
Fig. 2: Fine-grained validation of estimation accuracy under temporal distribution shift.
Fig. 3: Associations between LLM-modification and scientific publishing characteristics in arXiv computer science papers.
Fig. 4: Regional trends in the adoption of LLMs for academic writing.
Fig. 5: Word frequency shift in arXiv computer science abstracts over 14 years (2010–2024).

Data availability

The datasets analysed in the current study are public at the following links: via arXiv at https://www.kaggle.com/datasets/Cornell-University/arxiv (ref. 58), via bioRxiv at https://github.com/nicholasmfraser/rbiorxiv (ref. 59) and via Nature portfolio at https://www.nature.com/nature-portfolio (ref. 60).

Code availability

The code can be accessed via GitHub at https://github.com/Weixin-Liang/Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers (refs. 15,61). The study was conducted using Python 3.8.19 and R 4.4.1.

References

  1. Okunytė, P. Google search exposes academics using ChatGPT in research papers. Cybernews https://cybernews.com/news/academic-cheating-chatgpt-openai/ (2023).

  2. Deguerin, M. AI-generated nonsense is leaking into scientific journals. Popular Science https://www.popsci.com/technology/ai-generated-text-scientific-journals/ (2024).

  3. Oransky, I. & Marcus, A. Papers and peer reviews with evidence of ChatGPT writing. Retraction Watch https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/ (2024).

  4. Conroy, G. Scientific sleuths spot dishonest ChatGPT use in papers. Nature https://doi.org/10.1038/d41586-023-02477-w (2023).

  5. Conroy, G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature https://doi.org/10.1038/d41586-023-03144-w (2023).

  6. Vincent, J. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model (2023).

  7. Liang, W., Yuksekgonul, M., Mao, Y., Wu, E. & Zou, J. Y. GPT detectors are biased against non-native English writers. Patterns (N Y) https://doi.org/10.1016/j.patter.2023.100779 (2023).

  8. Yu, S., Luo, M., Madasu, A., Lal, V. & Howard, P. Is your paper being reviewed by an LLM? Investigating AI text detectability in peer review. In NeurIPS Safe Generative AI Workshop https://openreview.net/forum?id=f2G7C2fKxV (2024).

  9. Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. In Forty-first International Conference on Machine Learning https://openreview.net/forum?id=bX3J7ho18S (ICML, 2024).

  10. Clarification on large language model policy LLM. ICML https://icml.cc/Conferences/2023/llm-policy (2023).

  11. Thorp, H. H. ChatGPT is fun, but not an author. Science 379, 313 (2023).

  12. Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Soc. Rev. 80, 875–908 (2015).


  13. Amano, T., González-Varo, J. P. & Sutherland, W. J. Languages are still a major barrier to global science. PLoS Biol. 14, e2000933 (2016).


  14. Lee, M. et al. A design space for intelligent and interactive writing assistants. In Proc. 2024 CHI Conference on Human Factors in Computing Systems (eds Mueller, F. F. et al.) https://doi.org/10.1145/3613904.3642697 (Association for Computing Machinery, 2024).

  15. Liang, W. et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI https://ai.nejm.org/doi/full/10.1056/AIoa2400196 (2023).

  16. Lepp, H. & Smith, D. S. ‘You cannot sound like GPT’: signs of language discrimination and resistance in computer science publishing. In Proc. 2025 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3715275.3732202 (Association for Computing Machinery, 2025).

  17. Agarwal, D., Naaman, M. & Vashistha, A. AI suggestions homogenize writing toward western styles and diminish cultural nuances. In Proc. 2025 CHI Conference on Human Factors in Computing Systems 1–21 (2025).

  18. Lepp, H. & Sarin, P. A global AI community requires language-diverse publishing. Preprint at https://arxiv.org/abs/2408.14772 (2024).

  19. Van Rossum, D. Generative AI top 150: the world’s most used AI tools. FlexOS https://www.flexos.work/learn/generative-ai-top-150 (2024).

  20. MacroPolo. The global AI talent tracker https://archivemacropolo.org/interactive/digital-projects/the-global-ai-talent-tracker/ (2024).

  21. Wiley. ExplanAItions: an artificial intelligence study by Wiley. https://www.wiley.com/en-us/ai-study (2023).

  22. Bianchini, S., Müller, M. & Pelletier, P. Drivers and barriers of AI adoption and use in scientific research. Preprint at https://arxiv.org/abs/2312.09843 (2023).

  23. Horowitz, M. C., Kahn, L., Macdonald, J. & Schneider, J. Adopting AI: how familiarity breeds both trust and contempt. AI Soc. 39, 1721–1735 (2024).


  24. Topsakal, Y. How familiarity, ease of use, usefulness, and trust influence the acceptance of generative artificial intelligence (AI)-assisted travel planning. Int. J. Hum. Comput. Interact. https://doi.org/10.1080/10447318.2024.2426044 (2024).

  25. Lavergne, T., Urvoy, T. & Yvon, F. Detecting fake content with relative entropy scoring. Pan https://dl.acm.org/doi/10.5555/3053718.3053722 (2008).

  26. Badaskar, S., Agarwal, S. & Arora, S. Identifying real or fake articles: towards better language modeling. In International Joint Conference on Natural Language Processing https://aclanthology.org/I08-2115/ (2008).

  27. Beresneva, D. Computer-generated text detection using machine learning: a systematic review. In International Conference on Applications of Natural Language to Data Bases https://doi.org/10.1007/978-3-319-41754-7_43 (Springer, 2016).

  28. Solaiman, I. et al. Release strategies and the social impacts of language models. Preprint at https://arxiv.org/abs/1908.09203 (2019).

  29. Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D. & Finn, C. DetectGPT: zero-shot machine-generated text detection using probability curvature. In Proc. 40th International Conference on Machine Learning Vol. 202 (eds. Krause, A. et al.) 24950–24962 (PMLR, 2023).

  30. Yang, X., Cheng, W., Petzold, L., Wang, W. Y. & Chen, H. DNA-GPT: divergent N-gram analysis for training-free detection of GPT-generated text. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Xlayxj2fWp (2024).

  31. Bao, G., Zhao, Y., Teng, Z., Yang, L. & Zhang, Y. Fast-DetectGPT: efficient zero-shot detection of machine-generated text via conditional probability curvature. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Bpcgcr8E8Z (ICLR, 2024).

  32. Tulchinskii, E. et al. Intrinsic dimension estimation for robust detection of AI-generated texts. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/pdf?id=8uOZ0kNji6 (NeurIPS, 2023).

  33. Bhagat, R. & Hovy, E. H. Squibs: what is a paraphrase? Comput. Linguist. 39, 463–472 (2013).


  34. Zellers, R. et al. Defending against neural fake news. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) https://papers.neurips.cc/paper_files/paper/2019/file/3e9f0fc9b2f89e043bc6233994dfcf76-Paper.pdf (NeurIPS, 2019).

  35. Bakhtin, A. et al. Real or fake? Learning to discriminate machine from human generated text. Preprint at https://arxiv.org/abs/1906.03351 (2019).

  36. Uchendu, A., Le, T., Shu, K. & Lee, D. Authorship attribution for neural text generation. In Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2020.emnlp-main.673/ (2020).

  37. Chen, Y. et al. GPT-Sentinel: distinguishing human and ChatGPT generated content. Preprint at https://arxiv.org/abs/2305.07969 (2023).

  38. Yu, X. et al. GPT Paternity Test: GPT generated text detection with GPT genetic inheritance. Preprint at https://ar5iv.labs.arxiv.org/html/2305.12519 (2023).

  39. Li, Y. et al. MAGE: machine-generated text detection in the wild. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 36–53 (ACL, 2024).

  40. Liu, X., Zhang, Z., Wang, Y., Lan, Y. & Shen, C. CoCo: coherence-enhanced machine-generated text detection under data limitation with contrastive learning. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.1005.pdf (ACL, 2023).

  41. Bhattacharjee, A., Kumarage, T., Moraffah, R. & Liu, H. ConDA: contrastive domain adaptation for AI-generated text detection. In Proc. 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2023.ijcnlp-main.40/ (ACL, 2023).

  42. Hu, X., Chen, P.-Y. & Ho, T.-Y. RADAR: robust AI-text detection via adversarial learning. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/forum?id=QGrkbaan79 (NeurIPS, 2023).

  43. Wolff, M. Attacking neural text detectors. Preprint at https://arxiv.org/abs/2002.11768 (2022).

  44. GPT-2: 1.5B release. OpenAI https://openai.com/research/gpt-2-1-5b-release (2019).

  45. Jawahar, G., Abdul-Mageed, M. & Lakshmanan, L. V. Automatic detection of machine generated text: a critical survey. In Proc. 28th International Conference on Computational Linguistics https://aclanthology.org/2020.coling-main.208.pdf (ACL, 2020).

  46. Fagni, T., Falchi, F., Gambini, M., Martella, A. & Tesconi, M. TweepFake: about detecting deepfake tweets. PLoS ONE 16, e0251415 (2021).


  47. Ippolito, D., Duckworth, D., Callison-Burch, C. & Eck, D. Automatic detection of generated text is easiest when humans are fooled. In Proc. 58th Annual Meeting of the Association for Computational Linguistics https://aclanthology.org/2020.acl-main.164/ (ACL, 2020).

  48. Gehrmann, S., Strobelt, H. & Rush, A. M. GLTR: statistical detection and visualization of generated text. In Proc. 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 111–116 (2019).

  49. Heikkilä, M. How to spot AI-generated text. MIT Technology Review https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/ (2022).

  50. Crothers, E., Japkowicz, N. & Viktor, H. Machine generated text: a comprehensive survey of threat models and detection methods. Preprint at https://arxiv.org/abs/2210.07321 (2022).

  51. Kirchner, J. H., Ahmad, L., Aaronson, S. & Leike, J. New AI classifier for indicating AI-written text. OpenAI https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ (2023).

  52. Kelly, S. M. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html (2023).

  53. Weber-Wulff, D. et al. Testing of detection tools for AI-generated text. Int. J. Educ. Integ. 19, 26 (2023).


  54. Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W. & Feizi, S. Can AI-generated text be reliably detected? In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=OOgsAZdFOt (2023).

  55. Chakraborty, S. et al. On the possibilities of AI-generated text detection. In Proc. 41st International Conference on Machine Learning Research https://proceedings.mlr.press/v235/chakraborty24a.html (2024).

  56. Lo, K., Wang, L. L., Neumann, M., Kinney, R. & Weld, D. S2ORC: the Semantic Scholar open research corpus. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 4969–4983 (Association for Computational Linguistics, 2020); https://doi.org/10.18653/v1/2020.acl-main.447

  57. Jurafsky, D. & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models 3rd edn (Prentice Hall, 2025).

  58. arXiv.org submitters. arXiv dataset. Kaggle https://www.kaggle.com/datasets/Cornell-University/arxiv/versions/165 (2024).

  59. Fraser, N. rbiorxiv: Client for the ‘bioRxiv’ API. GitHub https://github.com/nicholasmfraser/rbiorxiv (2024).

  60. Nature portfolio. Springer Nature https://www.nature.com/nature-portfolio (2025).

  61. Liang, W. Mapping the increasing use of LLMs in scientific papers. COLM https://openreview.net/forum?id=YX7QnhxESU (2024).

Acknowledgements

We thank D. A. McFarland, D. Jurafsky, Y. Yin, Z. Izzo, X. V. Lin, L. Chen and H. Ye for their helpful comments and discussions. J.Z. is supported by the National Science Foundation (grant nos. CCF 1763191 and CAREER 1942926), the US National Institutes of Health (grant nos. P30AG059307 and U01MH098953) and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative. H.L. is supported by the National Science Foundation (grant nos. 2244804 and 2022435) and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

Author information

Contributions

W.L. and Y.Z. designed the study and oversaw the quantification analysis. W.L. and Y.Z. provided the code for data analysis and conducted the analysis. W.L., Y.Z., Z.W., H.L., W.J., X.Z. and H.C. wrote the paper, with substantial input from all authors. All authors contributed to the review and editing of the paper. D.Y., C.P., C.D.M. and J.Z. provided the overall direction and planning of the project.

Corresponding authors

Correspondence to Weixin Liang or James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Casey Greene, Phillip Howard and Ruixiang Tang for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–16, Tables 1 and 2 and Details on implementations and related work.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, W., Zhang, Y., Wu, Z. et al. Quantifying large language model usage in scientific papers. Nat Hum Behav 9, 2599–2609 (2025). https://doi.org/10.1038/s41562-025-02273-8

  • DOI: https://doi.org/10.1038/s41562-025-02273-8
