Quantifying large language model usage in scientific papers

Liang, Weixin; Zhang, Yaohui; Wu, Zhengxuan; Lepp, Haley; Ji, Wenlong; Zhao, Xuandong; Cao, Hancheng; Liu, Sheng; He, Siyu; Huang, Zhi; Yang, Diyi; Potts, Christopher; Manning, Christopher D.; Zou, James

doi:10.1038/s41562-025-02273-8

Article
Published: 04 August 2025

Weixin Liang^1,na1 (ORCID: orcid.org/0000-0001-9924-693X),
Yaohui Zhang^2,na1 (ORCID: orcid.org/0009-0008-4507-5299),
Zhengxuan Wu^1,
Haley Lepp^3 (ORCID: orcid.org/0009-0003-9789-7415),
Wenlong Ji^4,
Xuandong Zhao^5,
Hancheng Cao^1,6,
Sheng Liu^7,
Siyu He^7,
Zhi Huang^7,
Diyi Yang^1,
Christopher Potts^1,8,na2,
Christopher D. Manning^1,8,na2 &
James Zou^1,2,7,na2 (ORCID: orcid.org/0000-0001-8880-4764)

Nature Human Behaviour volume 9, pages 2599–2609 (2025)

Abstract

Scientific publishing is the primary means of disseminating research findings. There has been speculation about how extensively large language models (LLMs) are being used in academic writing. Here we conduct a systematic analysis across 1,121,912 preprints and published papers from January 2020 to September 2024 on arXiv, bioRxiv and Nature portfolio journals, using a population-level framework based on word frequency shifts to estimate the prevalence of LLM-modified content over time. Our findings suggest a steady increase in LLM usage, with the largest and fastest growth estimated for computer science papers (up to 22%). By comparison, mathematics papers and the Nature portfolio showed lower evidence of LLM modification (up to 9%). LLM modification estimates were higher among papers from first authors who post preprints more frequently, papers in more crowded research areas and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
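
To make the idea of a population-level estimate concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each sentence is reduced to the presence or absence of a few marker words, that reference occurrence probabilities for human-written and LLM-modified text are already known, and that the corpus is a mixture of the two sources with unknown fraction alpha, which is then chosen by maximum likelihood over a grid. The vocabulary, the probabilities and the function names are hypothetical and exist only for illustration.

```python
import math
import random

# Hypothetical per-sentence occurrence probabilities for a few marker words.
# These numbers are illustrative assumptions, not values from the paper.
HUMAN_PROBS = {"delve": 0.01, "intricate": 0.02, "pivotal": 0.02, "showcase": 0.03}
LLM_PROBS = {"delve": 0.20, "intricate": 0.25, "pivotal": 0.25, "showcase": 0.30}


def sentence_log_prob(words, occurrence_probs):
    """Log-probability of a sentence, treating each marker word's
    presence or absence as an independent Bernoulli event."""
    present = set(words)
    return sum(
        math.log(p) if word in present else math.log(1.0 - p)
        for word, p in occurrence_probs.items()
    )


def estimate_alpha(sentences, human_probs, llm_probs, grid_size=201):
    """Grid-search maximum-likelihood estimate of the mixture fraction alpha
    in the model (1 - alpha) * P_human(x) + alpha * P_llm(x)."""
    # Likelihood ratio P_llm(x) / P_human(x) for each sentence; the
    # log-likelihood in alpha then depends on the data only through these.
    ratios = [
        math.exp(sentence_log_prob(s, llm_probs) - sentence_log_prob(s, human_probs))
        for s in sentences
    ]
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        alpha = i / (grid_size - 1)
        ll = sum(math.log((1.0 - alpha) + alpha * r) for r in ratios)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


def simulate(n_sentences, true_alpha, seed=0):
    """Draw synthetic sentences from the assumed mixture (toy data only)."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_sentences):
        probs = LLM_PROBS if rng.random() < true_alpha else HUMAN_PROBS
        sentences.append([w for w, p in probs.items() if rng.random() < p])
    return sentences


corpus = simulate(3000, true_alpha=0.2)
alpha_hat = estimate_alpha(corpus, HUMAN_PROBS, LLM_PROBS)
print(f"estimated fraction of LLM-modified sentences: {alpha_hat:.3f}")
```

The sketch never labels an individual sentence as human-written or LLM-modified; it only asks which mixture fraction best explains the word frequencies of the whole corpus, which is the sense in which such an estimate is population-level.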

**Fig. 1: Estimated fraction of LLM-modified sentences across research paper venues over time.**

**Fig. 2: Fine-grained validation of estimation accuracy under temporal distribution shift.**

**Fig. 3: Associations between LLM-modification and scientific publishing characteristics in *arXiv* computer science papers.**

**Fig. 4: Regional trends in the adoption of LLMs for academic writing.**

**Fig. 5: Word frequency shift in *arXiv* computer science abstracts over 14 years (2010–2024).**

Data availability

The datasets analysed in the current study are public at the following links: via arXiv at https://www.kaggle.com/datasets/Cornell-University/arxiv (ref. ⁵⁸), via bioRxiv at https://github.com/nicholasmfraser/rbiorxiv (ref. ⁵⁹) and via Nature portfolio at https://www.nature.com/nature-portfolio (ref. ⁶⁰).

Code availability

The code can be accessed via GitHub at https://github.com/Weixin-Liang/Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers (refs. ¹⁵,⁶¹). The study was conducted using Python 3.8.19 and R 4.4.1.

References

Okunytė, P. Google search exposes academics using ChatGPT in research papers. Cybernews https://cybernews.com/news/academic-cheating-chatgpt-openai/ (2023).
Deguerin, M. AI-generated nonsense is leaking into scientific journals. Popular Science https://www.popsci.com/technology/ai-generated-text-scientific-journals/ (2024).
Oransky, I. & Marcus, A. Papers and peer reviews with evidence of ChatGPT writing. Retraction Watch https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/ (2024).
Conroy, G. Scientific sleuths spot dishonest ChatGPT use in papers. Nature https://doi.org/10.1038/d41586-023-02477-w (2023).
Conroy, G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature https://doi.org/10.1038/d41586-023-03144-w (2023).
Vincent, J. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model (2023).
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E. & Zou, J. Y. GPT detectors are biased against non-native English writers. Patterns (N Y) https://doi.org/10.1016/j.patter.2023.100779 (2023).
Yu, S., Luo, M., Madasu, A., Lal, V. & Howard, P. Is your paper being reviewed by an LLM? Investigating AI text detectability in peer review. In NeurIPS Safe Generative AI Workshop https://openreview.net/forum?id=f2G7C2fKxV (2024).
Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. In Forty-first International Conference on Machine Learning https://openreview.net/forum?id=bX3J7ho18S (ICML, 2024).
Clarification on large language model policy LLM. ICML https://icml.cc/Conferences/2023/llm-policy (2023).
Thorp, H. H. ChatGPT is fun, but not an author. Science 379, 313 (2023).
Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Soc. Rev. 80, 875–908 (2015).
Amano, T., González-Varo, J. P. & Sutherland, W. J. Languages are still a major barrier to global science. PLoS Biol. 14, e2000933 (2016).
Lee, M. et al. A design space for intelligent and interactive writing assistants. In Proc. 2024 CHI Conference on Human Factors in Computing Systems (eds Mueller, F. F. et al.) https://doi.org/10.1145/3613904.3642697 (Association for Computing Machinery, 2024).
Liang, W. et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI https://ai.nejm.org/doi/full/10.1056/AIoa2400196 (2023).
Lepp, H. & Smith, D. S. ‘You cannot sound like GPT’: signs of language discrimination and resistance in computer science publishing. In Proc. 2025 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3715275.3732202 (Association for Computing Machinery, 2025).
Agarwal, D., Naaman, M. & Vashistha, A. AI suggestions homogenize writing toward western styles and diminish cultural nuances. In Proc. 2025 CHI Conference on Human Factors in Computing Systems 1–21 (2025).
Lepp, H. & Sarin, P. A global AI community requires language-diverse publishing. Preprint at https://arxiv.org/abs/2408.14772 (2024).
Van Rossum, D. Generative AI top 150: the world’s most used AI tools. FlexOS https://www.flexos.work/learn/generative-ai-top-150 (2024).
MacroPolo. The global AI talent tracker https://archivemacropolo.org/interactive/digital-projects/the-global-ai-talent-tracker/ (2024).
Wiley. ExplanAItions: an artificial intelligence study by Wiley. https://www.wiley.com/en-us/ai-study (2023).
Bianchini, S., Müller, M. & Pelletier, P. Drivers and barriers of AI adoption and use in scientific research. Preprint at https://arxiv.org/abs/2312.09843 (2023).
Horowitz, M. C., Kahn, L., Macdonald, J. & Schneider, J. Adopting AI: how familiarity breeds both trust and contempt. AI Soc. 39, 1721–1735 (2024).
Topsakal, Y. How familiarity, ease of use, usefulness, and trust influence the acceptance of generative artificial intelligence (AI)-assisted travel planning. Int. J. Hum. Comput. Interact. https://doi.org/10.1080/10447318.2024.2426044 (2024).
Lavergne, T., Urvoy, T. & Yvon, F. Detecting fake content with relative entropy scoring. Pan https://dl.acm.org/doi/10.5555/3053718.3053722 (2008).
Badaskar, S., Agarwal, S. & Arora, S. Identifying real or fake articles: towards better language modeling. In International Joint Conference on Natural Language Processing https://aclanthology.org/I08-2115/ (2008).
Beresneva, D. Computer-generated text detection using machine learning: a systematic review. In International Conference on Applications of Natural Language to Data Bases https://doi.org/10.1007/978-3-319-41754-7_43 (Springer, 2016).
Solaiman, I. et al. Release strategies and the social impacts of language models. Preprint at https://arxiv.org/abs/1908.09203 (2019).
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D. & Finn, C. DetectGPT: zero-shot machine-generated text detection using probability curvature. In Proc. 40th International Conference on Machine Learning Vol. 202 (eds. Krause, A. et al.) 24950–24962 (PMLR, 2023).
Yang, X., Cheng, W., Petzold, L., Wang, W. Y. & Chen, H. DNA-GPT: divergent N-gram analysis for training-free detection of GPT-generated text. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Xlayxj2fWp (2024).
Bao, G., Zhao, Y., Teng, Z., Yang, L. & Zhang, Y. Fast-DetectGPT: efficient zero-shot detection of machine-generated text via conditional probability curvature. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Bpcgcr8E8Z (ICLR, 2024).
Tulchinskii, E. et al. Intrinsic dimension estimation for robust detection of AI-generated texts. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/pdf?id=8uOZ0kNji6 (NeurIPS, 2023).
Bhagat, R. & Hovy, E. H. Squibs: what is a paraphrase? Comput. Linguist. 39, 463–472 (2013).
Zellers, R. et al. Defending against neural fake news. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) https://papers.neurips.cc/paper_files/paper/2019/file/3e9f0fc9b2f89e043bc6233994dfcf76-Paper.pdf (NeurIPS, 2019).
Bakhtin, A. et al. Real or fake? Learning to discriminate machine from human generated text. Preprint at https://arxiv.org/abs/1906.03351 (2019).
Uchendu, A., Le, T., Shu, K. & Lee, D. Authorship attribution for neural text generation. In Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2020.emnlp-main.673/ (2020).
Chen, Y. et al. GPT-Sentinel: distinguishing human and ChatGPT generated content. Preprint at https://arxiv.org/abs/2305.07969 (2023).
Yu, X. et al. GPT Paternity Test: GPT generated text detection with GPT genetic inheritance. Preprint at https://ar5iv.labs.arxiv.org/html/2305.12519 (2023).
Li, Y. et al. MAGE: Machine-generated Text Detection in the Wild. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 36–53 (ACL, 2024).
Liu, X., Zhang, Z., Wang, Y., Lan, Y. & Shen, C. CoCo: coherence-enhanced machine-generated text detection under data limitation with contrastive learning. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.1005.pdf (ACL, 2023).
Bhattacharjee, A., Kumarage, T., Moraffah, R. & Liu, H. ConDA: contrastive domain adaptation for AI-generated text detection. In Proc. 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2023.ijcnlp-main.40/ (ACL, 2023).
Hu, X., Chen, P.-Y. & Ho, T.-Y. RADAR: robust AI-text detection via adversarial learning. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/forum?id=QGrkbaan79 (NeurIPS, 2023).
Wolff, M. Attacking neural text detectors. Preprint at https://arxiv.org/abs/2002.11768 (2022).
GPT-2: 1.5B release. OpenAI https://openai.com/research/gpt-2-1-5b-release (2019).
Jawahar, G., Abdul-Mageed, M. & Lakshmanan, L. V. Automatic detection of machine generated text: a critical survey. In Proc. 28th International Conference on Computational Linguistics https://aclanthology.org/2020.coling-main.208.pdf (ACL, 2020).
Fagni, T., Falchi, F., Gambini, M., Martella, A. & Tesconi, M. TweepFake: about detecting deepfake tweets. PLoS ONE 16, e0251415 (2021).
Ippolito, D., Duckworth, D., Callison-Burch, C. & Eck, D. Automatic detection of generated text is easiest when humans are fooled. In Proc. 58th Annual Meeting of the Association for Computational Linguistics https://aclanthology.org/2020.acl-main.164/ (ACL, 2020).
Gehrmann, S., Strobelt, H. & Rush, A. M. GLTR: statistical detection and visualization of generated text. In Proc. 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 111–116 (2019).
Heikkilä, M. How to spot AI-generated text. MIT Technology Review https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/ (2022).
Crothers, E., Japkowicz, N. & Viktor, H. Machine generated text: a comprehensive survey of threat models and detection methods. Preprint at https://arxiv.org/abs/2210.07321 (2022).
Kirchner, J. H., Ahmad, L., Aaronson, S. & Leike, J. New AI classifier for indicating AI-written text. OpenAI https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ (2023).
Kelly, S. M. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html (2023).
Weber-Wulff, D. et al. Testing of detection tools for AI-generated text. Int. J. Educ. Integ. 19, 26 (2023).
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W. & Feizi, S. Can AI-generated text be reliably detected? In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=OOgsAZdFOt (2023).
Chakraborty, S. et al. On the possibilities of AI-generated text detection. In Proc. 41st International Conference on Machine Learning Research https://proceedings.mlr.press/v235/chakraborty24a.html (2024).
Lo, K., Wang, L. L., Neumann, M., Kinney, R. & Weld, D. S2ORC: The semantic scholar open research corpus. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 4969–4983 (Association for Computational Linguistics, 2020); https://doi.org/10.18653/v1/2020.acl-main.447
Jurafsky, D. & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models 3rd edn (Prentice Hall, 2025).
arXiv.org submitters. arxiv dataset. kaggle https://www.kaggle.com/datasets/Cornell-University/arxiv/versions/165 (2024).
Fraser, N. rbiorxiv: Client for the ‘bioRxiv’ API. GitHub https://github.com/nicholasmfraser/rbiorxiv (2024).
Nature portfolio. Springer Nature https://www.nature.com/nature-portfolio (2025).
Liang, W. Mapping the increasing use of LLMs in scientific papers. COLM https://openreview.net/forum?id=YX7QnhxESU (2024).

Acknowledgements

We thank D. A. McFarland, D. Jurafsky, Y. Yin, Z. Izzo, X. V. Lin, L. Chen and H. Ye for their helpful comments and discussions. J.Z. is supported by the National Science Foundation (grant nos. CCF 1763191 and CAREER 1942926), the US National Institutes of Health (grant nos. P30AG059307 and U01MH098953) and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative. H.L. is supported by the National Science Foundation (grant nos. 2244804 and 2022435) and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

Author information

These authors contributed equally: Weixin Liang, Yaohui Zhang.
These authors jointly supervised this work: Christopher Potts, Christopher D. Manning, James Zou.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, USA
Weixin Liang, Zhengxuan Wu, Hancheng Cao, Diyi Yang, Christopher Potts, Christopher D. Manning & James Zou
Department of Electrical Engineering, Stanford University, Stanford, CA, USA
Yaohui Zhang & James Zou
Graduate School of Education, Stanford University, Stanford, CA, USA
Haley Lepp
Department of Statistics, Stanford University, Stanford, CA, USA
Wenlong Ji
Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA, USA
Xuandong Zhao
Goizueta Business School, Emory University, Atlanta, GA, USA
Hancheng Cao
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
Sheng Liu, Siyu He, Zhi Huang & James Zou
Department of Linguistics, Stanford University, Stanford, CA, USA
Christopher Potts & Christopher D. Manning

Contributions

W.L. and Y.Z. designed the study and oversaw the quantification analysis. W.L. and Y.Z. provided the code for data analysis and conducted the analysis. W.L., Y.Z., Z.W., H.L., W.J., X.Z. and H.C. wrote the paper, with substantial input from all authors. All authors contributed to the review and editing of the paper. D.Y., C.P., C.D.M. and J.Z. provided the overall direction and planning of the project.

Corresponding authors

Correspondence to Weixin Liang or James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Casey Greene, Phillip Howard and Ruixiang Tang for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–16, Supplementary Tables 1 and 2, and details on implementations and related work.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, W., Zhang, Y., Wu, Z. et al. Quantifying large language model usage in scientific papers. Nat Hum Behav 9, 2599–2609 (2025). https://doi.org/10.1038/s41562-025-02273-8

Received: 02 May 2024
Accepted: 19 June 2025
Published: 04 August 2025
Version of record: 04 August 2025
Issue date: December 2025
DOI: https://doi.org/10.1038/s41562-025-02273-8
