Abstract
Scientific publishing is the primary means of disseminating research findings. There has been speculation about how extensively large language models (LLMs) are being used in academic writing. Here we conduct a systematic analysis across 1,121,912 preprints and published papers from January 2020 to September 2024 on arXiv, bioRxiv and Nature portfolio journals, using a population-level framework based on word frequency shifts to estimate the prevalence of LLM-modified content over time. Our findings suggest a steady increase in LLM usage, with the largest and fastest growth estimated for computer science papers (up to 22%). By comparison, mathematics papers and the Nature portfolio showed lower evidence of LLM modification (up to 9%). LLM modification estimates were higher among papers from first authors who post preprints more frequently, papers in more crowded research areas and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
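As a rough illustration of what a population-level estimate from word frequency shifts can look like, the sketch below treats a corpus as a mixture of human-written and LLM-modified documents and finds the mixture weight by maximum likelihood. It is a minimal sketch under stated assumptions, not the authors' released implementation: the marker words, their occurrence probabilities, the assumption that words occur independently and the helper names (`simulate_corpus`, `estimate_alpha`) are all hypothetical and chosen only for the example.

```python
import math
import random

# Hypothetical per-document occurrence probabilities for a few marker words.
# In practice these would be estimated from reference corpora; the values
# here are invented for illustration only.
HUMAN_PROBS = {"pivotal": 0.02, "intricate": 0.02, "showcasing": 0.01, "realm": 0.01}
LLM_PROBS = {"pivotal": 0.25, "intricate": 0.20, "showcasing": 0.15, "realm": 0.20}


def doc_log_likelihood(present, probs):
    """Log-probability of a document's word-occurrence pattern, assuming
    each word appears independently with the given probability."""
    return sum(
        math.log(p) if word in present else math.log(1.0 - p)
        for word, p in probs.items()
    )


def estimate_alpha(docs, human_probs, llm_probs, grid_size=201):
    """Grid-search maximum-likelihood estimate of alpha in the mixture
    (1 - alpha) * P_human + alpha * P_llm."""
    # Likelihood ratio P_llm(doc) / P_human(doc) for every document.
    ratios = [
        math.exp(doc_log_likelihood(d, llm_probs) - doc_log_likelihood(d, human_probs))
        for d in docs
    ]
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        alpha = i / (grid_size - 1)
        # Corpus log-likelihood up to a constant that does not depend on alpha.
        ll = sum(math.log1p(alpha * (r - 1.0)) for r in ratios)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


def simulate_corpus(n_docs, true_alpha, human_probs, llm_probs, seed=0):
    """Draw synthetic documents (sets of marker words present) from the mixture."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        probs = llm_probs if rng.random() < true_alpha else human_probs
        docs.append({w for w, p in probs.items() if rng.random() < p})
    return docs


if __name__ == "__main__":
    corpus = simulate_corpus(5000, 0.2, HUMAN_PROBS, LLM_PROBS)
    print(f"estimated alpha: {estimate_alpha(corpus, HUMAN_PROBS, LLM_PROBS):.3f}")
```

In this sketch the estimate is made for the corpus as a whole, so no individual document is ever labelled as LLM-modified; only the aggregate fraction is recovered.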
Data availability
The datasets analysed in the current study are public at the following links: via arXiv at https://www.kaggle.com/datasets/Cornell-University/arxiv (ref. 58), via bioRxiv at https://github.com/nicholasmfraser/rbiorxiv (ref. 59) and via Nature portfolio at https://www.nature.com/nature-portfolio (ref. 60).
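For readers who want to work with the arXiv metadata, the following sketch counts papers per month within the study window (January 2020 to September 2024) for one subject area. It is an illustrative sketch only: the `categories` and `versions[].created` field names are assumed from the JSON-lines layout of the Kaggle snapshot and should be checked against the downloaded file, and `monthly_counts` is a hypothetical helper, not part of the authors' code.

```python
import json
from collections import Counter
from email.utils import parsedate_to_datetime


def monthly_counts(lines, primary_prefix="cs.", start=(2020, 1), end=(2024, 9)):
    """Count records per (year, month) of first submission.

    `lines` is any iterable of JSON strings, one record per line.
    Only records whose first listed category starts with `primary_prefix`
    and whose first version falls inside [start, end] are counted.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed schema: space-separated categories, primary category first.
        primary = record["categories"].split()[0]
        if not primary.startswith(primary_prefix):
            continue
        # Assumed schema: RFC 2822 style timestamp on the first version.
        first = parsedate_to_datetime(record["versions"][0]["created"])
        key = (first.year, first.month)
        if start <= key <= end:
            counts[key] += 1
    return counts
```

A typical call would pass an open file handle, for example `monthly_counts(open("arxiv-metadata-oai-snapshot.json"))`, where the file name is assumed to match the snapshot distributed with the Kaggle dataset.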
Code availability
The code can be accessed via GitHub at https://github.com/Weixin-Liang/Mapping-the-Increasing-Use-of-LLMs-in-Scientific-Papers (refs. 15,61). The study was conducted using Python 3.8.19 and R 4.4.1.
References
Okunytė, P. Google search exposes academics using ChatGPT in research papers. Cybernews https://cybernews.com/news/academic-cheating-chatgpt-openai/ (2023).
Deguerin, M. AI-generated nonsense is leaking into scientific journals. Popular Science https://www.popsci.com/technology/ai-generated-text-scientific-journals/ (2024).
Oransky, I. & Marcus, A. Papers and peer reviews with evidence of ChatGPT writing. Retraction Watch https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/ (2024).
Conroy, G. Scientific sleuths spot dishonest ChatGPT use in papers. Nature https://doi.org/10.1038/d41586-023-02477-w (2023).
Conroy, G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature https://doi.org/10.1038/d41586-023-03144-w (2023).
Vincent, J. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model (2023).
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E. & Zou, J. Y. GPT detectors are biased against non-native English writers. Patterns (N Y) https://doi.org/10.1016/j.patter.2023.100779 (2023).
Yu, S., Luo, M., Madasu, A., Lal, V. & Howard, P. Is your paper being reviewed by an LLM? Investigating AI text detectability in peer review. In NeurIPS Safe Generative AI Workshop https://openreview.net/forum?id=f2G7C2fKxV (2024).
Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. In Forty-first International Conference on Machine Learning https://openreview.net/forum?id=bX3J7ho18S (ICML, 2024).
Clarification on large language model policy LLM. ICML https://icml.cc/Conferences/2023/llm-policy (2023).
Thorp, H. H. ChatGPT is fun, but not an author. Science 379, 313 (2023).
Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Soc. Rev. 80, 875–908 (2015).
Amano, T., González-Varo, J. P. & Sutherland, W. J. Languages are still a major barrier to global science. PLoS Biol. 14, e2000933 (2016).
Lee, M. et al. A design space for intelligent and interactive writing assistants. In Proc. 2024 CHI Conference on Human Factors in Computing Systems (eds Mueller, F. F. et al.) https://doi.org/10.1145/3613904.3642697 (Association for Computing Machinery, 2024).
Liang, W. et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI https://ai.nejm.org/doi/full/10.1056/AIoa2400196 (2023).
Lepp, H. & Smith, D. S. ‘You cannot sound like GPT’: signs of language discrimination and resistance in computer science publishing. In Proc. 2025 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3715275.3732202 (Association for Computing Machinery, 2025).
Agarwal, D., Naaman, M. & Vashistha, A. AI suggestions homogenize writing toward western styles and diminish cultural nuances. In Proc. 2025 CHI Conference on Human Factors in Computing Systems 1–21 (2025).
Lepp, H. & Sarin, P. A global AI community requires language-diverse publishing. Preprint at https://arxiv.org/abs/2408.14772 (2024).
Van Rossum, D. Generative AI top 150: the world’s most used AI tools. FlexOS https://www.flexos.work/learn/generative-ai-top-150 (2024).
MacroPolo. The global AI talent tracker https://archivemacropolo.org/interactive/digital-projects/the-global-ai-talent-tracker/ (2024).
Wiley. ExplanAItions: an artificial intelligence study by Wiley. https://www.wiley.com/en-us/ai-study (2023).
Bianchini, S., Müller, M. & Pelletier, P. Drivers and barriers of AI adoption and use in scientific research. Preprint at https://arxiv.org/abs/2312.09843 (2023).
Horowitz, M. C., Kahn, L., Macdonald, J. & Schneider, J. Adopting AI: how familiarity breeds both trust and contempt. AI Soc. 39, 1721–1735 (2024).
Topsakal, Y. How familiarity, ease of use, usefulness, and trust influence the acceptance of generative artificial intelligence (AI)-assisted travel planning. Int. J. Hum. Comput. Interact. https://doi.org/10.1080/10447318.2024.2426044 (2024).
Lavergne, T., Urvoy, T. & Yvon, F. Detecting fake content with relative entropy scoring. Pan https://dl.acm.org/doi/10.5555/3053718.3053722 (2008).
Badaskar, S., Agarwal, S. & Arora, S. Identifying real or fake articles: towards better language modeling. In International Joint Conference on Natural Language Processing https://aclanthology.org/I08-2115/ (2008).
Beresneva, D. Computer-generated text detection using machine learning: a systematic review. In International Conference on Applications of Natural Language to Data Bases https://doi.org/10.1007/978-3-319-41754-7_43 (Springer, 2016).
Solaiman, I. et al. Release strategies and the social impacts of language models. Preprint at https://arxiv.org/abs/1908.09203 (2019).
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D. & Finn, C. DetectGPT: zero-shot machine-generated text detection using probability curvature. In Proc. 40th International Conference on Machine Learning Vol. 202 (eds. Krause, A. et al.) 24950–24962 (PMLR, 2023).
Yang, X., Cheng, W., Petzold, L., Wang, W. Y. & Chen, H. DNA-GPT: divergent N-gram analysis for training-free detection of GPT-generated text. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Xlayxj2fWp (2024).
Bao, G., Zhao, Y., Teng, Z., Yang, L. & Zhang, Y. Fast-DetectGPT: efficient zero-shot detection of machine-generated text via conditional probability curvature. In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=Bpcgcr8E8Z (ICLR, 2024).
Tulchinskii, E. et al. Intrinsic dimension estimation for robust detection of AI-generated texts. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/pdf?id=8uOZ0kNji6 (NeurIPS, 2025).
Bhagat, R. & Hovy, E. H. Squibs: what is a paraphrase? Comput. Linguist. 39, 463–472 (2013).
Zellers, R. et al. Defending against neural fake news. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) https://papers.neurips.cc/paper_files/paper/2019/file/3e9f0fc9b2f89e043bc6233994dfcf76-Paper.pdf (NeurIPS, 2019).
Bakhtin, A. et al. Real or fake? Learning to discriminate machine from human generated text. Preprint at https://arxiv.org/abs/1906.03351 (2019).
Uchendu, A., Le, T., Shu, K. & Lee, D. Authorship attribution for neural text generation. In Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2020.emnlp-main.673/ (2020).
Chen, Y. et al. GPT-Sentinel: distinguishing human and ChatGPT generated content. Preprint at https://arxiv.org/abs/2305.07969 (2023).
Yu, X. et al. GPT Paternity Test: GPT generated text detection with GPT genetic inheritance. Preprint at https://ar5iv.labs.arxiv.org/html/2305.12519 (2023).
Li, Y. et al. MAGE: machine-generated text detection in the wild. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 36–53 (ACL, 2024).
Liu, X., Zhang, Z., Wang, Y., Lan, Y. & Shen, C. CoCo: coherence-enhanced machine-generated text detection under data limitation with contrastive learning. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.1005.pdf (ACL, 2023).
Bhattacharjee, A., Kumarage, T., Moraffah, R. & Liu, H. ConDA: contrastive domain adaptation for AI-generated text detection. In Proc. 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2023.ijcnlp-main.40/ (ACL, 2023).
Hu, X., Chen, P.-Y. & Ho, T.-Y. RADAR: robust AI-text detection via adversarial learning. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) https://openreview.net/forum?id=QGrkbaan79 (NeurIPS, 2023).
Wolff, M. Attacking neural text detectors. Preprint at https://arxiv.org/abs/2002.11768 (2022).
GPT-2: 1.5B release. OpenAI https://openai.com/research/gpt-2-1-5b-release (2019).
Jawahar, G., Abdul-Mageed, M. & Lakshmanan, L. V. Automatic detection of machine generated text: a critical survey. In Proc. 28th International Conference on Computational Linguistics https://aclanthology.org/2020.coling-main.208.pdf (ACL, 2020).
Fagni, T., Falchi, F., Gambini, M., Martella, A. & Tesconi, M. TweepFake: about detecting deepfake tweets. PLoS ONE 16, e0251415 (2021).
Ippolito, D., Duckworth, D., Callison-Burch, C. & Eck, D. Automatic detection of generated text is easiest when humans are fooled. In Proc. 58th Annual Meeting of the Association for Computational Linguistics https://aclanthology.org/2020.acl-main.164/ (ACL, 2020).
Gehrmann, S., Strobelt, H. & Rush, A. M. GLTR: statistical detection and visualization of generated text. In Proc. 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 111–116 (2019).
Heikkilä, M. How to spot AI-generated text. MIT Technology Review https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/ (2022).
Crothers, E., Japkowicz, N. & Viktor, H. Machine generated text: a comprehensive survey of threat models and detection methods. Preprint at https://arxiv.org/abs/2210.07321 (2022).
Kirchner, J. H., Ahmad, L., Aaronson, S. & Leike, J. New AI classifier for indicating AI-written text. OpenAI https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ (2023).
Kelly, S. M. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html (2023).
Weber-Wulff, D. et al. Testing of detection tools for AI-generated text. Int. J. Educ. Integ. 19, 26 (2023).
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W. & Feizi, S. Can AI-generated text be reliably detected? In The Twelfth International Conference on Learning Representations https://openreview.net/forum?id=OOgsAZdFOt (2023).
Chakraborty, S. et al. On the possibilities of AI-generated text detection. In Proc. 41st International Conference on Machine Learning Research https://proceedings.mlr.press/v235/chakraborty24a.html (2024).
Lo, K., Wang, L. L., Neumann, M., Kinney, R. & Weld, D. S2ORC: the Semantic Scholar open research corpus. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 4969–4983 (Association for Computational Linguistics, 2020); https://doi.org/10.18653/v1/2020.acl-main.447
Jurafsky, D. & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models 3rd edn (Prentice Hall, 2025).
arXiv.org submitters. arxiv dataset. kaggle https://www.kaggle.com/datasets/Cornell-University/arxiv/versions/165 (2024).
Fraser, N. rbiorxiv: Client for the ‘bioRxiv’ API. GitHub https://github.com/nicholasmfraser/rbiorxiv (2024).
Nature portfolio. Springer Nature https://www.nature.com/nature-portfolio (2025).
Liang, W. Mapping the increasing use of LLMs in scientific papers. COLM https://openreview.net/forum?id=YX7QnhxESU (2024).
Acknowledgements
We thank D. A. McFarland, D. Jurafsky, Y. Yin, Z. Izzo, X. V. Lin, L. Chen and H. Ye for their helpful comments and discussions. J.Z. is supported by the National Science Foundation (grant nos. CCF 1763191 and CAREER 1942926), the US National Institutes of Health (grant nos. P30AG059307 and U01MH098953) and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative. H.L. is supported by the National Science Foundation (grant nos. 2244804 and 2022435) and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
Author information
Contributions
W.L. and Y.Z. designed the study and oversaw the quantification analysis. W.L. and Y.Z. provided the code for data analysis and conducted the analysis. W.L., Y.Z., Z.W., H.L., W.J., X.Z. and H.C. wrote the paper, with substantial input from all authors. All authors contributed to the review and editing of the paper. D.Y., C.P., C.D.M. and J.Z. provided the overall direction and planning of the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Casey Greene, Phillip Howard and Ruixiang Tang for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–16, Tables 1 and 2 and Details on implementations and related work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, W., Zhang, Y., Wu, Z. et al. Quantifying large language model usage in scientific papers. Nat Hum Behav 9, 2599–2609 (2025). https://doi.org/10.1038/s41562-025-02273-8
DOI: https://doi.org/10.1038/s41562-025-02273-8