Abstract
Understanding how sentences are represented in the human brain, as well as in large language models (LLMs), poses a substantial challenge for cognitive science. Here we develop a one-shot learning task to investigate whether humans and LLMs encode tree-structured constituents within sentences. Participants (total N = 372; native Chinese or English speakers, or bilinguals in Chinese and English) and LLMs (for example, ChatGPT) were asked to infer which words should be deleted from a sentence. Both groups tend to delete constituents, rather than non-constituent word strings, following rules specific to Chinese and English, respectively. The results cannot be explained by models that rely only on word properties and word positions. Crucially, based on the word strings deleted by either humans or LLMs, the underlying constituency tree structure can be successfully reconstructed. Altogether, these results demonstrate that latent tree-structured sentence representations emerge in both humans and LLMs.
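To make the notion of a constituent concrete: a word string is a constituent if it is exactly dominated by a single node of the sentence's parse tree. The sketch below is illustrative only, not the authors' released analysis code; the example sentence, parse and helper names are ours. It checks whether a contiguous deleted span corresponds to such a node in a Penn-Treebank-style bracketed parse, using nltk.

from nltk import Tree

def constituent_spans(tree):
    # Collect every (start, end) word span dominated by a single tree node.
    spans = set()
    def walk(node, start):
        if isinstance(node, str):        # leaf: a single word
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        spans.add((start, end))
        return end
    walk(tree, 0)
    return spans

def is_constituent(tree, start, end):
    # True if words[start:end] exactly matches some node of the parse.
    return (start, end) in constituent_spans(tree)

# Hypothetical example sentence and parse (not from the study's materials).
parse = Tree.fromstring(
    "(S (NP (DT the) (NN boy)) (VP (VBD ate) (NP (DT an) (NN apple))))")
words = parse.leaves()                    # ['the', 'boy', 'ate', 'an', 'apple']
print(is_constituent(parse, 2, 5))        # 'ate an apple' is a VP    -> True
print(is_constituent(parse, 1, 3))        # 'boy ate' crosses phrases -> False

In these terms, the finding is that deletions produced by both humans and LLMs fall preferentially on spans for which such a check succeeds, consistently enough that the tree can be recovered from the deletion patterns alone.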
Data availability
All sentences and experiment results are available via GitHub at https://github.com/y1ny/WordDeletion, except for the treebank sentences, which are available at https://catalog.ldc.upenn.edu/LDC99T42 (PTB) and https://catalog.ldc.upenn.edu/LDC2013T21 (CTB). Source data are provided with this paper.
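For readers who obtain the licensed treebanks, the bracketed parse files can be read as constituency trees with standard tools. A minimal sketch follows (the local path is a placeholder, not part of the released materials):

from nltk.corpus.reader import BracketParseCorpusReader

# Point the reader at a local copy of the PTB (LDC99T42) or CTB (LDC2013T21).
ptb = BracketParseCorpusReader("/path/to/ptb/combined", r".*\.mrg")

for tree in ptb.parsed_sents()[:3]:
    print(" ".join(tree.leaves()))        # the sentence
    print(tree)                           # its bracketed constituency parse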
Code availability
All scripts are available via GitHub at https://github.com/y1ny/WordDeletion.
Acknowledgements
We thank S. Wang for discussions, L. Jin and X. Pan for helping to construct ambiguous sentences and M. Wolpert and F. H. Wang for comments on earlier versions of the manuscript. This work was supported by the National Science and Technology Innovation 2030 Major Project 2021ZD0204100 (2021ZD0204105 to W.L. and N.D.) and the Fundamental Research Funds for the Central Universities (226-2025-00035 to N.D.).
Author information
Authors and Affiliations
Contributions
N.D. conceived the study. W.L., M.X. and N.D. designed the experiment. W.L. implemented and conducted the experiments. W.L. analysed the data. W.L., M.X. and N.D. wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Jonathan Brennan, Taro Watanabe and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–15, Supplementary Tables 1–3, Supplementary Results, Supplementary Discussion, lists of parallel sentences, meaningless sentences, syntactically ambiguous sentences, and demonstration sentences for syntactically ambiguous sentences, as well as the instructions and prompts for human participants and LLMs.
Supplementary Data 1
Statistical source data for the Supplementary Information.
Source data
Source Data Figs. 1–4 and 6
Statistical source data for the main text and figures.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, W., Xiang, M. & Ding, N. Active use of latent tree-structured sentence representation in humans and large language models. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02297-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41562-025-02297-0