Abstract
Humans make over a hundred thousand eye movements daily to gather visual information. But what determines where we look? Current computational models typically link gaze behaviour to visual features of isolated images, yet eye movements are also strongly shaped by cognitive goals: observers gather information that helps them to understand, rather than merely represent, the world. Within this framework, observers should focus more on information that updates their understanding of the environment, and less on what is purely visually salient. Here we tested this hypothesis using a free-viewing paradigm of narratives in which we experimentally manipulated the meaningfulness of temporal context by presenting pictures either in a coherent (i.e., correct) order or in a temporally shuffled order. We developed a computational approach to quantify which visual information is semantically salient (i.e., important for understanding): we separately obtained language narratives for the images in the stories and computed the contextual surprisal of visual objects using a large language model. We compared the ability of this semantic salience model to explain gaze behaviour with that of a state-of-the-art model of visual salience (DeepGaze-II). We found that individuals (N = 42) looked relatively more often and more quickly at semantically salient objects when images were presented in the coherent compared to the shuffled order. In contrast, visual salience did not account for gaze behaviour better in the coherent than in the shuffled order. These findings highlight how internal contextual models guide visual sampling and demonstrate that language models could offer a powerful tool for capturing gaze behaviour in richer, meaningful settings.
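To make the surprisal computation concrete, the following is a minimal sketch of how the contextual surprisal of an object word can be computed with an autoregressive language model. The model checkpoint ("gpt2" via the Hugging Face transformers library), the helper word_surprisal, and the example sentences are illustrative assumptions for exposition, not the authors' exact pipeline.

```python
# Sketch (assumed setup, not the authors' exact pipeline): contextual
# surprisal of an object word under GPT-2, in bits, given the narrative
# text that precedes the current picture.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def word_surprisal(context: str, word: str) -> float:
    """Surprisal (bits) of `word` given the preceding narrative context,
    summed over the word's sub-word tokens."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    word_ids = tokenizer(" " + word, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, word_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict the token at position i + 1, so drop
    # the final position and align with the word's target tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    n = word_ids.shape[1]
    token_logp = log_probs[-n:].gather(1, input_ids[0, -n:].unsqueeze(1))
    return float(-token_logp.sum() / torch.log(torch.tensor(2.0)))

# A contextually unexpected object should yield a higher surprisal:
print(word_surprisal("The boy looked into the jar and saw his", "frog"))
print(word_surprisal("The boy looked into the jar and saw his", "camel"))
```

In a coherent narrative, the context would be the accumulated story text up to the current picture; the same computation applies in a shuffled order, where the preceding context is less predictive of the depicted objects.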
Data availability
All data related to the current manuscript are openly available (upon user registration) in the Radboud Data Repository (Eye movements during narrative viewing; https://doi.org/10.34973/014c-t415).
Code availability
All code related to the current manuscript is openly available (upon user registration) in the Radboud Data Repository (Eye movements during narrative viewing; https://doi.org/10.34973/014c-t415).
Acknowledgements
E.B. was supported by the European Union's Horizon Europe research and innovation programme under a Marie Skłodowska-Curie grant agreement for an individual fellowship (101106569). L.M.S. was supported by the European Union's Horizon Europe research and innovation programme under a Marie Skłodowska-Curie grant agreement for an individual fellowship (101111402). C.H.H. was supported by the Italian Ministry of University and Research (MUR) within the PNRR NextGenerationEU program (No. 0000027). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
E.B.: conceptualization, methodology, investigation, formal analysis, visualization, writing – original draft, review and editing. L.M.S.: formal analysis (supporting), writing – review and editing. C.H.H.: methodology (supporting), formal analysis (supporting), writing – review and editing. M.V.P.: conceptualization (supporting), writing – review and editing. F.PdL.: conceptualization, formal analysis (supporting), visualization (supporting), writing – review and editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Psychology thanks Lester C. Loschky and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Troby Ka-Yan Lui. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Berlot, E., Schmitt, L. M., Huber-Huber, C. et al. Narrative context shifts gaze from visual to semantic salience. Commun Psychol (2026). https://doi.org/10.1038/s44271-026-00426-7