Abstract
We propose a mechanistic explanation of how working memories are built and reconstructed from the latent representations of visual knowledge. The proposed model features a variational autoencoder with an architecture that corresponds broadly to the human visual system and an activation-based binding pool of neurons that links latent-space activities to tokenized representations. The simulation results revealed that new pictures of familiar types of items can be encoded and retrieved efficiently from higher levels of the visual hierarchy, whereas truly novel patterns are better stored using only early layers. Moreover, a given stimulus in working memory can have multiple codes, which allows representation of visual detail in addition to categorical information. Finally, we validated our model’s assumptions by testing a series of predictions against behavioural results obtained from working memory tasks. The model demonstrates how visual knowledge yields compact visual representations for efficient memory encoding.
Data availability
The datasets from the behavioural experiments that were analysed in this study are publicly available on the Open Science Framework (https://osf.io/tpzqk/). The datasets that were generated and analysed in the model simulations can be found on GitHub (https://github.com/Shekoo93/MLR). Source data are provided with this paper.
Code availability
The code for the behavioural experiments, both for running the paradigm and for analysing the data, is publicly available on the Open Science Framework (https://osf.io/tpzqk/). All the code for the MLR model, including the simulations that generated the figures and the analyses presented in the tables, is provided on GitHub (https://github.com/Shekoo93/MLR).
Acknowledgements
We thank J. Collins, D. Kravitz, D. Pinotsis, J. Tam, C. Callahan-Flintoft and P. Doozandeh for their helpful comments during the preparation of this manuscript. This work was supported by NSF grant no. 1734220 to B.W. and Binational Science Foundation grant no. 2015299 to B.W. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
S.H. conceptualized and wrote the paper, coded the model and performed the simulations. B.W. helped with the writing and conceptualization of the paper, as well as with writing the code. R.E.O. coded and performed the behavioural experiments, analysed the behavioural data and wrote the sections on the behavioural experiments in the Results and Methods.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Nick Myers and Klaus Oberauer for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Examples that were used to train the model.
Colourization of MNIST and Fashion-MNIST inputs using 10 prototypical colours with independent random variations on the RGB channels. Left: images used to train the mVAE. Right: transformed images of the same general dataset that were used to train the skip connection.
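As a rough illustration of this colourization step, the following minimal Python sketch applies a prototypical colour, independently jittered on each RGB channel, to a greyscale image. The specific prototype colours, the jitter magnitude and the function name `colourize` are illustrative assumptions rather than values taken from the model's code.

```python
import numpy as np

# Ten prototypical colours (RGB in [0, 1]); the exact values are illustrative.
PROTOTYPES = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 1], [1, 0.5, 0], [0.5, 0, 1], [0.5, 0.5, 0.5], [0.6, 0.3, 0],
], dtype=np.float32)

def colourize(grey, colour_idx, jitter=0.1, rng=None):
    """Turn a (28, 28) greyscale image in [0, 1] into a (28, 28, 3) coloured image.

    A prototype colour is perturbed independently on each RGB channel and then
    multiplied into the greyscale intensities.
    """
    if rng is None:
        rng = np.random.default_rng()
    colour = PROTOTYPES[colour_idx] + rng.uniform(-jitter, jitter, size=3)
    colour = np.clip(colour, 0.0, 1.0)
    return grey[..., None] * colour  # broadcast the intensity map over RGB

# Example with a random stand-in for an MNIST digit.
digit = np.random.default_rng(0).random((28, 28)).astype(np.float32)
coloured = colourize(digit, colour_idx=3)
print(coloured.shape)  # (28, 28, 3)
```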
Extended Data Fig. 2 The complete MLR architecture, consisting of the mVAE, binding pool, tokens and classifiers for extracting shape and colour labels (SVMs and SVMc).
Information flows in only one direction through the mVAE but can flow bidirectionally between the latent representations and the binding pool. Tokens are used to differentiate individual items. Note that three tokens are shown here but there is no limit to the number of tokens that can be allocated.
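To make the link between the latent representations and the binding pool concrete, here is a minimal numpy sketch of the binding-pool principle for a single item: a latent vector is projected through fixed random weights into a token-gated subset of pool neurons and read back out through the same weights. The pool size, gating proportion and normalisation are illustrative assumptions, not the parameters of the published model.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM = 64    # size of one latent vector (illustrative)
POOL_SIZE = 1000   # number of binding-pool neurons (illustrative)

# Fixed random weights linking latent units to binding-pool neurons.
W = rng.standard_normal((POOL_SIZE, LATENT_DIM))

def store(latent, token_gate):
    """Project a latent vector into the subset of pool neurons opened by a token."""
    return token_gate * (W @ latent)

def retrieve(pool_activity, token_gate):
    """Read an estimate of the latent back out through the same gated weights."""
    return W.T @ (token_gate * pool_activity) / token_gate.sum()

# A token is modelled here as a random binary gate over the pool.
token = (rng.random(POOL_SIZE) < 0.3).astype(float)

z = rng.standard_normal(LATENT_DIM)   # a latent activity pattern from the mVAE
pool = store(z, token)                # encode into the binding pool
z_hat = retrieve(pool, token)         # reconstruct (noisy but highly correlated)
print(np.corrcoef(z, z_hat)[0, 1])
```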
Extended Data Fig. 3 Reconstructions from the mVAE.
Information from just one map is shown by setting the activations of the other map to 0. Both maps together produce a combined representation of shape and colour, showing that the model is able to merge the two forms of information that are disentangled across the two maps. The model only processes one item at a time in these simulations, and these are combined into single figures for ease of visualization.
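The "one map at a time" visualization amounts to decoding with the other latent map set to zero. The sketch below uses a stand-in random linear decoder purely to show the call pattern; in the actual model the decoder is the trained mVAE generative network, and the latent dimensionality used here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 64  # assumed size of each latent map

# Stand-in for the trained mVAE decoder: a fixed random linear map from the two
# latent maps to a 28x28 RGB image. Only the call pattern matters here.
W_dec = rng.standard_normal((28 * 28 * 3, 2 * LATENT_DIM)) * 0.05

def decode(shape_z, colour_z):
    z = np.concatenate([shape_z, colour_z])
    img = 1.0 / (1.0 + np.exp(-(W_dec @ z)))  # squash pixel values into [0, 1]
    return img.reshape(28, 28, 3)

shape_z = rng.standard_normal(LATENT_DIM)
colour_z = rng.standard_normal(LATENT_DIM)

both_maps   = decode(shape_z, colour_z)                 # shape and colour together
shape_only  = decode(shape_z, np.zeros(LATENT_DIM))     # colour map set to 0
colour_only = decode(np.zeros(LATENT_DIM), colour_z)    # shape map set to 0
print(both_maps.shape, shape_only.shape, colour_only.shape)
```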
Extended Data Fig. 4 Retrieval accuracy of labels for categorical information.
Accuracy of retrieved labels for set sizes 1–8 when only shape/colour labels are stored, with no visual information. Error bars represent standard errors across ten independently trained models; each model's accuracy is the average of 10,000 repetitions.
Extended Data Fig. 5 A diagram showing the flow of information during binding.
The MLR model stores two coloured MNIST digits sequentially (steps 1 and 2) in the BP. A greyscale shape cue is used to probe and retrieve the corresponding token (step 3). The resulting token is then used to retrieve the shape and colour of the cued input (step 4). The MNIST digits shown in this figure are not the result of direct simulation, but are examples to show how the binding process occurs.
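The four steps of this caption can be sketched with an illustrative binding pool: two items are stored under different token gates, a shape cue is compared against the shape latent read out under each token to select the matching token, and that token is then used to read out the stored colour. The dimensions, gating proportion and helper names (`store`, `read`) are assumptions for illustration only, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
LATENT_DIM, POOL_SIZE, N_TOKENS = 64, 2000, 2   # illustrative sizes

# Fixed random weights from the shape and colour latents to binding-pool neurons.
W_shape  = rng.standard_normal((POOL_SIZE, LATENT_DIM))
W_colour = rng.standard_normal((POOL_SIZE, LATENT_DIM))

# Each token gates a different random subset of pool neurons.
tokens = (rng.random((N_TOKENS, POOL_SIZE)) < 0.3).astype(float)

def store(shape_z, colour_z, gate):
    """Project one item's shape and colour latents into the gated pool neurons."""
    return gate * (W_shape @ shape_z + W_colour @ colour_z)

def read(pool, gate, W):
    """Read a latent estimate back out through one token's gate."""
    return W.T @ (gate * pool) / gate.sum()

# Steps 1 and 2: store two coloured digits, one per token.
items = [(rng.standard_normal(LATENT_DIM), rng.standard_normal(LATENT_DIM))
         for _ in range(N_TOKENS)]
pool = sum(store(s, c, tokens[i]) for i, (s, c) in enumerate(items))

# Step 3: probe with a shape cue (here, item 0's shape latent) and pick the
# token whose stored shape best matches the cue.
cue = items[0][0]
match = [float(np.dot(cue, read(pool, tokens[i], W_shape))) for i in range(N_TOKENS)]
best = int(np.argmax(match))

# Step 4: use the winning token to retrieve the colour of the cued item.
retrieved_colour = read(pool, tokens[best], W_colour)
print(best, np.corrcoef(retrieved_colour, items[0][1])[0, 1])
```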
Extended Data Fig. 6 Mean cross-correlation between inputs and retrieved images for familiar versus novel items.
Mean cross-correlation of pixel values, over 500 repetitions, between input and retrieved images from 10 trained models for familiar (blue) and novel (Bengali, orange) shapes across different set sizes. Blue dots indicate the correlation for familiar stimuli when the shape/colour maps were stored in the binding pool and retrieved via the mVAE feedback pathway. Orange dots indicate the correlation for novel stimuli when the L1 latent was stored and then reconstructed with the skip connection. Note that reconstruction quality is lower for novel shapes and that novel reconstructions deteriorate more rapidly as set size increases. Error bars represent standard errors.
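For reference, a pixel-wise cross-correlation of this kind can be computed as a Pearson correlation over the flattened pixel values of the input and retrieved images; whether the published analysis normalizes in exactly this way is an assumption of this sketch.

```python
import numpy as np

def pixel_correlation(original, retrieved):
    """Pearson correlation between the pixel values of two equal-sized images.

    Both images are flattened so the correlation is computed over all pixels
    and colour channels at once.
    """
    a = np.asarray(original, dtype=float).ravel()
    b = np.asarray(retrieved, dtype=float).ravel()
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(5)
img = rng.random((28, 28, 3))                                       # stand-in input
recon = np.clip(img + 0.2 * rng.standard_normal(img.shape), 0, 1)   # noisy retrieval
print(pixel_correlation(img, recon))  # high, but below 1
```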
Supplementary information
Supplementary Information
Supplementary Table 1.
Source data
Source Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
About this article
Cite this article
Hedayati, S., O’Donnell, R.E. & Wyble, B. A model of working memory for latent representations. Nat Hum Behav 6, 709–719 (2022). https://doi.org/10.1038/s41562-021-01264-9