The memorability of voices is predictable and consistent across listeners

Abstract

Memorability, the likelihood that a stimulus is remembered, is an intrinsic stimulus property that is highly consistent across people—participants tend to remember or forget the same faces, objects and more. However, these consistencies in memory have thus far only been observed for visual stimuli. Here we investigated memorability in the auditory domain, collecting recognition memory scores from over 3,000 participants listening to a sequence of speakers saying the same sentence. We found significant consistency across participants in their memory for voice clips and for speakers across different utterances. Regression models incorporating both low-level (for example, fundamental frequency) and high-level (for example, dialect) voice properties were significantly predictive of memorability and generalized out of sample, supporting an inherent memorability of speakers’ voices. These results provide strong evidence that listeners are similar in the voices they remember, which can be reliably predicted by quantifiable low-level acoustic features.
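
As a rough illustration of the analysis logic summarized above, the sketch below shows how per-clip memorability scores, their split-half consistency across listeners, and an out-of-sample regression from acoustic features might be computed. It is a minimal sketch under assumed inputs, not the authors' released analysis code (see Code availability below): the file names trials.csv and voice_features.csv, the column names and the feature table are hypothetical placeholders.

```python
# Minimal sketch (hypothetical inputs, not the authors' released code) of the
# three steps described in the abstract: scoring memorability per voice clip,
# checking split-half consistency across listeners, and predicting the scores
# from acoustic features out of sample.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Trial-level responses from a continuous recognition task: one row per trial,
# with participant_id, clip_id, is_repeat (clip heard earlier in the stream?)
# and said_old (listener responded "old"?). Column names are placeholders.
trials = pd.read_csv("trials.csv")
trials["is_repeat"] = trials["is_repeat"].astype(bool)
trials["said_old"] = trials["said_old"].astype(bool)


def hit_rates(df: pd.DataFrame) -> pd.Series:
    """Proportion of listeners who correctly recognized each clip's repeat."""
    return df[df["is_repeat"]].groupby("clip_id")["said_old"].mean()


# One common memorability score: corrected recognition, i.e. hit rate minus
# the false-alarm rate to the clip's first (novel) presentation.
false_alarms = trials[~trials["is_repeat"]].groupby("clip_id")["said_old"].mean()
memorability = (hit_rates(trials) - false_alarms).dropna()

# Split-half consistency: randomly split participants into two groups, score
# each half independently, and correlate the two sets of clip scores.
rng = np.random.default_rng(0)
participants = trials["participant_id"].unique()
rhos = []
for _ in range(1000):
    shuffled = rng.permutation(participants)
    half = set(shuffled[: len(shuffled) // 2])
    in_half = trials["participant_id"].isin(half)
    scores_a = hit_rates(trials[in_half])
    scores_b = hit_rates(trials[~in_half])
    common = scores_a.index.intersection(scores_b.index)
    rho, _ = spearmanr(scores_a.loc[common], scores_b.loc[common])
    rhos.append(rho)
print(f"mean split-half Spearman rho = {np.mean(rhos):.3f}")

# Predict memorability from per-clip voice features (e.g. mean f0, f0 range,
# duration), evaluated out of sample with cross-validation.
features = pd.read_csv("voice_features.csv", index_col="clip_id")
X = features.loc[memorability.index]
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
r2_scores = cross_val_score(model, X, memorability, cv=10, scoring="r2")
print(f"mean cross-validated R^2 = {r2_scores.mean():.3f}")
```

Corrected recognition is only one reasonable scoring choice here; the same skeleton works with raw hit rates or a sensitivity measure such as d'.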

Fig. 1: Continuous recognition memory task.
Fig. 2: Consistency of voice clip memorability scores.
Fig. 3: Correlation of memorability scores across experiments.
Fig. 4: Correlation matrix of Sentence 1 voice features.

Data availability

All data analysed in this study are available via the Open Science Framework at https://osf.io/pybwd/ (ref. 63). The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus used in this study is available for download at https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3.
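
For readers who obtain the corpus themselves, the following is a minimal sketch of reading one clip and extracting a low-level acoustic feature of the kind used in the regression models (mean fundamental frequency). The file path is a hypothetical placeholder, the clip is assumed to have been converted to a standard WAV file, and the published analyses used VoiceSauce (ref. 43) rather than this code.

```python
# Minimal sketch (hypothetical path; not the published feature-extraction
# pipeline) of computing mean fundamental frequency for a single clip.
import librosa
import numpy as np

wav_path = "TIMIT/TRAIN/DR1/FCJF0/SA1.wav"  # placeholder example path

y, sr = librosa.load(wav_path, sr=None)  # keep the native sampling rate
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
mean_f0 = np.nanmean(f0[voiced])  # average f0 over voiced frames only
print(f"mean f0 = {mean_f0:.1f} Hz, duration = {len(y) / sr:.2f} s")
```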

Code availability

All experiment code is available via the Open Science Framework at https://osf.io/pybwd/ (ref. 63).

References

  1. Bainbridge, W. A., Isola, P. & Oliva, A. The intrinsic memorability of face photographs. J. Exp. Psychol. Gen. 142, 1323–1334 (2013).

  2. Isola, P., Xiao, J., Torralba, A. & Oliva, A. What makes an image memorable? In 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 145–152 (IEEE, 2011).

  3. Kahana, M. J., Aggarwal, E. V. & Phan, T. D. The variability puzzle in human memory. J. Exp. Psychol. Learn. Mem. Cogn. 44, 1857–1863 (2018).

  4. Wakeland-Hart, C. D., Cao, S. A., deBettencourt, M. T., Bainbridge, W. A. & Rosenberg, M. D. Predicting visual memory across images and within individuals. Cognition 227, 105201 (2022).

  5. Antony, J. W. et al. Semantic relatedness retroactively boosts memory and promotes memory interdependence across episodes. Elife 11, e72519 (2022).

  6. Cortese, M. J., Watson, J. M., Wang, J. & Fugett, A. Relating distinctive orthographic and phonological processes to episodic memory performance. Mem. Cognit. 32, 632–639 (2004).

  7. Davis, T. M. & Bainbridge, W. A. Memory for artwork is predictable. Proc. Natl Acad. Sci. USA 120, e2302389120 (2023).

  8. Needell, C. D. & Bainbridge, W. A. Embracing new techniques in deep learning for estimating image memorability. Comput. Brain Behav. 5, 168–184 (2022).

  9. Isola, P., Xiao, J., Parikh, D., Torralba, A. & Oliva, A. What makes a photograph memorable? IEEE Trans. Pattern Anal. Mach. Intell. 36, 1469–1482 (2014).

  10. Kramer, M. A., Hebart, M. N., Baker, C. I. & Bainbridge, W. A. The features underlying the memorability of objects. Sci. Adv. 9, eadd2981 (2023).

  11. Xie, W., Bainbridge, W. A., Inati, S. K., Baker, C. I. & Zaghloul, K. A. Memorability of words in arbitrary verbal associations modulates memory retrieval in the anterior temporal lobe. Nat. Hum. Behav. 4, 937–948 (2020).

  12. Borkin, M. A. et al. What makes a visualization memorable? IEEE Trans. Visual Comput. Graphics 19, 2306–2315 (2013).

  13. Ongchoco, J. D. K., Chun, M. M. & Bainbridge, W. A. What moves us? The intrinsic memorability of dance. J. Exp. Psychol. Learn. Mem. Cogn. 49, 889–899 (2023).

  14. Clapp, W., Vaughn, C. & Sumner, M. The episodic encoding of talker voice attributes across diverse voices. J. Mem. Lang. 128, 104376 (2023).

  15. Palmeri, T. J., Goldinger, S. D. & Pisoni, D. B. Episodic encoding of voice attributes and recognition memory for spoken words. J. Exp. Psychol. Learn. Mem. Cogn. 19, 309–328 (1993).

  16. Belin, P., Fecteau, S. & Bedard, C. Thinking the voice: neural correlates of voice perception. Trends Cogn. Sci. 8, 129–135 (2004).

  17. Young, A. W., Frühholz, S. & Schweinberger, S. R. Face and voice perception: understanding commonalities and differences. Trends Cogn. Sci. 24, 398–410 (2020).

  18. Cleary, A. M., Winfield, M. M. & Kostic, B. Auditory recognition without identification. Mem. Cognit. 35, 1869–1877 (2007).

  19. Kostic, B. & Cleary, A. M. Song recognition without identification: when people cannot ‘name that tune’ but can recognize it as familiar. J. Exp. Psychol. Gen. 138, 146–159 (2009).

  20. Bainbridge, W. A. The memorability of people: intrinsic memorability across transformations of a person’s face. J. Exp. Psychol. Learn. Mem. Cogn. 43, 706–716 (2017).

  21. McAleer, P., Todorov, A. & Belin, P. How do you say ‘Hello’? Personality impressions from brief novel voices. PLoS ONE 9, e90779 (2014).

  22. Mileva, M. & Lavan, N. Trait impressions from voices are formed rapidly within 400 ms of exposure. J. Exp. Psychol. Gen. 152, 1539–1550 (2023).

  23. Todorov, A., Said, C. P., Engell, A. D. & Oosterhof, N. N. Understanding evaluation of faces on social dimensions. Trends Cogn. Sci. 12, 455–460 (2008).

  24. Tompkinson, J., Mileva, M., Watt, D. & Mike Burton, A. Perception of threat and intent to harm from vocal and facial cues. Q. J. Exp. Psychol. 77, 326–342 (2023).

  25. Brady, T. F., Konkle, T., Alvarez, G. A. & Oliva, A. Visual long-term memory has a massive storage capacity for object details. Proc. Natl Acad. Sci. USA 105, 14325–14329 (2008).

  26. Standing, L. Learning 10000 pictures. Q. J. Exp. Psychol. 25, 207–222 (1973).

  27. Bigelow, J. & Poremba, A. Achilles’ ear? Inferior human short-term and recognition memory in the auditory modality. PLoS ONE 9, e89914 (2014).

  28. Cohen, M. A., Horowitz, T. S. & Wolfe, J. M. Auditory recognition memory is inferior to visual recognition memory. Proc. Natl Acad. Sci. USA 106, 6008–6010 (2009).

  29. Fritz, J., Mishkin, M. & Saunders, R. C. In search of an auditory engram. Proc. Natl Acad. Sci. USA 102, 9359–9364 (2005).

  30. Clifford, B. R. Voice identification by human listeners: on earwitness reliability. Law Hum. Behav. 4, 373–394 (1980).

  31. Pautz, N. et al. Time to reflect on voice parades: the influence of reflection and retention interval duration on earwitness performance. Appl. Cogn. Psychol. 38, e4162 (2024).

  32. Yarmey, A. D., Yarmey, A. L. & Yarmey, M. J. Face and voice identifications in showups and lineups. Appl. Cogn. Psychol. 8, 453–464 (1994).

  33. Pazdera, J. K. & Kahana, M. J. Modality effects in free recall: a retrieved-context account. J. Exp. Psychol. Learn. Mem. Cogn. 49, 866–888 (2023).

  34. Smith, R. E. & Hunt, R. R. Presentation modality affects false memory. Psychon. Bull. Rev. 5, 710–715 (1998).

  35. Munoz-Lopez, M. M., Mohedano-Moriano, A. & Insausti, R. Anatomical pathways for auditory memory in primates. Front. Neuroanat. 4, 129 (2010).

  36. Peters, J., Suchan, B., Köster, O. & Daum, I. Domain‐specific retrieval of source information in the medial temporal lobe. Eur. J. Neurosci. 26, 1333–1343 (2007).

  37. Bradshaw, A. R. & McGettigan, C. Instrumental learning in social interactions: trait learning from faces and voices. Q. J. Exp. Psychol. 74, 1344–1359 (2021).

  38. Goldinger, S. D. Echoes of echoes? An episodic theory of lexical access. Psychol. Rev. 105, 251–279 (1998).

  39. Magnuson, J. S., Nusbaum, H. C., Akahane-Yamada, R. & Saltzman, D. Talker familiarity and the accommodation of talker variability. Atten. Percept. Psychophys. 83, 1842–1860 (2021).

  40. Magnuson, J. S. & Nusbaum, H. C. Acoustic differences, listener expectations, and the perceptual accommodation of talker variability. J. Exp. Psychol. Hum. Percept. Perform. 33, 391–409 (2007).

  41. Zhang, C. & Chen, S. Toward an integrative model of talker normalization. J. Exp. Psychol. Hum. Percept. Perform. 42, 1252–1268 (2016).

  42. Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G. & Pallett, D. S. DARPA TIMIT Acoustic-phonetic Continuous Speech Corpus (US Department of Commerce, 1993).

  43. Shue, Y. L., Keating, P., Vicenik, C. & Yu, K. VoiceSauce: a program for voice analysis. In Proc. ICPhS XVII, 1846–1849 (ICPhS, 2011).

  44. Oosterhof, N. N. & Todorov, A. The functional basis of face evaluation. Proc. Natl Acad. Sci. USA 105, 11087–11092 (2008).

  45. Vokey, J. R. & Read, J. D. Familiarity, memorability, and the effect of typicality on the recognition of faces. Mem. Cognit. 20, 291–302 (1992).

  46. Bainbridge, W. A. & Rissman, J. Dissociating neural markers of stimulus memorability and subjective recognition during episodic retrieval. Sci. Rep. 8, 8679 (2018).

  47. Johnsrude, I. S. et al. Swinging at a cocktail party: voice familiarity aids speech perception in the presence of a competing voice. Psychol. Sci. 24, 1995–2004 (2013).

  48. Nygaard, L. C., Sommers, M. S. & Pisoni, D. B. Speech perception as a talker-contingent process. Psychol. Sci. 5, 42–46 (1994).

  49. Bishop, J. & Keating, P. Perception of pitch location within a speaker’s range: fundamental frequency, voice quality and speaker sex. J. Acoust. Soc. Am. 132, 1100–1112 (2012).

  50. Busso, C., Lee, S. & Narayanan, S. Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans. Audio Speech Lang. Process. 17, 582–596 (2009).

  51. Baumann, O. & Belin, P. Perceptual scaling of voice identity: common dimensions for different vowels and speakers. Psychol. Res. 74, 110–120 (2010).

  52. Zhang, C., van de Weijer, J. & Cui, J. Intra- and inter-speaker variations of formant pattern for lateral syllables in standard Chinese. Forensic Sci. Int. 158, 117–124 (2006).

  53. Zhou, X. et al. A magnetic resonance imaging-based articulatory and acoustic study of ‘retroflex’ and ‘bunched’ American English /r/. J. Acoust. Soc. Am. 123, 4466–4481 (2008).

  54. Syrdal, A. K. & Gopal, H. S. A perceptual model of vowel recognition based on the auditory representation of American English vowels. J. Acoust. Soc. Am. 79, 1086–1100 (1986).

  55. Jacewicz, E., Fox, R. A. & Wei, L. Between-speaker and within-speaker variation in speech tempo of American English. J. Acoust. Soc. Am. 128, 839–850 (2010).

  56. Schweinberger, S. R., Kawahara, H., Simpson, A. P., Skuk, V. G. & Zäske, R. Speaker perception. Wiley Interdiscip. Rev. Cogn. Sci. 5, 15–25 (2014).

  57. Van Lancker, D., Kreiman, J. & Emmorey, K. Familiar voice recognition: patterns and parameters part I: recognition of backward voices. J. Phon. 13, 19–38 (1985).

  58. Szendro, P., Vincze, G. & Szasz, A. Pink-noise behaviour of biosystems. Eur. Biophys. J. 30, 227–231 (2001).

  59. Kawahara, H., Cheveigne, A. D. & Patterson, R. D. An instantaneous-frequency-based pitch extraction method for high-quality speech transformation: revised TEMPO in the STRAIGHT suite. In Fifth International Conference on Spoken Language Processing 0659 (ISCA, 1998).

  60. Sjölander, K. The Snack Sound Toolkit. https://www.speech.kth.se/snack/ (KTH, 2004).

  61. Boersma, P. & Weenink, D. Praat: Doing Phonetics by Computer. Version 6.3.18. http://www.praat.org/ (2023).

  62. Sun, X. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing Vol. 1 I-333 (IEEE, 2002).

  63. Revsine, C., Goldberg, E. & Bainbridge, W. A. Characterizing the intrinsic memorability of voices. OSF https://osf.io/pybwd/ (2025).

Acknowledgements

This research was supported by the National Science Foundation under Grant No. 2329776 awarded to W.A.B., the NSF Graduate Research Fellowship (Grant No. 1746045) awarded to C.R., and the University of Chicago Quad Undergraduate Research Scholarship awarded to E.G. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank K. Van Engen, H. Nusbaum and M. Berman for their insightful feedback on the analyses and the work as a whole.

Author information

Contributions

C.R. collected the data, analysed the datasets and drafted the manuscript. W.A.B. designed the research, supervised analyses and edited the manuscript. E.G. assisted in data collection and analysed the Experiment 3 data.

Corresponding author

Correspondence to Cambria Revsine.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Abbie Bradshaw and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–3 and Tables 1–3.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Revsine, C., Goldberg, E. & Bainbridge, W.A. The memorability of voices is predictable and consistent across listeners. Nat Hum Behav 9, 758–768 (2025). https://doi.org/10.1038/s41562-025-02112-w
