Interpretable deep learning reveals distinct spectral and temporal drivers of perceived musical emotion
  • Article
  • Open access
  • Published: 05 January 2026

  • Yiming Gu1,
  • Chen Shao2,
  • Jingze Li3 &
  • Yinghan Fan3

Scientific Reports (2026). Cite this article

  • 892 Accesses


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Mathematics and computing
  • Neuroscience
  • Psychology

Abstract

This study addresses a fundamental question in music psychology: which specific, dynamic acoustic features predict human listeners’ emotional responses along the dimensions of valence and arousal. Our primary objective was to develop and validate an interpretable computational model that can serve as a tool for testing and advancing theories of music cognition. Using the publicly available DEAM dataset, containing 1,802 music excerpts with continuous valence-arousal ratings, we developed a novel, theory-guided neural network. The proposed model integrates a convolutional pathway for local spectral analysis with a Transformer pathway for capturing long-range temporal dependencies. Critically, its learning process is constrained by established principles from music psychology to enhance its plausibility. A core finding from an analysis of the model’s attention mechanisms was that distinct acoustic patterns drive the two emotional dimensions: rhythmic regularity and spectral flux emerged as strong predictors of arousal, whereas harmonic complexity and musical mode were key predictors of valence. To validate our analytical tool, we confirmed that the model significantly outperformed standard baselines in predictive accuracy, achieving a Concordance Correlation Coefficient (CCC) of 0.67 for valence and 0.73 for arousal. Furthermore, an ablation study demonstrated that the theory-guided constraints were essential for this superior performance. Together, these findings provide robust computational evidence for the distinct roles of temporal and spectral features in shaping emotional perception. This work demonstrates the utility of interpretable machine learning as a powerful methodology for testing and refining psychological theories of music and emotion.
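The abstract describes the architecture and evaluation metric only at a high level. As an illustrative sketch, and not the authors' implementation, the PyTorch snippet below pairs a small convolutional pathway over a time-frequency input with a Transformer encoder over frames, and trains against a 1 - CCC objective. The input size (84 bins), layer widths, pooling choices, and names such as DualPathwayEmotionModel and ccc are assumptions made for this example; the theory-guided constraints mentioned in the abstract are not modeled here.

```python
# Minimal sketch (illustrative assumptions, not the published model):
# a CNN pathway for local spectral patterns plus a Transformer pathway
# for long-range temporal structure, trained with a 1 - CCC loss.
import torch
import torch.nn as nn


def ccc(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Concordance Correlation Coefficient between two 1-D tensors."""
    pred_mean, target_mean = pred.mean(), target.mean()
    pred_var, target_var = pred.var(unbiased=False), target.var(unbiased=False)
    covariance = ((pred - pred_mean) * (target - target_mean)).mean()
    return (2 * covariance) / (pred_var + target_var + (pred_mean - target_mean) ** 2 + 1e-8)


class DualPathwayEmotionModel(nn.Module):
    """CNN over a time-frequency map + Transformer over per-frame embeddings."""

    def __init__(self, n_bins: int = 84, d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Spectral pathway: small 2-D CNN over (frequency, time).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse the frequency axis, keep time
        )
        # Temporal pathway: Transformer encoder over projected frames.
        self.frame_proj = nn.Linear(n_bins, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Fusion head: predicts valence and arousal from the two pooled pathways.
        self.head = nn.Linear(32 + d_model, 2)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_bins, n_frames), e.g. a constant-Q or mel spectrogram.
        local = self.cnn(spec.unsqueeze(1)).squeeze(2).mean(dim=-1)   # (batch, 32)
        frames = self.frame_proj(spec.transpose(1, 2))                # (batch, n_frames, d_model)
        global_ctx = self.transformer(frames).mean(dim=1)             # (batch, d_model)
        return self.head(torch.cat([local, global_ctx], dim=-1))      # (batch, 2): valence, arousal


# Example training step on random data, using 1 - CCC per emotion dimension.
model = DualPathwayEmotionModel()
spec = torch.randn(8, 84, 200)        # 8 excerpts, 84 frequency bins, 200 frames
targets = torch.rand(8, 2) * 2 - 1    # valence/arousal ratings scaled to [-1, 1]
preds = model(spec)
loss = (1 - ccc(preds[:, 0], targets[:, 0])) + (1 - ccc(preds[:, 1], targets[:, 1]))
loss.backward()
```

The same ccc helper, applied per dimension over held-out predictions, reproduces the kind of CCC scores reported in the abstract (0.67 for valence, 0.73 for arousal), since CCC penalizes both decorrelation and systematic bias between predictions and ratings.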


Data availability

The data that support the findings of this study are openly available in the MediaEval Database for Emotional Analysis in Music (DEAM) at https://cvml.unige.ch/databases/DEAM/.
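As a brief, hypothetical illustration of working with the dataset named above, the snippet below reads DEAM's per-song averaged valence and arousal annotations with pandas. The CSV file name and column names are assumptions based on common releases of the dataset; consult the files in the DEAM download for the exact layout.

```python
# Hypothetical example: load DEAM per-song static annotations.
# File name and column names are assumptions; verify against the dataset release.
import pandas as pd

annotations = pd.read_csv("static_annotations_averaged_songs_1_2000.csv")
annotations.columns = annotations.columns.str.strip()  # headers may carry stray spaces
print(annotations[["song_id", "valence_mean", "arousal_mean"]].head())
```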


Author information

Author notes
  1. Yiming Gu and Chen Shao contributed equally to this work.

Authors and Affiliations

  1. Zhenjiang College, Zhenjiang, China

    Yiming Gu

  2. Saint Petersburg Conservatory, Saint Petersburg, Russia

    Chen Shao

  3. Sichuan Agricultural University, Ya’an, China

    Jingze Li & Yinghan Fan


Contributions

Y.G. and C.S. contributed equally to this work. Y.G. conceptualized the study, developed the methodology, implemented the software, and wrote the original draft. C.S. acquired the funding, performed validation of the experimental results, and contributed significantly to the writing, review, and editing of the manuscript. Y.F. assisted with data curation and visualization. J.L. provided supervision, project administration, and critically reviewed the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Yiming Gu, Chen Shao or Jingze Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Gu, Y., Shao, C., Li, J. et al. Interpretable deep learning reveals distinct spectral and temporal drivers of perceived musical emotion. Sci Rep (2026). https://doi.org/10.1038/s41598-025-34238-2


  • Received: 21 August 2025

  • Accepted: 26 December 2025

  • Published: 05 January 2026

  • DOI: https://doi.org/10.1038/s41598-025-34238-2


Keywords

  • Music psychology
  • Emotion science
  • Computational modeling
  • Interpretable models
  • Arousal-Valence model