The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research

  • Data Descriptor
  • Open access
  • Published: 17 February 2026

  • Olivier Penacchio1,2,3,
  • Arslan Javed1,2,
  • Bogdan Raducanu1,2,
  • Xavier Otazu1,2 &
  • C. Alejandro Parraga1,2 (ORCID: orcid.org/0000-0002-3809-241X)

Scientific Data, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note that errors affecting the content may be present, and all legal disclaimers apply.

Subjects

  • Computer science
  • Databases

Abstract

Image databases are central to empirical aesthetics, enabling tests of how image statistics relate to observers’ appreciation. However, many existing databases have two key limitations: (1) they conflate low-level visual features with high-level semantic content, making it difficult to separate visual from cognitive influences on aesthetic judgments; and (2) they are imbalanced, overrepresenting highly appreciated images. To address these issues, we present the Minimum Semantic Content (MSC) database, a large, systematically curated resource for computational aesthetics. It comprises 10,426 natural scenes with reduced, homogenized semantic content, minimizing cognitive and emotional confounds. Each image received 100 individual aesthetic ratings from naïve observers, collected via crowdsourcing from a pool of approximately 10,000 participants. The database includes both “beautified” and “uglified” versions of the images, generated with a manipulation technique that promotes uniform coverage of the aesthetic spectrum. This broader rating distribution mitigates bias and overfitting in computational models, and validation analyses confirm improved overall model robustness. The database enables researchers to study how perceptual features shape aesthetic judgments using stimuli with very limited semantic and contextual confounds.

Data availability

The MSC database is openly available via the Open Science Framework (OSF)25 and can be accessed at https://doi.org/10.17605/OSF.IO/ZGSVJ. The OSF project includes two components: (1) “The MSC Database” and (2) “Validation Software”. The first component contains the complete image database as well as a comprehensive .csv file with the raw rating results. This arrangement allows users to easily determine the type and provenance of each image, and it provides access to detailed metadata and aesthetic ratings, supporting a wide range of analyses and applications. All images included in the MSC database are either in the public domain or released under licenses permitting reuse and redistribution.
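
To make the layout concrete, the short sketch below shows one way the raw rating file could be loaded and summarized per image in Python. It is a minimal example under stated assumptions: the file name (msc_ratings.csv) and the column names (image_id, rating) are placeholders rather than the actual schema of the OSF release, so they should be adjusted to match the distributed .csv.

```python
# Minimal sketch: per-image summary statistics from the raw MSC ratings file.
# NOTE: "msc_ratings.csv", "image_id" and "rating" are assumed, illustrative names;
# check the .csv shipped in "The MSC Database" component and rename accordingly.
import pandas as pd

ratings = pd.read_csv("msc_ratings.csv")  # assumed layout: one row per individual rating

per_image = (
    ratings.groupby("image_id")["rating"]
    .agg(mean_rating="mean", sd_rating="std", n_ratings="count")
    .reset_index()
)

# Each image should have roughly 100 ratings; inspect coverage and score spread.
print(per_image["n_ratings"].describe())
print(per_image["mean_rating"].describe())
```

Aggregating to per-image means is only one option; the raw file keeps every individual rating, so distributional analyses across the full aesthetic spectrum remain possible.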

Code availability

All QIP metrics were computed using the Python-based QIP toolbox of Redies et al.43, while all subsequent analyses and visualizations of these metrics were performed with custom R57 scripts, which are publicly available; an illustrative sketch of this analysis workflow follows the list below. The second component of the OSF project, titled “Validation Software”, contains:

• All numerical tables and the R code57, executed in RStudio58, needed to reproduce the statistical analyses reported in the main manuscript and Supplementary Material (MSC statistical validation and complete analysis of QIP metrics.zip), together with the code required to generate all corresponding figures (MSC_analysis_results_and_figures.zip).

• Automatically generated textual descriptions (captions) for the images in the MSC Database, produced using the GPT-4o59 vision-language model (MSC dataset captions.zip).
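
As a rough illustration of the downstream analysis this material supports, the sketch below joins a hypothetical table of per-image QIP metrics with mean aesthetic ratings and computes rank correlations. This is not the authors’ published R code: the file names (qip_metrics.csv, msc_ratings.csv) and column names (image_id, rating) are assumptions for illustration only; the actual, validated analyses are the R scripts provided in the Validation Software component.

```python
# Illustrative sketch only: relate per-image QIP metrics to mean aesthetic ratings.
# File and column names ("qip_metrics.csv", "msc_ratings.csv", "image_id", "rating")
# are assumed placeholders, not the schema of the OSF release.
import pandas as pd
from scipy.stats import spearmanr

qip = pd.read_csv("qip_metrics.csv")      # assumed: one row per image, one column per QIP metric
ratings = pd.read_csv("msc_ratings.csv")  # assumed: one row per individual rating

# Aggregate the ~100 individual ratings per image into a mean aesthetic score.
mean_ratings = (
    ratings.groupby("image_id")["rating"].mean().rename("mean_rating").reset_index()
)
merged = qip.merge(mean_ratings, on="image_id")

# Rank correlation between each QIP metric and the mean rating.
metric_cols = [c for c in merged.columns if c not in ("image_id", "mean_rating")]
for col in metric_cols:
    rho, p = spearmanr(merged[col], merged["mean_rating"])
    print(f"{col}: Spearman rho = {rho:.3f} (p = {p:.3g})")
```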

References

  1. Chatterjee, A. & Vartanian, O. Neuroaesthetics. Trends in Cognitive Sciences 18, 370–375 (2014).
  2. Vessel, E. A., Ishizu, T. & Bignardi, G. in The Routledge International Handbook of Neuroaesthetics (eds. Skov, M. & Nadal, M.) 102–131 (Routledge, London, 2022).
  3. Carandini, M. et al. Do We Know What the Early Visual System Does? The Journal of Neuroscience 25, 10577–10597 (2005).
  4. Reber, R., Winkielman, P. & Schwarz, N. Effects of Perceptual Fluency on Affective Judgments. Psychological Science 9, 45–48 (1998).
  5. Goodale, M. A. & Milner, A. D. Separate visual pathways for perception and action. Trends in Neurosciences 15, 20–25 (1992).
  6. Farzanfar, D. & Walther, D. B. Changing What You Like: Modifying Contour Properties Shifts Aesthetic Valuations of Scenes. Psychological Science 34, 1101–1120 (2023).
  7. Iigaya, K., Yi, S., Wahle, I. A., Tanwisuth, K. & O’Doherty, J. P. Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features. Nature Human Behaviour 5, 743–755 (2021).
  8. McManus, I. C. The aesthetics of simple figures. British Journal of Psychology 71, 505–524 (1980).
  9. Mallon, B., Redies, C. & Hayn-Leichsenring, G. Beauty in abstract paintings: perceptual contrast and statistical properties. Frontiers in Human Neuroscience 8 (2014).
  10. Spehar, B., Clifford, C. W. G., Newell, B. R. & Taylor, R. P. Universal aesthetic of fractals. Computers & Graphics 27, 813–820 (2003).
  11. Tinio, P. P. L. & Leder, H. Natural scenes are indeed preferred, but image quality might have the last word. Psychology of Aesthetics, Creativity, and the Arts 3, 52–56 (2009).
  12. Bertamini, M., Rampone, G., Makin, A. D. J. & Jessop, A. Symmetry preference in shapes, faces, flowers and landscapes. PeerJ 7, e7078 (2019).
  13. Jacobsen, T. & Höfel, L. Aesthetic Judgments of Novel Graphic Patterns: Analyses of Individual Judgments. Perceptual and Motor Skills 95, 755–766 (2002).
  14. Rhodes, G. The Evolutionary Psychology of Facial Beauty. Annual Review of Psychology 57, 199–226 (2006).
  15. Bar, M. & Neta, M. Humans Prefer Curved Visual Objects. Psychological Science 17, 645–648 (2006).
  16. Bertamini, M., Palumbo, L., Gheorghes, T. N. & Galatsidas, M. Do observers like curvature or do they dislike angularity? British Journal of Psychology 107, 154–178 (2016).
  17. Clemente, A., Penacchio, O., Vila-Vidal, M., Pepperell, R. & Ruta, N. Explaining the curvature effect: Perceptual and hedonic evaluations of visual contour. Psychology of Aesthetics, Creativity, and the Arts (2023).
  18. Vartanian, O. et al. Impact of contour on aesthetic judgments and approach-avoidance decisions in architecture. Proceedings of the National Academy of Sciences 110, 10446–10453 (2013).
  19. Geller, H. A., Bartho, R., Thömmes, K. & Redies, C. Statistical image properties predict aesthetic ratings in abstract paintings created by neural style transfer. Frontiers in Neuroscience 16 (2022).
  20. DPChallenge Community (Challenging Technologies, LLC). http://www.dpchallenge.com/ Accessed 2017-04-19 (2016).
  21. Flickr Community (SmugMug, Inc.). https://www.flickr.com/ Accessed 07/09/2016 (2003).
  22. Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A. & Jain, R. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1349–1380 (2000).
  23. Murray, N., Marchesotti, L. & Perronnin, F. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012) 2408–2415 (2012).
  24. Joshi, D. et al. Aesthetics and Emotions in Images: A computational perspective. IEEE Signal Processing Magazine 28, 94–115 (2011).
  25. Parraga, C. A. et al. The Minimum Semantic Content (MSC) image dataset, software files, and supplemental material (OSF - Open Science Framework), https://doi.org/10.17605/OSF.IO/ZGSVJ (2025).
  26. Parraga, C. A., Muñoz González, M., Otazu, X. & Penacchio, O. in European Conference on Visual Perception (ECVP 2022) 139 (Nijmegen, The Netherlands, 2022).
  27. Parraga, C. A., Penacchio, O., Muñoz Gonzalez, M., Raducanu, B. & Otazu, X. Aesthetics Without Semantics. arXiv:2505.05331v2 (2025).
  28. Datta, R., Joshi, D., Li, J. & Wang, J. Z. Studying aesthetics in photographic images using a computational approach. European Conference on Computer Vision (ECCV 2006) 3953, 288–301 (2006).
  29. Brachmann, A. & Redies, C. Computational and Experimental Approaches to Visual Aesthetics. Frontiers in Computational Neuroscience 11 (2017).
  30. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annual Review of Neuroscience 24, 1193–1216 (2001).
  31. Field, D. J. Relations between the statistics of natural scenes and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379–2394 (1987).
  32. Ruderman, D. L. & Bialek, W. Statistics of natural images: Scaling in the woods. Physical Review Letters 73, 814–817 (1994).
  33. Parraga, C. A., Troscianko, T. & Tolhurst, D. J. The human visual system is optimised for processing the spatial information in natural visual images. Current Biology 10, 35–38 (2000).
  34. Parraga, C. A., Troscianko, T. & Tolhurst, D. J. Spatiochromatic properties of natural images and human vision. Current Biology 12, 483–487 (2002).
  35. PdPhoto Community. http://pdphoto.org/ Accessed 07/09/2016 (2003).
  36. Photos Public Domain Community. Free stock photos, textures, images, pictures & clipart for any use including commercial. http://www.photos-public-domain.com/ Accessed 07/09/2016 (2010).
  37. McManus, I. C. et al. The Psychometrics of Photographic Cropping: The Influence of Colour, Meaning, and Expertise. Perception 40, 332–357 (2011).
  38. Laeng, B., Øvervoll, M. & Ala-Pettersen, E. A. Original art paintings are chosen over their “color-rotated” versions because of changed color contrast. Perception 54, 780–814 (2025).
  39. Nakauchi, S. & Tamura, H. Regularity of colour statistics in explaining colour composition preferences in art paintings. Scientific Reports 12, 14585 (2022).
  40. Nascimento, S. M. C. et al. The colors of paintings and viewers’ preferences. Vision Research 130, 76–84 (2017).
  41. Ryabov, A. (MATLAB Central File Exchange, 2022).
  42. The MathWorks, Inc. Computer software (The MathWorks, Inc., 2022).
  43. Redies, C. et al. A toolbox for calculating quantitative image properties in aesthetics research. Behavior Research Methods 57, 117 (2025).
  44. Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (Chapman and Hall/CRC, 1994).
  45. Endres, D. M. & Schindelin, J. E. A new metric for probability distributions. IEEE Transactions on Information Theory 49, 1858–1860 (2003).
  46. Ungerleider, L. G. & Haxby, J. V. ‘What’ and ‘where’ in the human brain. Current Opinion in Neurobiology 4, 157–165 (1994).
  47. Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1, 1–47 (1991).
  48. Lu, X., Lin, Z., Jin, H., Yang, J. & Wang, J. Z. in Proceedings of the 22nd ACM International Conference on Multimedia (ACM, Orlando, Florida, USA, 2014).
  49. Ma, S., Liu, J. & Chen, C. W. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 722–731 (2017).
  50. Kong, S., Shen, X., Lin, Z., Mech, R. & Fowlkes, C. in ECCV 2016 (eds. Leibe, B., Matas, J., Sebe, N. & Welling, M.) 662–679 (Springer International Publishing, Amsterdam, The Netherlands, 2016).
  51. Schober, P., Boer, C. & Schwarte, L. A. Correlation Coefficients: Appropriate Use and Interpretation. Anesthesia & Analgesia 126, 1763–1768 (2018).
  52. Janse, R. J. et al. Conducting correlation analysis: important limitations and pitfalls. Clinical Kidney Journal 14, 2332–2337 (2021).
  53. Penacchio, O., Otazu, X., Wilkins, A. J. & Haigh, S. M. A mechanistic account of visual discomfort. Frontiers in Neuroscience 17 (2023).
  54. Penacchio, O., Haigh, S. M., Ross, X., Ferguson, R. & Wilkins, A. J. Visual Discomfort and Variations in Chromaticity in Art and Nature. Frontiers in Neuroscience 15 (2021).
  55. Bartho, R., Thoemmes, K. & Redies, C. Predicting beauty, liking, and aesthetic quality: A comparative analysis of image databases for visual aesthetics research. arXiv:2307.00984 (2023).
  56. Zeng, H., Cao, Z., Zhang, L. & Bovik, A. C. A Unified Probabilistic Formulation of Image Aesthetic Assessment. IEEE Transactions on Image Processing 29, 1548–1561 (2020).
  57. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2025).
  58. Posit Team. RStudio: Integrated Development Environment for R (Posit Software, PBC, Boston, MA, 2025).
  59. Hurst, A. et al. GPT-4o System Card. arXiv:2410.21276 (2024).
  60. Pitie, F., Kokaram, A. C. & Dahyot, R. Automated colour grading using colour distribution transfer. Computer Vision and Image Understanding 107, 123–137 (2007).

Acknowledgements

C.A.P. and X.O. were supported by the Ministerio de Ciencia e Innovación, Gobierno de España (MCIN/AEI/10.13039/501100011033), grants PID2020-118254RB-I00 and TED2021-132513B-I00, by the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) through grant 2021-SGR-01470, and by the CERCA Programme / Generalitat de Catalunya. B.R. is supported by grant PID2022-143257NB-I00, funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A Way of Making Europe”. O.P. was funded by a Maria Zambrano Fellowship for the attraction of international talent for the requalification of the Spanish university system, NextGeneration EU (ALRC). A.J. was funded by the FI fellowship AGAUR 2022 FI-SDUR 00248 (Secretaria d’Universitats i Recerca, Generalitat de Catalunya, and Fons Social Europeu). This work was supported by the Departament de Recerca i Universitats (SGR 2025).

Author information

Authors and Affiliations

  1. Computer Science Dept., Engineering School, Universitat Autònoma de Barcelona (UAB), Campus UAB, Bellaterra, 08193, Barcelona, Spain

    Olivier Penacchio, Arslan Javed, Bogdan Raducanu, Xavier Otazu & C. Alejandro Parraga

  2. Computer Vision Center, Campus UAB, Bellaterra, 08193, Barcelona, Spain

    Olivier Penacchio, Arslan Javed, Bogdan Raducanu, Xavier Otazu & C. Alejandro Parraga

  3. School of Psychology and Neuroscience, University of St Andrews, St Andrews, Fife, KY16 9JP, United Kingdom

    Olivier Penacchio

Contributions

C.A.P., O.P. and B.R. conceived the study. C.A.P., O.P. and A.J. developed the codebase and conducted the experiments. O.P., C.A.P. and A.J. performed the statistical analyses. C.A.P. and O.P. wrote the initial draft of the manuscript. Project supervision was carried out by C.A.P. and X.O. The manuscript was revised by O.P., C.A.P., B.R. and X.O. Data curation was performed by C.A.P. and O.P.

Corresponding author

Correspondence to Olivier Penacchio.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Penacchio, O., Javed, A., Raducanu, B. et al. The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research. Sci Data (2026). https://doi.org/10.1038/s41597-026-06816-0

  • Received: 03 September 2025

  • Accepted: 03 February 2026

  • Published: 17 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06816-0
