Abstract
Image databases are central to empirical aesthetics, enabling tests of how image statistics relate to observers’ appreciation. However, many existing databases have two key limitations: (1) they conflate low-level visual features with high-level semantic content, making it difficult to separate visual from cognitive influences on aesthetic judgments; and (2) they are imbalanced, overrepresenting highly appreciated images. To address these issues, we present the Minimum Semantic Content (MSC) database, a large, systematically curated resource for computational aesthetics. It comprises 10,426 natural scenes with reduced, homogenized semantic content, minimizing cognitive and emotional confounds. Each image received 100 individual aesthetic ratings from naïve observers, collected by crowdsourcing from a pool of approximately 10,000 participants. The database includes both “beautified” and “uglified” versions of the images, generated with a manipulation technique that promotes uniform coverage of the aesthetic spectrum. This broader distribution mitigates bias and overfitting in computational models, and our validation analyses confirm improved model robustness. The database enables researchers to study how perceptual features shape aesthetic judgments using stimuli with minimal semantic and contextual confounds.
Data availability
The MSC database is openly available via the Open Science Framework (OSF)25, and can be accessed at https://doi.org/10.17605/OSF.IO/ZGSVJ. The OSF project includes two components: (1) “The MSC Database” and (2) “Validation Software”. The first component contains the complete image database as well as a comprehensive .csv file with the raw rating results. This arrangement allows users to easily determine the type and provenance of each image and provides access to detailed metadata and aesthetic ratings, supporting a wide range of analyses and applications. All images included in the MSC database are either in the public domain or released under licenses permitting reuse and redistribution.
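As an illustration of how the raw rating results might be loaded and aggregated, the following R sketch reads the ratings table and computes a mean aesthetic score per image. The file name “MSC_ratings.csv” and the column names “image_id” and “rating” are assumptions made for this example only; the actual file and column names in the OSF component may differ.

# Minimal R sketch (assumed file and column names; adapt to the actual .csv layout).
library(dplyr)

ratings <- read.csv("MSC_ratings.csv", stringsAsFactors = FALSE)

# Aggregate the ~100 individual ratings per image into a mean score and a count.
image_scores <- ratings %>%
  group_by(image_id) %>%
  summarise(mean_rating = mean(rating, na.rm = TRUE),
            n_ratings = n())

# Inspect how the mean ratings cover the aesthetic spectrum.
hist(image_scores$mean_rating, breaks = 50,
     main = "Distribution of mean aesthetic ratings",
     xlab = "Mean rating per image")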
Code availability
All QIP metrics were computed using the Python-based QIP toolbox of Redies et al.43, while all subsequent analyses and visualizations of these metrics were performed using custom R57 scripts, which are made publicly available (an illustrative sketch of such an analysis appears after the list below). The second component of the OSF project, titled “Validation Software”, contains:
• All numerical tables and the R code57, executed in RStudio58, needed to reproduce the statistical analyses reported in the main manuscript and Supplementary Material (MSC statistical validation and complete analysis of QIP metrics.zip).
• The code required to generate all corresponding figures (MSC_analysis_results_and_figures.zip).
• Automatically generated textual descriptions (captions) for images in the MSC Database, produced using the GPT-4o59 vision-language model (MSC dataset captions.zip).
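For readers who wish to relate QIP metrics to the aesthetic ratings, the sketch below shows one way this could be done in R. It is illustrative only and not the authors’ validation code: it assumes a hypothetical file “qip_metrics.csv” (one row per image, one column per metric, plus an “image_id” column) exported from the QIP toolbox, and it reuses the image_scores data frame from the sketch in the Data availability section.

# Illustrative R sketch; the file layout and column names are assumptions.
qips <- read.csv("qip_metrics.csv", stringsAsFactors = FALSE)

merged <- merge(qips, image_scores, by = "image_id")

# Spearman correlation of each QIP metric with the mean aesthetic rating.
metric_cols <- setdiff(names(qips), "image_id")
correlations <- sapply(metric_cols, function(m)
  cor(merged[[m]], merged$mean_rating, method = "spearman", use = "complete.obs"))

# Rank metrics by the absolute strength of their association with the ratings.
sort(abs(correlations), decreasing = TRUE)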
References
Chatterjee, A. & Vartanian, O. Neuroaesthetics. Trends in Cognitive Sciences 18, 370–375 (2014).
Vessel, E. A., Ishizu, T. & Bignardi, G. in The Routledge International Handbook of Neuroaesthetics (eds. Skov, M. & Nadal, M.) 102-131 (Routledge, London, 2022).
Carandini, M. et al. Do We Know What the Early Visual System Does? The Journal of Neuroscience 25, 10577–10597 (2005).
Reber, R., Winkielman, P. & Schwarz, N. Effects of Perceptual Fluency on Affective Judgments. Psychological Science 9, 45–48 (1998).
Goodale, M. A. & Milner, A. D. Separate visual pathways for perception and action. Trends in Neurosciences 15, 20–25 (1992).
Farzanfar, D. & Walther, D. B. Changing What You Like: Modifying Contour Properties Shifts Aesthetic Valuations of Scenes. Psychological Science 34, 1101–1120 (2023).
Iigaya, K., Yi, S., Wahle, I. A., Tanwisuth, K. & O’Doherty, J. P. Aesthetic preference for art can be predicted from a mixture of low- and high-level visual features. Nature Human Behaviour 5, 743–755 (2021).
McManus, I. C. The aesthetics of simple figures. British Journal of Psychology 71, 505–524 (1980).
Mallon, B., Redies, C. & Hayn-Leichsenring, G. Beauty in abstract paintings: perceptual contrast and statistical properties. Frontiers in Human Neuroscience 8 (2014).
Spehar, B., Clifford, C. W. G., Newell, B. R. & Taylor, R. P. Universal aesthetic of fractals. Computers & Graphics-UK 27, 813–820 (2003).
Tinio, P. P. L. & Leder, H. Natural scenes are indeed preferred, but image quality might have the last word. Psychology of Aesthetics, Creativity, and the Arts 3, 52–56 (2009).
Bertamini, M., Rampone, G., Makin, A. D. J. & Jessop, A. Symmetry preference in shapes, faces, flowers and landscapes. PeerJ 7, e7078 (2019).
Jacobsen, T. & Höfel, L. Aesthetic Judgments of Novel Graphic Patterns: Analyses of Individual Judgments. Perceptual and Motor Skills 95, 755–766 (2002).
Rhodes, G. The Evolutionary Psychology of Facial Beauty. Annual Review of Psychology 57, 199–226 (2006).
Bar, M. & Neta, M. Humans Prefer Curved Visual Objects. Psychological Science 17, 645–648 (2006).
Bertamini, M., Palumbo, L., Gheorghes, T. N. & Galatsidas, M. Do observers like curvature or do they dislike angularity? British Journal of Psychology 107, 154–178 (2016).
Clemente, A., Penacchio, O., Vila-Vidal, M., Pepperell, R. & Ruta, N. Explaining the curvature effect: Perceptual and hedonic evaluations of visual contour. Psychology of Aesthetics, Creativity, and the Arts (2023).
Vartanian, O. et al. Impact of contour on aesthetic judgments and approach-avoidance decisions in architecture. Proceedings of the National Academy of Sciences 110, 10446–10453 (2013).
Geller, H. A., Bartho, R., Thömmes, K. & Redies, C. Statistical image properties predict aesthetic ratings in abstract paintings created by neural style transfer. Frontiers in Neuroscience 16 (2022).
DPChallenge Community. (Challenging Technologies, LLC.) http://www.dpchallenge.com/ Accessed 2017-04-19 (2016).
Flickr Community. (SmugMug, Inc.) https://www.flickr.com/ Accessed 07/09/2016 (2003).
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A. & Jain, R. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1349–1380 (2000).
Murray, N., Marchesotti, L. & Perronnin, F. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR2012) 2408-2415 (2012).
Joshi, D. et al. Aesthetics and Emotions in Images: A computational perspective. IEEE Signal Processing Magazine 28, 94–115 (2011).
Parraga, C. A. et al. The Minimum Semantic Content (MSC) image dataset, software files, and supplemental material (OSF -Open Science Framework), https://doi.org/10.17605/OSF.IO/ZGSVJ (2025).
Parraga, C. A., Muñoz González, M., Otazu, X. & Penacchio, O. in European Conference on Visual Perception (ECVP2022) 139-139 (Nijmegen, The Netherlands, 2022).
Parraga, C. A., Penacchio, O., Muñoz Gonzalez, M., Raducanu, B. & Otazu, X. Aesthetics Without Semantics. arXiv:2505.05331v2 (2025).
Datta, R., Joshi, D., Li, J. & Wang, J. Z. Studying aesthetics in photographic images using a computational approach. European Conference on Computer Vision (ECCV2006) 3953, 288–301 (2006).
Brachmann, A. & Redies, C. Computational and Experimental Approaches to Visual Aesthetics. Frontiers in Computational Neuroscience 11 (2017).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annual Review of Neuroscience 24, 1193–1216 (2001).
Field, D. J. Relations between the statistics of natural scenes and the response properties of cortical cells. Journal of the Optical Society of America A 4, 2379–2394 (1987).
Ruderman, D. L. & Bialek, W. Statistics of natural images: Scaling in the woods. Physical Review Letters 73, 814–817 (1994).
Parraga, C. A., Troscianko, T. & Tolhurst, D. J. The human visual system is optimised for processing the spatial information in natural visual images. Current Biology 10, 35–38 (2000).
Parraga, C. A., Troscianko, T. & Tolhurst, D. J. Spatiochromatic properties of natural images and human vision. Current Biology 12, 483–487 (2002).
PdPhoto Community. http://pdphoto.org/ Accessed 07/09/2016 (2003).
Photos Public Domain Community. Free stock photos, textures, images, pictures & clipart for any use including commercial. http://www.photos-public-domain.com/ Accessed 07/09/2016 (2010).
McManus, I. C. et al. The Psychometrics of Photographic Cropping: The Influence of Colour, Meaning, and Expertise. Perception 40, 332–357 (2011).
Laeng, B., Øvervoll, M. & Ala-Pettersen, E. A. Original art paintings are chosen over their “color-rotated” versions because of changed color contrast. Perception 54, 780–814 (2025).
Nakauchi, S. & Tamura, H. Regularity of colour statistics in explaining colour composition preferences in art paintings. Scientific Reports 12, 14585 (2022).
Nascimento, S. M. C. et al. The colors of paintings and viewers’ preferences. Vision Research 130, 76–84 (2017).
Ryabov, A. (MATLAB Central File Exchange, 2022).
The Math Works Inc. Computer Software (The Math Works, Inc., 2022).
Redies, C. et al. A toolbox for calculating quantitative image properties in aesthetics research. Behavior Research Methods 57, 117 (2025).
Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (Chapman and Hall/CRC, 1994).
Endres, D. M. & Schindelin, J. E. A new metric for probability distributions. IEEE Transactions on Information Theory 49, 1858–1860 (2003).
Ungerleider, L. G. & Haxby, J. V. ‘What’ and ‘where’ in the human brain. Current Opinion in Neurobiology 4, 157–165 (1994).
Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1, 1–47 (1991).
Lu, X., Lin, Z., Jin, H., Yang, J. & Wang, J. Z. in Proceedings of the 22nd ACM International Conference on Multimedia (ACM, Orlando, Florida, USA, 2014).
Ma, S., Liu, J. & Chen, C. W. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 722-731 (2017).
Kong, S., Shen, X., Lin, Z., Mech, R. & Fowlkes, C. in ECCV 2016 (eds. Leibe, B., Matas, J., Sebe, N. & Welling, M.) 662-679 (Springer International Publishing, Amsterdam, The Netherlands, 2016).
Schober, P., Boer, C. & Schwarte, L. A. Correlation Coefficients: Appropriate Use and Interpretation. Anesthesia & Analgesia 126, 1763–1768 (2018).
Janse, R. J. et al. Conducting correlation analysis: important limitations and pitfalls. Clinical Kidney Journal 14, 2332–2337 (2021).
Penacchio, O., Otazu, X., Wilkins, A. J. & Haigh, S. M. A mechanistic account of visual discomfort. Frontiers in Neuroscience 17 (2023).
Penacchio, O., Haigh, S. M., Ross, X., Ferguson, R. & Wilkins, A. J. Visual Discomfort and Variations in Chromaticity in Art and Nature. Frontiers in Neuroscience 15 (2021).
Bartho, R., Thoemmes, K. & Redies, C. Predicting beauty, liking, and aesthetic quality: A comparative analysis of image databases for visual aesthetics research. arXiv:2307.00984 (2023).
Zeng, H., Cao, Z., Zhang, L. & Bovik, A. C. A Unified Probabilistic Formulation of Image Aesthetic Assessment. IEEE Transactions on Image Processing 29, 1548–1561 (2020).
R Core Team. (R Foundation for Statistical Computing, Vienna, Austria, 2025).
Posit Team. (Posit Software PBC, Boston, MA, 2025).
Hurst, A. et al. GPT-4o System Card. arXiv:2410.21276 (2024).
Pitie, F., Kokaram, A. C. & Dahyot, R. Automated colour grading using colour distribution transfer. Computer Vision and Image Understanding 107, 123–137 (2007).
Acknowledgements
C.A.P. and X.O. were supported by the Ministerio de Ciencia e Innovación, Gobierno de España MCIN/AEI/10.13039/501100011033: grants PID2020-118254RB-I00 and TED2021-132513B-I00, by the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) through grant 2021-SGR-01470, and by the CERCA Programme / Generalitat de Catalunya. B.R. is supported by Grant PID2022-143257NB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”. O.P. was funded by a Maria Zambrano Fellowship for the attraction of international talent for the requalification of the Spanish university system, NextGeneration EU (ALRC). A.J. was funded by the FI fellowship AGAUR 2022 FI-SDUR 00248 (Secretaria d’Universitats i Recerca, Generalitat de Catalunya, and Fons Social Europeu). This work was supported by the Departament de Recerca i Universitats (SGR 2025).
Author information
Authors and Affiliations
Contributions
C.A.P., O.P. and B.R. conceived the study. C.A.P., O.P. and A.J. developed the codebase and conducted the experiments. O.P., C.A.P. and A.J. performed the statistical analyses. C.A.P. and O.P. wrote the initial draft of the manuscript. Project supervision was carried out by C.A.P. and X.O. The manuscript was revised by O.P., C.A.P., B.R. and X.O. Data curation was performed by C.A.P. and O.P.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Penacchio, O., Javed, A., Raducanu, B. et al. The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research. Sci Data (2026). https://doi.org/10.1038/s41597-026-06816-0
DOI: https://doi.org/10.1038/s41597-026-06816-0


