Abstract
Precise categorization of artistic styles, essential for art historical research, is hindered by the variety of cultural contexts that shape the appearance of artwork and by the low precision of available methods. This paper introduces a new two-stage hierarchical model for robust cultural-based artwork classification that uses deep reinforcement learning for model optimization. The approach is driven by the hypothesis that decoupling cultural context identification from style recognition improves classification precision by enabling culturally specialized analysis. First, a Convolutional Neural Network (CNN) identifies the cultural origin of an artwork (Western, Islamic, or East Asian). Subsequently, a separate style classification CNN for each cultural context, with hyperparameters optimized by a novel deep reinforcement learning-based algorithm (LABHO), performs fine-grained style identification. The proposed model obtained 96.95% accuracy and an F-measure of 0.9581 in cultural context identification, and 88.65% accuracy with an F-measure of 0.8439 in style classification. These results show a significant improvement in accuracy and efficiency over conventional approaches and offer a more effective methodology for computational art analysis.
Introduction
Computer science emerged through interdisciplinary application and as such can find its place quite naturally within a liberal arts curriculum1, in a relationship that benefits both computer science and the liberal arts2. Over the last few decades, so-called new media have grown enormously, and fine art, as a component of human civilization, has gone digital. Museums, galleries, art institutes, and even individual collectors have scanned and cataloged their artworks for preservation and, in some cases, for careful analysis and public viewing. This steadily growing quantity of digital artworks increases the need for their automatic classification and evaluation. Artificial intelligence makes innovative solutions possible for such problems, which require human knowledge and thought processes. Art style recognition has traditionally been carried out by art historians and curators. Over the last decade there have been efforts to automate the task, and they sometimes yield good results. The vast majority of successful solutions rely on a Convolutional Neural Network (CNN) and transfer learning. Transfer learning allows knowledge acquired on a related problem to be applied to a new and more complex problem, typically with less data3.
Recent advances in deep learning have enabled efficient recognition algorithms for several image-centric applications4,5,6. These algorithms are often trained on natural images and are used in real-world recognition tasks such as self-driving. Moreover, applying recognition systems to extensive image databases could reveal cultural trends or tell us more about how people see the world (e.g.,7,8,9). Paintings, as human-created visual artifacts, are particularly interesting to analyze from this point of view. Such art can provide information about how culturally important concepts have changed over the centuries, as well as about how human vision works, owing to the realistic portrayals of reality produced by the masters. Most computer vision systems used in digital art history are concerned with object recognition (e.g.,10). The depiction of space has been widely discussed in the scientific literature, while the depiction of materials has only recently become a subject of scientific interest11.
One of the challenges in this field is the low accuracy of the models, which is related to the high diversity of artworks and the creativity of artists. The liberal arts include a wide range of arts, and because of the individual creativity of the artist, any form or pattern may appear in these works. The adherence of these patterns to specific rules is therefore severely limited, and accurate recognition faces many challenges. To reduce the complexity of this problem, it can be divided into several sub-problems. The main idea that motivates this research is to identify the cultural context of the artwork in the first stage. This makes it possible to separate the patterns related to the cultural context of the works, because the styles of, for example, Western and East Asian artworks differ significantly. Using dedicated learning models for each of these cultural contexts is then expected to increase recognition accuracy. The proposed method is designed around this idea, and it is expected to improve the identification of artistic styles.
The innovation of the proposed method has several aspects. First, a new multi-stage approach to identifying the style of artworks is presented, in which the cultural context of the work is identified first and style recognition is then performed using a dedicated learning model. Second, an integration of learning automata and the black hole optimization algorithm is proposed, which may perform better than the black hole algorithm alone and enable more efficient discovery of globally optimal solutions. Third, the LABHO optimization algorithm is employed to tune the hyperparameter configuration of the CNNs in order to increase the accuracy of artwork style identification. Together, these innovations result in high accuracy and efficiency in identifying artistic style. This paper contributes to the research field as follows:
- A novel approach for identifying the style of artworks that first identifies the cultural context and then performs style recognition using a dedicated learning model.
- Introduction of a new optimization method that combines learning automata with the black hole optimization algorithm, outperforming the traditional black hole algorithm and speeding up the discovery of globally optimal solutions.
- Utilization of the LABHO optimization algorithm for optimal adjustment of hyperparameter configurations in CNNs, facilitating more accurate identification of artistic styles.
The rest of the paper reviews related work in Section “Related work” and presents the proposed approach in Section “Research methodology”. Section “Results and discussion” reports the research findings, and Section “Conclusion” concludes the paper.
Related work
The use of computational approaches, especially deep learning, in the analysis and classification of artistic imagery has expanded considerably in the past few years. This section reviews relevant prior work, highlights important approaches, and outlines the context of our proposed cultural-based classification framework.
Deep learning approaches for art classification
This section presents works published in this field in recent years. Early research focused mainly on employing standard deep network architectures for art style recognition. Menis-Mastromichalakis et al.3 investigated the use of deep networks for recognizing art style in digitized artworks, comparing eight architectures on two datasets and achieving state-of-the-art results with a stacking ensemble method. Joshi et al.12 introduced a deep self-supervised learning model for recognizing rich artistic styles, reporting an accuracy improvement of nearly 20% over other methods on the WikiArt dataset13, which has 27 art classes and a seriously imbalanced class distribution. These studies show the promise of deep learning, but they also point to difficulties with dataset size, class imbalance, and the complexity of artistic styles.
Apart from style, deep learning has also been used in other aspects of art analysis. Yang and Min14 used a deep CNN model to distinguish between artistic media such as oil paint, pastel, pencil, and watercolor, and used synthesized oil paint images based on fourteen books. Liu et al.15 developed a new feature-based model to detect artistic movements in portrait paintings using three features: Modified Color Distance, Color Ratio Feature, and Weber’s Law-based Texture Feature. Their model surpassed previous approaches and achieved a high degree of accuracy. These efforts confirm the flexibility of deep learning approaches in identifying and analyzing various features of artworks.
The effectiveness of CNNs for automatic classification and retrieval of fine art collections is well established. Zhao et al.16 demonstrated the importance and reliability of CNNs for this task, favoring higher-resolution images and relevant training steps. Chen17 presented a Chinese painting classification algorithm using a CNN and mutual information theory, which enhanced accuracy and robustness by extracting features from Chinese paintings and fine-tuning the VGG-F model. More recent research has shifted its focus to advanced CNNs and deep learning mechanisms. Imran et al.18 aimed to develop a software application for analyzing and categorizing fine art photos in museums and galleries using DCNNs and shallow neural networks, which demonstrated improved accuracy and precision. Varshney et al.19 generated a database of handcrafted texture descriptors with a CNN, improved classification accuracy through decision fusion, and thus presented the first content-based image retrieval tool. Zhang and Ding20 proposed an enhanced ResNet algorithm based on ResNet-50 and extended with a blur pool operation, the CELU activation function, and a triplet attention mechanism. The improved model reached 80.6% classification accuracy on large-scale datasets.
Limited domain-specific data makes transfer learning an attractive solution in the image analysis domain. This technique uses models that are pre-trained on large datasets, and ImageNet is one of the most widely used datasets for this purpose. Pérez and Cozman21 proposed a method that uses Generative Adversarial Networks (GANs) to improve the accuracy of an art style classifier, overcoming the shortfalls of data augmentation for more complex subject matter. Yang22 presented a method for classifying painting art styles using CNNs, employing a mixed transfer learning model based on VGG-19 (Visual Geometry Group, 19 layers). Zhao et al.23 compared seven transfer learning models on the art classification task using three datasets. Their structural optimization of the models improved performance and made the style and genre classification visible, which helps to enhance similarity searches. Iliadis et al.24 applied the Vision Transformer and MLP Mixer architectures to artwork style recognition, reaching an accuracy of 39% on the WikiArt paintings dataset and comparing common optimizers to guide future research. Liu et al.25 developed a new recognition system for fine art paintings using convolutional transformers, which outperformed pre-trained models and demonstrated its effectiveness in learning image features, thereby improving art security. Although transfer learning offers a good starting point, adapting these models to the specifics of artistic features remains a problem.
For comparative analysis of the performance of our proposed method, we chose several relevant benchmark approaches from the literature, such as the hierarchical classification method by Mohammadi and Rustaee26, the VGG-19 transfer learning approach by Yang22, and the transfer learning model comparison study by Zhao et al.23. These works are examples of various strategies in art classification against which we assess our results.
Hierarchical and two-stage classification
Decomposition strategies such as hierarchical or multi-stage classification can be beneficial when the classification task is complex and includes numerous target categories. In computer vision, this means dividing a large, complicated classification problem into a set of smaller and easier steps. Although hierarchical approaches are not as widely used for art classification as single-stage methods, they have proven to be promising. Mohammadi and Rustaee26 proposed a hierarchical classification for fine-art painting in which the complex problem is divided into simpler ones by grouping the styles into super-styles known as parents. Their experimental results show that this approach improves F1 scores. This hierarchical partitioning of the problem space is conceptually similar to our two-stage approach, which first sorts by cultural context and then by style. The idea is that first determining larger, more discernible categories (such as cultural origin) makes the later, more detailed classification task (such as the particular style within that culture) easier, which may result in both better overall accuracy and better efficiency than a single-stage classification straight into many style categories. While the literature on two-stage classification of artworks is relatively sparse compared to other domains, the effectiveness of such decomposition has been demonstrated in various complex image classification problems.
Content-based artwork analysis
In addition to style and medium analysis, another significant aspect of computational art analysis is the identification of semantic content. This strand of research, which concentrates on what is represented, is complementary to our own work on stylistic classification and helps to put the difficulties of the field into perspective. It deals with the problem of finding and defining the subject of an artwork. For example, much attention has been paid to identifying iconographies, the symbolic figures and themes that are a main focus of art historical research. Banar et al.27 built a multi-modal system that uses both visual characteristics and textual titles to retrieve standardized Iconclass codes, which illustrates the difficulty of the task even with a multi-modal data source. Complementing this, Gonthier et al.28 introduced weakly supervised object detection, which makes it possible to localize iconographic objects (such as religious figures in paintings) without cumbersome manual annotations. Other studies go beyond the analysis of individual objects to study the context as a whole. Huang et al.29 addressed the problem of scene classification in artworks and suggested a multi-step transfer learning method to bridge the domain gap between photographic scenes and artistic depictions. At an even more fine-grained level of content, Zinnen et al.30 presented a method and dataset for recognizing sensory gestures in historical paintings, which opens a new direction of quantitative analysis in domains such as sensory history.
Such content-oriented studies are valuable because they offer the means to decipher the narrative and symbolic levels of a piece of art. They complement our style-based model, which deals with the different but equally basic problem of determining the formal visual language that characterizes cultural and artistic schools. Considering these various dimensions of analysis is important for developing a complete computational understanding of art.
Optimization techniques in deep learning
Effectively training and configuring deep learning models, especially CNNs, often requires sophisticated optimization techniques to navigate large hyperparameter spaces and achieve optimal performance. Traditional methods such as grid search or random search31 can become computationally prohibitive for models with many parameters or when multiple models need tuning. Bayesian optimization32 is another technique widely used for tuning the hyperparameters of deep learning models, but it still faces challenges when multiple models must be tuned with a limited budget. This has led to increased interest in metaheuristic optimization algorithms.
Learning automata and black hole optimization
To provide background for our novel optimization strategy (LABHO), we briefly introduce its two components:
Learning Automata (LA)33 are adaptive decision-making units that operate in a random environment. They are defined by a set of possible actions, a probability distribution over these actions, and a mechanism for updating these probabilities according to the feedback (reward or penalty) received from the environment after an action is performed. LAs learn to choose the best actions iteratively, raising the probability of actions that lead to favorable outcomes and lowering the probability of those that lead to unfavorable outcomes. This makes them appropriate for stochastic optimization problems in which the optimal strategy is not known a priori.
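The update mechanism can be illustrated with a minimal Python sketch. It assumes a linear reward-penalty scheme, which is one standard choice and is not necessarily the scheme used inside LABHO; the class name, the learning rates, and the toy environment in which only one action is rewarded are all hypothetical.

```python
import random


class LearningAutomaton:
    """Finite-action automaton with a linear reward-penalty update (assumed scheme)."""

    def __init__(self, n_actions, reward_rate=0.1, penalty_rate=0.05, seed=None):
        self.probs = [1.0 / n_actions] * n_actions  # start from a uniform distribution
        self.a = reward_rate
        self.b = penalty_rate
        self.rng = random.Random(seed)

    def choose(self):
        """Sample an action index according to the current probabilities."""
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, rewarded):
        """Shift probability toward a rewarded action or away from a penalized one."""
        r = len(self.probs)
        for j in range(r):
            if rewarded:
                if j == action:
                    self.probs[j] += self.a * (1.0 - self.probs[j])
                else:
                    self.probs[j] *= 1.0 - self.a
            else:
                if j == action:
                    self.probs[j] *= 1.0 - self.b
                else:
                    self.probs[j] = self.b / (r - 1) + (1.0 - self.b) * self.probs[j]


def train(automaton, best_action, steps):
    """Toy environment (hypothetical): only `best_action` is ever rewarded."""
    for _ in range(steps):
        action = automaton.choose()
        automaton.update(action, rewarded=(action == best_action))


if __name__ == "__main__":
    la = LearningAutomaton(n_actions=3, seed=0)
    train(la, best_action=2, steps=500)
    print([round(p, 3) for p in la.probs])
```

Both update branches keep the probabilities summing to one, so the automaton always holds a valid distribution over its actions.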
Black Hole Optimization (BHO)34 is a population-based metaheuristic inspired by the phenomenon of black holes capturing nearby stars in space. In optimization terms, candidate solutions are referred to as ‘stars’ and the best solution found in the population is the ‘black hole’. During optimization, the other solutions (‘stars’) are attracted toward the best solution (‘black hole’). If a star gets too close to the black hole (within an event horizon radius), it is absorbed and replaced by a new, randomly generated solution, which prevents premature convergence and encourages exploration of the search space. BHO is designed to search effectively for global optima in complex problem landscapes.
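A compact Python sketch of this loop is given below for a toy minimization problem. It assumes the commonly cited formulation in which a star moves by a random fraction of its distance to the black hole and the event horizon radius is the black hole's fitness divided by the summed fitness of all stars, which presupposes non-negative fitness values. The function names, population size, and sphere objective are illustrative assumptions, not the configuration used in this study.

```python
import math
import random


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def black_hole_optimize(objective, dim, bounds, n_stars=20, n_iters=100, seed=0):
    """Minimize `objective` with a basic black hole search (assumed formulation)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_star():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    stars = [new_star() for _ in range(n_stars)]
    fitness = [objective(s) for s in stars]
    bh = min(range(n_stars), key=fitness.__getitem__)  # index of the black hole

    for _ in range(n_iters):
        # Attraction: every star moves a random fraction of the way to the black hole.
        for i in range(n_stars):
            if i == bh:
                continue
            stars[i] = [x + rng.random() * (b - x) for x, b in zip(stars[i], stars[bh])]
            fitness[i] = objective(stars[i])
            if fitness[i] < fitness[bh]:
                bh = i  # a better star becomes the new black hole

        # Absorption: stars inside the event horizon are replaced by random ones.
        total = sum(fitness)
        radius = fitness[bh] / total if total > 0 else 0.0
        for i in range(n_stars):
            if i == bh:
                continue
            if math.dist(stars[i], stars[bh]) < radius:
                stars[i] = new_star()
                fitness[i] = objective(stars[i])
                if fitness[i] < fitness[bh]:
                    bh = i

    return stars[bh], fitness[bh]


if __name__ == "__main__":
    best, value = black_hole_optimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

Because the black hole itself never moves and is only ever replaced by a strictly better star, the best fitness cannot get worse from one iteration to the next.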
Research methodology
This section first describes the data used in the study and then details the proposed approach for culture-based classification of liberal arts using deep reinforcement learning techniques.
Dataset
The data used in this research consist of 16,950 color images reflecting liberal arts from three major cultures. The images have varying dimensions. Two labels are defined for each sample. The first label indicates the cultural context of the sample, which is one of three categories: 1 = Western (7,731 samples), 2 = Islamic (4,387 samples), and 3 = East Asian (4,832 samples). The second label indicates the artistic style. For the Western cultural context, the style labels, mostly consistent with WikiArt’s categorizations, cover 11 styles: ‘AbstractArt’, ‘Abs.Expres.’, ‘ColorFieldPain’, ‘Cubism’, ‘Expressionism’, ‘Impressionism’, ‘MagicRealism’, ‘Minimalism’, ‘Neoclassicism’, ‘PostImpres’, and ‘Realism’. For the Islamic cultural context, 5 style labels are used: Calligraphy (Kufic, Naskh), Geometric Design, Arabesque, Miniature Painting, and Seljuk Architectural Style. Likewise, for the East Asian samples, 5 style labels are used: Chinese Ink Wash Painting, Japanese Ukiyo-e, Korean Dancheong, Literati Painting, and Zen Buddhist Art. These style categories are derived from well-established art historical divisions in each cultural domain. Although some general stylistic concepts (e.g. “realism” or “abstraction”) may have a similar meaning across cultures, the style tags used in this study are, in most cases, specific to each culture and are not assumed to be directly interchangeable for model training, hence the need for dedicated classifiers for each cultural context as described in Section “Classification of artistic style based on reinforcement deep learning model”.
All samples in the Western culture category are taken from the WikiArt13 collection. Because of the limited number of available samples in the Islamic and East Asian cultural categories, web scraping was used to gather samples for these two categories. The scraping was carried out over a period of three months and targeted reputable online sources such as digital archives of museums, academic art history databases, and specialized curated online galleries of non-Western art. Search keywords included “Islamic geometric art”, “Ottoman miniature painting”, “Chinese Shan Shui painting”, “Japanese Edo period art”, “Korean traditional art”, and their equivalents in other languages where possible. More than 12,000 samples were collected in this process. After a rigorous manual review and filtering by domain experts to eliminate duplicates, images of insufficient resolution (below 150 × 150 pixels), irrelevant content, and stylistically ambiguous examples, 9,219 images were retained for the Islamic and East Asian categories. Each image was labeled by an expert. Specifically, all annotations for cultural context and artistic style were assigned and checked by a team of three art historians, each an expert in one of the cultural domains (Western, Islamic, or East Asian art), to ensure accuracy and compliance with established art historical terminology. For ambiguous web-scraped images, the label was decided by consensus between two experts. Concerning data availability, the combined dataset of image URLs/identifiers and their respective labels used in this study can be made available by the corresponding author on reasonable request for academic, non-commercial research purposes, provided that there are no ongoing licensing preparations or copyright concerns with the original sources of the images.
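The sample counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below only restates numbers given in this section; the variable names are illustrative.

```python
# Per-culture sample counts and style counts as reported in the dataset description.
culture_counts = {"Western": 7731, "Islamic": 4387, "East Asian": 4832}
styles_per_culture = {"Western": 11, "Islamic": 5, "East Asian": 5}

total_samples = sum(culture_counts.values())
scraped_samples = culture_counts["Islamic"] + culture_counts["East Asian"]
total_styles = sum(styles_per_culture.values())

# Share of each culture in the full dataset, useful for judging class imbalance.
culture_share = {name: count / total_samples for name, count in culture_counts.items()}

if __name__ == "__main__":
    print(total_samples, scraped_samples, total_styles)
    for name, share in culture_share.items():
        print(f"{name}: {share:.1%}")
```

The three culture counts add up to the stated dataset size, and the Islamic and East Asian counts together match the number of web-scraped images retained after filtering.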
Proposed method
An efficient system for classifying artworks requires appropriate methods for data preprocessing, pattern extraction, and pattern analysis. To meet these requirements, the method proposed in this research integrates deep learning, optimization, and reinforcement learning. The approach can be divided into the following steps:
1. Preprocessing the samples.
2. Cultural context recognition of the artwork using a CNN.
3. Classification of the artistic style using a deep reinforcement learning model customized to each cultural context.
The structure of the proposed model is shown in Fig. 1. According to the diagram, the input images first undergo preprocessing, which includes converting the color system of the images and normalizing their dimensions. A CNN model is then used to classify the samples by cultural context. Given the limited number of target categories at this level, a shallow CNN model is used. This model assigns each preprocessed image to one of the categories Western, East Asian, or Islamic. Once the cultural context of a sample has been determined, style recognition is performed based on the identified culture. As depicted in Fig. 1, a separate CNN is employed for each cultural context in this step. Each of these CNNs is trained on samples specific to its cultural context to enable more accurate style identification. Compared with the previous level (cultural context identification), this level (style recognition) faces additional challenges, such as the high diversity of style patterns, a large number of target categories, and the need to determine a specific configuration for each model. Optimization and reinforcement learning strategies are therefore employed at this level. A novel approach that combines learning automata with the BHO algorithm is used to configure each layer of the CNN models for each cultural context. This optimization technique, called LABHO, assigns a learning automaton to each optimization parameter of a model, and each automaton tries to find the best setting for one characteristic of the CNN model. By merging the strategies identified by the various learning automata, LABHO can find the best CNN model configuration in fewer iterations.
While we acknowledge the impressive performance of established deep learning architectures such as ResNet and Vision Transformers (ViT) in general image recognition, our proposed method employs custom CNN architectures for the specific cultural-based classification problem. This choice follows mainly from our two-stage approach. The first stage categorizes artworks into a small set of broad cultural contexts, a task that we found could be handled efficiently by a specialized, possibly shallower CNN able to capture salient cultural indicators. The style classification stage then uses a different CNN for each cultural domain. This enables a more targeted and possibly more precise feature extraction process, trained on the subtleties of artistic styles in a given cultural context, which may vary greatly from one culture to another. Developing these customized architectures, rather than using one general or pre-trained model, allows us to take advantage of the structure of our staged classification system, which may contribute to better efficiency and a better ability to capture culture-specific artistic features.
Image preprocessing
Image preprocessing is the first step of the proposed method and prepares the images for artistic style recognition. It consists of only two basic steps: color space conversion and dimensional normalization. Each image in the dataset is represented by three color layers (R, G, and B). Because intensity does not necessarily indicate the style or cultural origin of an artwork, the intensity information contained in each layer of an RGB image may cause confusion during recognition. Therefore, each RGB image is converted to the HSI color format in order to lessen the impact of brightness and intensity. This conversion separates the intensity features from the color features. After the conversion, the intensity layer (I) of each input instance is removed, and the H and S layers are used to describe the sample. This step yields, for each input image, a matrix whose dimensions are W and L. Because the learning models expect samples of a fixed dimension, all images are then resized to 150 × 150 pixels. The output of this step is a preprocessed dataset that serves as the input to the second step of the presented method, in which a CNN model determines the cultural origin of the artworks.
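The two preprocessing steps can be sketched in plain Python as follows. The sketch assumes the standard geometric RGB-to-HSI formulas and nearest-neighbor resizing; the text above does not specify the exact conversion variant or the interpolation method, so both are assumptions, and the function names are hypothetical. Images are represented as nested lists of (R, G, B) tuples with values from 0 to 255.

```python
import math


def rgb_to_hs(r, g, b):
    """Convert one RGB pixel (0-255) to (hue in degrees, saturation in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0  # the I channel, computed only to derive S
    saturation = 0.0 if intensity == 0 else 1.0 - min(r, g, b) / intensity
    denom = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denom == 0:
        hue = 0.0  # gray pixels have no defined hue; use 0 by convention
    else:
        cos_theta = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / denom))
        theta = math.degrees(math.acos(cos_theta))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation


def resize_nearest(image, out_h=150, out_w=150):
    """Resize a 2D grid of pixels with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(image_rgb, size=150):
    """Drop intensity by keeping only (H, S) per pixel, then resize to size x size."""
    hs_image = [[rgb_to_hs(*pixel) for pixel in row] for row in image_rgb]
    return resize_nearest(hs_image, size, size)


if __name__ == "__main__":
    tiny = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (128, 128, 128)]]
    out = preprocess(tiny)
    print(len(out), len(out[0]), out[0][0])
```

In practice an array library and a higher-quality interpolation would normally be used; the pure-Python version is meant only to make the order of operations explicit.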
Cultural context recognition of artwork based on CNN
In the second phase of the presented model, a CNN is used to identify the cultural context of each input artwork. The architecture of this CNN is depicted in Fig. 2.
As shown in Fig. 2, the basic framework of the proposed CNN for identifying the cultural context of artworks consists of four 2D convolutional layers with a stride of one in every dimension. A PReLU layer is used as the activation operator after each convolutional layer. In contrast to the ReLU layer, a PReLU layer lets negative values pass through, scaled by a trainable parameter α. The following equation describes how this layer functions35:
$$Y=\begin{cases}X, & X>0\\ \alpha X, & X\le 0\end{cases}$$
In the equation above, X is the input and Y is the output. The inclusion of PReLU layers improves the generalization of learning models, for two reasons. First, because the gradient is non-zero for negative inputs, the feature elimination caused by ReLU does not occur. Second, this layer considerably accelerates training. The proposed CNN model also uses hybrid pooling layers to improve its practical applicability, because the two most popular pooling types, max pooling and average pooling, each have intrinsic drawbacks. For example, when paired with ReLU layers, the max pooling operator may result in overfitting or low dimensionality of the feature map. Average pooling layers may in turn produce feature maps of low density. The hybrid pooling layer mitigates the shortcomings of both by using a trainable parameter p to combine the maximum and average pooling functions. The hybrid pooling layer is formulated as follows36:
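One common way to write such a learnable mix, assumed here for illustration, is the convex combination \(S = p\,{S}_{max} + (1-p)\,{S}_{avg}\) with \(0 \le p \le 1\). The minimal Python sketch below applies this assumed form, together with the PReLU activation, to plain nested lists. The function names, the window size, and the fixed values of α and p are hypothetical; in a trained network both parameters would be learned by backpropagation.

```python
def prelu(x, alpha):
    """PReLU: pass positive inputs unchanged and scale the rest by alpha."""
    return x if x > 0 else alpha * x


def hybrid_pool_2d(feature_map, p, size=2, stride=2):
    """Blend max and average pooling over each window (assumed convex mix)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, stride):
        out_row = []
        for left in range(0, cols - size + 1, stride):
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            s_max = max(window)
            s_avg = sum(window) / len(window)
            out_row.append(p * s_max + (1.0 - p) * s_avg)
        pooled.append(out_row)
    return pooled


if __name__ == "__main__":
    raw = [[1.0, -2.0, 3.0, 0.5], [4.0, 0.0, -1.0, 2.0]]
    activated = [[prelu(v, alpha=0.25) for v in row] for row in raw]
    print(hybrid_pool_2d(activated, p=0.5))
```

Setting p to 1 recovers plain max pooling and setting it to 0 recovers plain average pooling, so the two standard operators are the end points of the mix.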
\({S}_{max}\) and \({S}_{avg}\) in the above equation are the outputs of max pooling and average pooling, respectively. After feature map extraction by the fourth hybrid pooling layer, two fully connected layers vectorize the features and classify the sample in the form of a probability distribution. In this network architecture, the dimension of the second fully connected layer equals the number of target categories. The value of neuron i in this layer represents the probability that the input sample belongs to the i-th cultural context. In this research, a grid search strategy is employed to determine the best settings for the CNN, covering various hyperparameter values of the convolutional layers as well as the training configuration. The values that can be assigned to each of the examined hyperparameters are presented in Table 1. Grid search over the hyperparameters and search values defined in Table 1 was found to be appropriate for the first stage because the number of target categories is small (three cultural contexts) and the complexity of the model is manageable at this level. This is in contrast to the more advanced LABHO optimization approach used in the second stage, which was needed because of the much larger number of potential artistic styles and the requirement to tune several different CNN models, each tailored to a cultural context.
The configurations were searched and evaluated according to the validation error criterion. The best training performance was obtained with a 9 × 9 filter and 8 filters in the first convolutional layer and a 7 × 7 filter and 24 filters in the second convolutional layer. The optimal filter dimensions for the third and fourth convolutional layers were 6 × 6 and 3 × 3, with 32 and 64 filters, respectively. In addition, among the configurable training hyperparameters, the Adam optimizer and a mini-batch size of 32 gave the best performance.
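To give a sense of the size of the selected convolutional stack, the following Python sketch counts its trainable convolution weights. It assumes two input channels (the H and S layers kept after preprocessing) and one bias per filter; the parameters of the PReLU, hybrid pooling, and fully connected layers are left out because their sizes are not stated in the text. The helper name is hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (kernel size, number of filters) for the four layers selected by grid search.
selected_layers = [(9, 8), (7, 24), (6, 32), (3, 64)]

in_channels = 2  # assumption: only the H and S layers are fed to the network
per_layer = []
for kernel, n_filters in selected_layers:
    per_layer.append(conv_params(kernel, in_channels, n_filters))
    in_channels = n_filters  # the next layer sees one channel per filter

total_conv_params = sum(per_layer)

if __name__ == "__main__":
    print(per_layer, total_conv_params)
```

Under these assumptions the convolutional part has on the order of tens of thousands of parameters, which is consistent with the description of the first-stage network as shallow.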
Classification of artistic style based on reinforcement deep learning model
To identify the style, the artwork is passed to a CNN model specific to the cultural context detected by the CNN described in the previous section. That is, if a sample falls into a cultural category A, the image of the artwork is used as input to the model \(CN{N}_{A}\). The model for each cultural category is trained only on samples of the art forms distinctive of that culture. This CNN model takes the image of the artwork, analyzes it, and predicts the artistic style of that artwork. As stated above, dedicated CNN models are used for the different cultural contexts because the artistic patterns differ from one category to another. This mechanism enables a more accurate capture of style-related feature patterns within each cultural context. Figure 3 shows the architecture of the CNN models used to identify style in each of the cultures of interest.
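The routing logic of this two-stage scheme can be sketched in a few lines of Python. The classifiers below are stand-in functions that return fixed labels; they are hypothetical placeholders for the trained culture CNN and the culture-specific style CNNs, and the function name `classify_artwork` is likewise an assumption.

```python
def classify_artwork(image, culture_model, style_models):
    """Stage 1 picks the cultural context; stage 2 runs that culture's style model."""
    culture = culture_model(image)
    style = style_models[culture](image)
    return culture, style


if __name__ == "__main__":
    # Placeholder models (hypothetical): each ignores the image and returns a label.
    def culture_cnn(image):
        return "East Asian"

    style_cnns = {
        "Western": lambda image: "Cubism",
        "Islamic": lambda image: "Arabesque",
        "East Asian": lambda image: "Japanese Ukiyo-e",
    }
    print(classify_artwork(object(), culture_cnn, style_cnns))
```

One consequence of this design is that a sample misrouted in the first stage can only receive a style label from the wrong culture, so the accuracy of the culture classifier bounds what the second stage can achieve on the full pipeline.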
As illustrated in Fig. 3, the proposed CNN model consists of four 2D convolutional layers. Each convolutional layer is followed by a pooling layer, a batch normalization (BN) layer, and a ReLU layer. The first two convolutional layers are meant to learn subtle patterns from the input, while the remaining ones are meant to learn larger patterns from the image. After these four convolution steps, the model has two fully connected layers. The first (FC1) presents the input feature maps in vector form, and the last prepares the data for the classification level. Finally, SoftMax and classification layers determine the artistic style of the work. In the CNN considered for the experiments, the strides of the convolutional and pooling layers are set to one in every dimension. The dimensions of each pooling layer depend on the dimensionality of the preceding convolutional layer and change accordingly.
Accurate tuning of a CNN model's hyperparameters is crucial. A deep CNN has more variables and greater complexity than earlier artificial neural network models, and this must be considered when designing the model. Because the presented model defines a separate CNN for each cultural context, optimizing the hyperparameters of every model is difficult and time-consuming. To tune these critical parameters effectively and efficiently, this study uses a novel approach that combines learning automata and the BHO algorithm. The remainder of this section explains how the suggested LABHO algorithm is used to adjust the CNN model's hyperparameters. Three sets of adjustable hyperparameters are considered in the suggested CNNs. The first set covers the convolutional layers C1 through C4: the width and height of the convolutional filters and the number of filters. Since empirical data show that convolutional filters of equal height and width can produce favorable results, the filter height and width are treated as a single hyperparameter, which reduces the search space. Based on the size of the input samples, this filter size can take integer values in the range [3, 9]. Similarly, the number of filters can be set to an integer in the range [8, 128] with a step size of 8. The second set determines the pooling type for P1 through P4, where each pooling layer can select either the average or the maximum function. The third set is the dimension of FC1, which is an integer in the range [30, 90].
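As a minimal sketch, the three hyperparameter sets described above can be written down as a discrete search space. The constant and function names below are illustrative assumptions and do not come from the original implementation; the ranges are the ones stated in the text.

```python
import random

# Illustrative encoding of the search space; all names here are assumed.
FILTER_SIZES = list(range(3, 10))       # square filters from 3x3 to 9x9
FILTER_COUNTS = list(range(8, 129, 8))  # 8, 16, ..., 128 filters
POOLING_TYPES = ["max", "avg"]          # options for P1..P4
FC1_SIZES = list(range(30, 91))         # 30 to 90 units in FC1


def sample_configuration(rng: random.Random) -> dict:
    """Draw one random configuration for C1..C4, P1..P4 and FC1."""
    return {
        "filter_size": [rng.choice(FILTER_SIZES) for _ in range(4)],
        "num_filters": [rng.choice(FILTER_COUNTS) for _ in range(4)],
        "pooling": [rng.choice(POOLING_TYPES) for _ in range(4)],
        "fc1": rng.choice(FC1_SIZES),
    }


def search_space_size() -> int:
    """Count the distinct configurations in the discrete search space."""
    per_conv_layer = len(FILTER_SIZES) * len(FILTER_COUNTS)
    return per_conv_layer ** 4 * len(POOLING_TYPES) ** 4 * len(FC1_SIZES)


if __name__ == "__main__":
    print(sample_configuration(random.Random(0)))
    print(search_space_size())
```

Even with equal filter height and width, the space is far too large for exhaustive evaluation, which is the motivation for a guided search.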
The LABHO algorithm is specifically designed to address the complex hyperparameter optimization challenge for the style classification CNNs by combining the local learning capabilities of learning automata with the global search power of the black hole optimization (BHO) algorithm. Within the BHO algorithm, learning automata models store appropriate settings for every CNN model layer. Put another way, one learning automaton is tasked with tracking the configuration of each CNN hyperparameter. By analyzing the effects of changes to each hyperparameter, the automaton determines the settings that result in a decrease in training error. The LABHO algorithm aims to generate an appropriate CNN structure by integrating the best actions of its learning automata. The rest of this section explains how solutions are organized in LABHO, how fitness is assessed, and how the algorithm is used to obtain the best configuration.
LABHO has thirteen tunable parameters, or optimization variables. As a result, each solution in the LABHO-defined CNN configuration problem is a numerical vector of length 13. Eight elements of each solution vector adjust the hyperparameters of the four convolutional layers: four variables describe the dimensions of the filters and four describe the number of filters in each convolutional layer. A further four elements indicate the type of pooling function, with 0 denoting maximum and 1 denoting average. The final value in each solution is the size of the fully connected layer FC1. To assess the fitness of a solution, the CNN model is first configured with the settings that the solution encodes. A quarter of the training samples are then used to train the configured CNN model, and the solution's fitness is evaluated using the validation error criterion.
\(fitness=\frac{V}{N}\) (3)
where N is the total number of validation instances and V is the number of instances whose predicted label differs from the true label. The LABHO algorithm of the suggested technique seeks the CNN configuration that minimizes this fitness value.
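The solution encoding and the fitness criterion can be sketched as follows. The order of the groups inside the 13-element vector (filter sizes, then filter counts, then pooling flags, then FC1) and the function names are assumptions made for illustration; the text fixes only the group sizes and the 0/1 pooling code.

```python
from typing import Dict, List, Sequence


def decode_solution(x: Sequence[int]) -> Dict[str, object]:
    """Map a 13-element solution vector to a CNN configuration.

    Assumed layout: x[0:4] filter sizes, x[4:8] filter counts,
    x[8:12] pooling flags (0 = max, 1 = average), x[12] FC1 size.
    """
    if len(x) != 13:
        raise ValueError("a solution must have exactly 13 elements")
    return {
        "filter_size": list(x[0:4]),
        "num_filters": list(x[4:8]),
        "pooling": ["max" if flag == 0 else "avg" for flag in x[8:12]],
        "fc1": x[12],
    }


def validation_error(y_true: List[int], y_pred: List[int]) -> float:
    """Fitness V / N: the fraction of validation samples that are misclassified."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    misclassified = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return misclassified / len(y_true)
```

In a full pipeline, the decoded configuration would be used to build and train the CNN before `validation_error` is computed on its predictions; that training step is omitted here.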
The algorithm's search history can be used to create new solutions, allowing search algorithms to converge more quickly. LABHO uses learning automata to produce random solutions that are near the global optimum. One of the advances of the suggested approach is this faster discovery of the global optimum through the combination of learning automata and the BHO algorithm. Specifically, the learning automata are embedded within the BHO framework to guide the generation of new candidate solutions based on past successful configurations, thereby focusing the search on promising regions. To find new candidate configurations, the learning automaton model of the suggested method uses reward and punishment schemes.
The set of selectable actions that defines each LA is represented by the notation \(A=\left\{{\alpha}_{1},{\alpha}_{2},\ldots,{\alpha}_{n}\right\}\). The choice of an action is determined by the probabilities associated with the available actions in A. The LA starts its activity by selecting an element from action set A and applying it to its environment. The environment assesses the applied action, and the LA chooses its subsequent action depending on the environment's response. Every time an action is chosen, the LA raises the probability of that action if the environment responds favorably. Conversely, the probability of the action drops if a negative response is obtained.
During this mechanism, the LA uses a reward operator (the probability of the selected action is increased) and a penalty operator (the probability of the selected action is decreased). In this way it learns which action is better and should be taken with higher probability in the next iterations. In the employed LABHO, 13 LA models represent the configuration strategies of the CNN, and each automaton is associated with one of the CNN's tunable hyperparameters.
Therefore, an LA is defined for each tunable hyperparameter, with M selectable actions for each LA (M being the number of available options for that hyperparameter). The objective of every LA model is to approximate the ideal value for the associated hyperparameter. In the first iteration of the LABHO algorithm, every action has an equal probability. Instead of assigning all solutions randomly, the optimization algorithm produces half of the vectors from the probability values learned by the LA models. Only 50% of the solutions are determined using the LAs in order to keep the optimization method from becoming stuck in a local minimum. This also preserves the possibility of full exploration of the problem space.
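A minimal sketch of this arrangement is given below: one automaton per hyperparameter, uniform initial probabilities, and a generator that draws a solution from the automata with probability ½ and uniformly at random otherwise. The class and function names are hypothetical, and the action sets reuse the ranges given for the hyperparameters.

```python
import random
from typing import List, Sequence


class LearningAutomaton:
    """One automaton per hyperparameter; its actions are the allowed values."""

    def __init__(self, actions: Sequence[int]) -> None:
        self.actions = list(actions)
        m = len(self.actions)
        self.probs = [1.0 / m] * m  # equal probabilities in the first iteration

    def sample(self, rng: random.Random) -> int:
        """Pick an action according to the current probability vector."""
        return rng.choices(self.actions, weights=self.probs, k=1)[0]


def build_automata() -> List[LearningAutomaton]:
    """Create the 13 automata: 4 filter sizes, 4 filter counts, 4 pooling flags, FC1."""
    sizes = list(range(3, 10))
    counts = list(range(8, 129, 8))
    pooling = [0, 1]  # 0 = max, 1 = average
    fc1 = list(range(30, 91))
    return (
        [LearningAutomaton(sizes) for _ in range(4)]
        + [LearningAutomaton(counts) for _ in range(4)]
        + [LearningAutomaton(pooling) for _ in range(4)]
        + [LearningAutomaton(fc1)]
    )


def new_solution(automata: List[LearningAutomaton], rng: random.Random,
                 la_share: float = 0.5) -> List[int]:
    """Build one solution: LA-guided with probability la_share, else uniform."""
    if rng.random() < la_share:
        return [la.sample(rng) for la in automata]
    return [rng.choice(la.actions) for la in automata]
```

In the first iteration both branches are equivalent, because every automaton is still uniform; they diverge only once the probabilities have been updated.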
At the conclusion of each LABHO search cycle, the LAs are updated using the best and worst solutions of the population. The present population's best and worst solutions are therefore determined first. The reward operator is then applied in the LA models to raise the probabilities of all actions that correspond to the best solution in the population. The following equation33 is used to carry out this process for every LA:
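As a sketch, and assuming the standard linear reward scheme from the learning automata literature, the update for an automaton whose action i matches the best solution can be written as follows (the index i is introduced here for illustration only):

\({p}_{j}\left(k+1\right)=\begin{cases}{p}_{j}\left(k\right)+a\left[1-{p}_{j}\left(k\right)\right], & j=i\\ \left(1-a\right){p}_{j}\left(k\right), & j\ne i\end{cases}\)

Under this form the probabilities still sum to one after the update, since the mass added to action i equals the mass removed from the other actions.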
The reward coefficient, denoted by a in the equation above, is set to 0.5, and \({p}_{j}\:\left(k\right)\) represents the probability of choosing action j of the LA in the k-th cycle. This process is repeated for every LA that corresponds to a CNN hyperparameter. The penalty operation is then applied in the LA models to reduce the probabilities of all actions that correspond to the worst solution. The following equation33 is used to carry out this process for every LA:
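Again as a sketch under the assumption of the standard linear penalty scheme, with i denoting the action that matches the worst solution (an index introduced here for illustration), the update can be written as:

\({p}_{j}\left(k+1\right)=\begin{cases}\left(1-b\right){p}_{j}\left(k\right), & j=i\\ \frac{b}{M-1}+\left(1-b\right){p}_{j}\left(k\right), & j\ne i\end{cases}\)

The probability removed from the penalized action is shared equally among the remaining M-1 actions, so the vector remains a valid probability distribution.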
The penalty coefficient, denoted by b in Eq. (5), is set to 0.5, and M stands for the total number of actions of the LA. Given these definitions, LABHO finds the ideal configuration through the following steps:
Step 1: Set the population size and the number of iterations T, then create the initial LABHO population at random.
Step 2: Using Eq. (3), determine the fitness of every solution.
Step 3: Find the best solution (black hole) \({X}_{BH}\), which has the lowest fitness in the population.
Step 4: Adjust the location of each candidate solution \({X}_{i}\) in accordance with31:
\({X}_{i}\left(t+1\right)={X}_{i}\left(t\right)+rand\times \left({X}_{BH}-{X}_{i}\left(t\right)\right)\) (6)
In Eq. (6), the location of the best solution is indicated by \({X}_{BH}\), the location of the i-th candidate solution (star) is indicated by the vector \({X}_{i}\), and t is the iteration index. Furthermore, rand is a random number between 0 and 1.
Step 5: Determine, using Eq. (7), the threshold radius at which a candidate solution is engulfed by the \({X}_{BH}\).
Step 6: Substitute a solution vector for the existing best solution if its fitness is lower than that of the \({X}_{BH}\).
Step 7: Remove the solutions that are closer to the best solution than the distance R and replace each of them with a new one. In LABHO, each new solution vector is generated from the probabilities learned by the Learning Automata models for each hyperparameter with a probability of ½; otherwise, it is assigned randomly to maintain diversity.
Step 8: If the number of iterations has reached T, move on to the next step; otherwise, return to Step 2.
Step 9: Return \({X}_{BH}\).
Following these steps, the CNN model is configured with the hyperparameters of the best solution found, and all of the samples are used to train the model. The resulting trained model is used to identify artistic style.
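The steps above can be sketched in a compact loop. This is an illustration under several assumptions, not the authors' implementation: solutions store option indices and moves are rounded to the nearest valid index; the event-horizon radius uses the standard BHO form (best fitness divided by the sum of all fitness values); the automata use the standard linear reward and penalty updates with a = b = 0.5; and fitness is re-evaluated once per iteration rather than immediately after each move. The `fitness` argument stands in for training and validating a CNN.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[int]


def reward(probs: List[float], i: int, a: float = 0.5) -> None:
    """Linear reward: raise the probability of action i, shrink the rest."""
    for j in range(len(probs)):
        probs[j] = probs[j] + a * (1.0 - probs[j]) if j == i else (1.0 - a) * probs[j]


def penalize(probs: List[float], i: int, b: float = 0.5) -> None:
    """Linear penalty: lower action i and share the removed mass among the rest."""
    m = len(probs)
    for j in range(m):
        probs[j] = (1.0 - b) * probs[j] if j == i else b / (m - 1) + (1.0 - b) * probs[j]


def labho(options: Sequence[Sequence[int]], fitness: Callable[[Vector], float],
          pop_size: int = 20, iterations: int = 50, seed: int = 0) -> Vector:
    """Return the best index vector found (lower fitness is better)."""
    rng = random.Random(seed)
    las = [[1.0 / len(o)] * len(o) for o in options]  # one LA per variable

    def random_solution() -> Vector:
        return [rng.randrange(len(o)) for o in options]

    def la_solution() -> Vector:
        return [rng.choices(range(len(p)), weights=p)[0] for p in las]

    pop = [random_solution() for _ in range(pop_size)]            # Step 1
    best: Vector = list(pop[0])
    best_fit = math.inf
    for _ in range(iterations):                                   # Step 8
        fits = [fitness(x) for x in pop]                          # Step 2
        i_best = min(range(pop_size), key=fits.__getitem__)
        i_worst = max(range(pop_size), key=fits.__getitem__)
        if fits[i_best] < best_fit:                               # Steps 3 and 6
            best, best_fit = list(pop[i_best]), fits[i_best]
        for d, probs in enumerate(las):                           # LA update
            reward(probs, pop[i_best][d])
            penalize(probs, pop[i_worst][d])
        for x in pop:                                             # Step 4
            for d in range(len(x)):
                step = rng.random() * (best[d] - x[d])
                x[d] = min(max(round(x[d] + step), 0), len(options[d]) - 1)
        total = sum(fits)                                         # Step 5
        radius = best_fit / total if total > 0 else 0.0
        for k, x in enumerate(pop):                               # Step 7
            if math.dist(x, best) <= radius:
                pop[k] = la_solution() if rng.random() < 0.5 else random_solution()
    return best
```

The returned vector holds indices into `options`; mapping them back to hyperparameter values is a lookup such as `options[d][best[d]]`.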
Results and discussion
The experiments were conducted using MATLAB 2021a software, with the cross-validation technique and 10 repetitions. The first phase focuses on identifying the cultural context of artworks, and the second phase focuses on identifying their styles. Each label prediction falls into one of the following four outcomes:
- True Positive (TP): the model correctly predicts a positive case as positive.
- False Positive (FP): the model incorrectly predicts a negative case as positive.
- False Negative (FN): the model incorrectly predicts a positive case as negative.
- True Negative (TN): the model correctly predicts a negative case as negative.
Accuracy: Accuracy assesses the overall correctness of predictions as the ratio of correct predictions to the total number of cases, (TP + TN) / (TP + FP + FN + TN).
Precision: Precision is the proportion of cases predicted as positive that are actually positive, TP / (TP + FP).
Recall: Recall is the true positive rate, that is, the proportion of actual positive cases that the model predicts as positive, TP / (TP + FN).
F-Measure: F-measure is the harmonic mean of the recall and precision of each class.
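The four metrics can be computed directly from the outcome counts. The following is a small sketch with a hypothetical function name; the counts in the usage note are illustrative and are not results from this study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F-measure from outcome counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

For example, `binary_metrics(tp=8, fp=2, fn=2, tn=8)` gives a value of about 0.8 for all four metrics, because errors are balanced between the two classes.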
Identification of the cultural context of artworks
In the first phase, the proposed method is compared independently with five other models: two baseline neural network models, referred to as ImageNet and AlexNet, and three methods from the literature22,23,26.
Figure 4 depicts the accuracy of the proposed method compared with the other methods across the 10 cross-validation folds. As shown, the proposed method achieves higher accuracy across the folds. Among the comparative methods, the one presented by Zhao et al.23 comes closest to the proposed method. This shows that the proposed approach remains highly accurate compared to the other methods under consideration.
Such a steady high accuracy across folds in Fig. 4 can be explained by a number of design choices made in our Stage 1 CNN. The preprocessing step, which transforms images into the HSI color space, probably diminishes the effect of illumination variations in favor of the network’s ability to concentrate on more invariant chrominance features that are characteristic of cultural contexts. In addition, as explained in our ablation study (Section “Ablation study for cultural context recognition CNN”), the use of PReLU activation functions and hybrid pooling layers in our custom CNN architecture are critical for the ability to learn discriminative features for this 3-class problem. PReLU enables adaptive learning of negative slopes to improve generalization and hybrid pooling offers a balanced feature extraction mechanism.
Figure 5 shows the average accuracy. The proposed method is superior to the comparison methods in two respects. First, its median and average accuracy are higher than those of the other methods. Second, its accuracy varies less across the cross-validation folds: the tighter distribution of accuracy values (lower inter-quartile range) implies higher reliability and consistency than the other methods. Such reliability is of the utmost importance for a foundational classification stage.
The confusion matrix is illustrated in Fig. 6. It is a two-dimensional matrix with the true class in the columns and the class predicted by the model in the rows. The confusion matrix of the proposed method shows very few off-diagonal elements, meaning that there are few misclassifications between the three cultural contexts: the model separates Western, East Asian, and Islamic art with high fidelity. This high discriminative power is essential because errors in this stage would propagate to and corrupt the subsequent style identification stage. The clarity obtained here directly underpins the effectiveness of the two-stage approach, in that artworks are directed to the relevant culture-specific style classifier. Compared with the other methods, the proposed model differentiates between the categories more effectively, giving the highest classification accuracy and the fewest misclassifications.
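Because the matrix places predicted classes in the rows and true classes in the columns, per-class precision is read along a row and recall along a column. The sketch below illustrates this with hypothetical function names; any matrix passed to it in the usage note is made up for illustration and does not reproduce Fig. 6.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class metrics for a matrix where cm[p][t] counts samples of
    true class t that were predicted as class p (rows = predicted)."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted_as_c = sum(cm[c])                     # row sum
        truly_c = sum(cm[p][c] for p in range(n))       # column sum
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / truly_c if truly_c else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f_measure": f_measure})
    return results


def macro_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics) / len(metrics)
```

With an illustrative three-class matrix such as `[[50, 2, 1], [3, 45, 2], [0, 4, 48]]`, the first class has precision 50/53 (row sum) and recall 50/53 (column sum). Whether the reported figures are macro-averaged in this way is an assumption.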
Figure 7 shows the precision, recall, and F-measure metrics. As can be seen in this figure, the proposed method significantly outperforms the comparative methods such as AlexNet and the method presented by Zhao et al.23. In particular, our approach achieves an F-Measure of 0.9681, which reflects a good balance between Precision (0.9667) and Recall (0.9697). High precision in this context means that when the model assigns an artwork to a cultural category (for example, ‘Islamic’), the assignment is very likely to be right, which is important to prevent wrong downstream processing. High recall means that the model correctly identifies the overwhelming majority of artworks that truly belong to each cultural context, providing full coverage. This strong performance, especially the 3.74% increase in F-Measure compared to Zhao et al.23, highlights the effectiveness of our customized Stage 1 CNN architecture and preprocessing measures.
Figure 8 shows the Receiver Operating Characteristic (ROC) curve. The ROC curves in Fig. 8 further support the superior discriminative capability of our method for cultural context identification. The consistently higher AUC of our model across the cultural classes implies a better trade-off between the True Positive Rate (sensitivity) and the False Positive Rate (1-specificity) across all decision thresholds. This implies that our model differentiates cultural contexts more robustly, regardless of the selected classification threshold, which is an ideal characteristic for a reliable system.
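As a brief illustration of why AUC is threshold-independent, it can be computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below assumes a one-vs-rest setting, in which the samples of one cultural class are labeled 1 and all others 0; the function name and the example scores are hypothetical.

```python
from typing import List


def auc_one_vs_rest(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 for the class of interest and 0 for every other class;
    tied scores count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For instance, scores `[0.9, 0.2, 0.8, 0.1]` with labels `[1, 1, 0, 0]` rank three of the four positive-negative pairs correctly, giving an AUC of 0.75.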
Table 2. Comparison of the proposed method and other comparative methods in identifying the cultural context of artworks.
The summarized metrics in Table 2 support the cumulative evidence from Figs. 4, 5, 6, 7 and 8. The superior performance of the proposed method in all metrics is not marginal, but is a significant improvement over both the basic models, such as AlexNet and ImageNet, and newer works such as Yang22 and Mohammadi et al.26. This advantage is based on our problem-specific design: a specialized, moderately shallow CNN, tuned for the extraction of general cultural features and made more efficient by the HSI color space and by suitable activation and pooling strategies, as opposed to general-purpose deep architectures that may be unnecessarily complex for this particular 3-class problem.
Identifying the styles of artworks
In this section, the style of artworks is examined, and the proposed method is evaluated in three different cases. In the first case, the full proposed method is examined: a two-level approach that first identifies the cultural context and then uses CNN models specific to each cultural context, optimized through LABHO, to predict the style. The second case, Proposed (no LABHO), uses static CNN models: no optimization is performed with the LABHO algorithm, and the models use the configuration specified for the first level, i.e., for identifying cultural contexts. In the third case, Proposed (BHO only), the model specific to each cultural context is optimized, but simple BHO is used instead of LABHO. Comparing the results of this case with the proposed method allows us to examine the effectiveness of the proposed optimization model in improving network performance.
Figure 9 shows the accuracy of the proposed method compared to other methods in 10-fold cross-validation for style identification. The superior accuracy of our full proposed method (Fig. 9) in all folds supports the advantages of our hierarchical approach. By isolating the cultural context at the start, the subsequent style-specific CNNs can work in a more constrained and coherent feature space. This focused learning is further improved by the LABHO algorithm, which fine-tunes each culture-specific CNN to identify minute intra-cultural style differences.
Figure 10 depicts the average accuracy in identifying different styles. It indicates not only a higher median and average accuracy for our method but also greater stability, similar to the cultural context stage. The capacity to achieve high accuracy with low variance when working with a greater number of style classes reflects the strength of using dedicated models per cultural context, each tuned for its particular set of styles.
The confusion matrix is shown in Fig. 11. The style identification confusion matrix (Fig. 11), although more complex by nature due to the plurality of style labels, shows that our proposed method outperforms alternatives in the separation of styles. For instance, in Western art, the model is able to differentiate between Abstract art and Abstract Expressionism with fewer mistakes. This enhanced discriminative ability at the style level is a direct result of the two-stage approach and the enhanced optimization offered by LABHO, which enables the model to learn more subtle features of each style within its cultural group.
Figure 12 illustrates the Precision, Recall, and F-measure metrics in style recognition. Precision in this graph represents the ratio of correctly identified samples to all samples assigned to a style, and shows how reliable the model's style assignments are. Recall represents the ratio of correctly identified samples to all samples that truly belong to a style, and measures the extent to which the model recognizes the different styles. Finally, F-measure is the harmonic mean of Precision and Recall and indicates the balance between these two metrics, which is especially useful when the classes are imbalanced. Together, these three metrics evaluate the model's performance in identifying styles and reveal its strengths and weaknesses.
Figure 13 shows the Precision, Recall and F-measure metrics for style recognition. As can be seen, the precision of the proposed method is 0.07 and 0.04 higher than that of the Proposed (BHO only) and Zhao et al.23 methods, respectively. Its recall is likewise 0.06 and 0.04 higher than that of these methods.
The results in Figs. 12 and 13 reflect the effect of our optimization strategy. The F-Measure, which is the balance between precision and recall, is the highest for our full method. High precision is essential in cases where precise labeling of the style of an artwork is critical and false positives are expensive. High recall means that most artworks of a given style are actually captured by the system. The large performance gain of the full ‘Proposed’ method over ‘Proposed (no LABHO)’ shows that a generic CNN configuration is not enough for the fine-grained task of style recognition; dedicated optimization is key. In addition, the improvement over ‘Proposed (BHO only)’ confirms the contribution of the Learning Automata component in LABHO, which offers a more sophisticated and adaptive search for CNN hyperparameters specific to each cultural context's style characteristics.
Figure 14 presents the ROC curve. The ROC curves for style identification indicate a clear advantage for the proposed method in terms of a greater AUC. This suggests better performance in recognizing the various artistic styles at varied operational thresholds. This is especially important considering the subtlety and overlap that can exist between some artistic styles, which make robust differentiation difficult.
Table 3 shows the performance of the different methods in style identification in terms of the Accuracy, Precision, Recall and F-Measure metrics. The proposed method has the highest performance among all methods, with an accuracy of 88.65%.
The combined results in Table 3 quantitatively confirm the benefits of our full proposed system for style identification. A significant improvement over all alternatives is achieved by obtaining an F-Measure of 0.8439.
- Effect of Two-Stage Approach & Specialization: The large difference between our approach and attempts to classify styles directly (as can be inferred from comparison to single-stage methods such as22,23,26 when applied to style) or without special models demonstrates the advantage of problem decomposition.
- Impact of LABHO: The performance leap from ‘Proposed (no LABHO)’ (F-Measure 0.7091) to the complete ‘Proposed’ method (F-Measure 0.8439) highlights the importance of the LABHO algorithm. By optimizing different CNNs for different cultural contexts, these models can become experts in identifying styles relevant to the culture, and LABHO is successful in identifying high performing configurations for these specialized learners.
- Advantage of LA in LABHO: The superiority over ‘Proposed (BHO only)’ (F-Measure 0.7731) is particularly indicative of the improved search abilities brought about by the combination of Learning Automata with BHO, which results in better tuned hyperparameter sets.
Compared with Zhao et al.23 (F-Measure 0.7944), our method’s superiority can be explained by this combination of problem decomposition using the two-stage approach and the more sophisticated, adaptive hyperparameter optimization offered by LABHO for the style-specific models.
Ablation study
In order to assess the contribution of individual components in our proposed framework, we performed a set of ablation studies. The first set of experiments concerns the architectural choices made for the CNN used in Stage 1 for cultural context recognition. The second set assesses the efficacy of the LABHO algorithm for hyperparameter tuning of the culture-specific CNNs in Stage 2 for style classification. All ablation experiments were performed on the same dataset and with the same evaluation metrics (Accuracy, Precision, Recall, and F-Measure) as in Sect. 4.
Ablation study for cultural context recognition CNN
The CNN architecture that is suggested for cultural context recognition (as described in Section “Cultural context recognition of artwork based on CNN”) includes PReLU activation layers and hybrid pooling layers. To evaluate the effect of these components, we compared our proposed Stage 1 CNN with three variants:
- Proposed CNN (PReLU + Hybrid Pooling): The architecture as it is in Section “Cultural context recognition of artwork based on CNN”, using PReLU activation and hybrid pooling.
- ReLU Variant: The PReLU activation layers were substituted with standard Rectified Linear Unit (ReLU) layers, but hybrid pooling remained.
- MaxPool Variant: The hybrid pooling layers were replaced with max pooling layers, but PReLU activation was retained.
- AvgPool Variant: The hybrid pooling layers were substituted by average pooling layers, but PReLU activation remained.
The performance of these configurations in classifying artworks into Western, Islamic, and East Asian cultural contexts is given in Table 4.
As can be seen in Table 4, the proposed CNN configuration with PReLU activation and hybrid pooling layers obtained the best performance across all evaluation metrics for cultural context recognition. The substitution of PReLU with ReLU led to a reduction in accuracy from 96.95% to 95.35% and an analogous reduction in F-Measure from 0.9681 to 0.9515. This implies that the learnable parameter in PReLU, which permits negative values to be passed, adds value to the model's generalization and learning of unique features for cultural contexts.
Furthermore, substituting hybrid pooling with either max pooling or average pooling also led to a reduction in performance, although less pronounced than the ReLU substitution. The MaxPool variant achieved an accuracy of 96.86%, while the AvgPool variant reached 95.66%. This shows that the hybrid pooling layer that learns to combine the max and average pooling operations is able to capture more relevant spatial information than the use of either pooling strategy alone, thereby justifying its presence in our proposed architecture for cultural context recognition.
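The two ablated components can be illustrated with scalar sketches. PReLU follows its usual definition with a learnable negative slope. For hybrid pooling, the text states only that the layer learns to combine max and average pooling; the convex blend with a mixing coefficient `alpha` shown here is one common formulation and is an assumption, as are the function names.

```python
from typing import Sequence


def prelu(x: float, slope: float) -> float:
    """PReLU: identity for positive inputs, learnable slope for negative ones."""
    return x if x > 0 else slope * x


def relu(x: float) -> float:
    """ReLU is the special case of PReLU with a fixed slope of zero."""
    return prelu(x, 0.0)


def hybrid_pool(window: Sequence[float], alpha: float) -> float:
    """Blend of max and average pooling over one pooling window.

    alpha = 1 reproduces max pooling and alpha = 0 reproduces average
    pooling; in a trained network alpha would be a learned parameter.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    average = sum(window) / len(window)
    return alpha * max(window) + (1.0 - alpha) * average
```

Under this formulation, the MaxPool and AvgPool variants of the ablation correspond to fixing `alpha` at 1 and 0 respectively, and the ReLU variant to fixing the PReLU slope at zero.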
Ablation study for style classification hyperparameter optimization
For the second stage, which involves classifying artistic styles within each identified cultural context, we proposed the LABHO algorithm for hyperparameter optimization of the dedicated CNNs (detailed in Section “Classification of artistic style based on reinforcement deep learning model”). In this experiment, we compared our proposed LABHO tuning method with the following scenarios:
- LABHO: Our proposed Learning Automata-based Black Hole Optimization algorithm. In this experiment, the maximum number of iterations in LABHO was 200.
- Grid Search: A grid search over the parameter space of the CNNs. Due to computational constraints, the search was limited to a budget of 200 evaluations.
- Random Search (Random): Random sampling of hyperparameters from the defined search space was performed for 200 iterations.
- Bayesian Optimization (Bayes Opt): A Bayesian optimization approach was used to determine optimal hyperparameters, with a budget of 200 evaluations.
These optimization methods were applied to tune the style classification CNNs based on cultural context. The average performance for style identification is presented in Table 5.
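To make the equal-budget comparison concrete, the following sketch shows what a random search baseline with a fixed evaluation budget might look like. It is an assumed illustration, not the code used in the experiments; `fitness` stands in for training a CNN with the sampled configuration and returning its validation error.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search(options: Sequence[Sequence[int]],
                  fitness: Callable[[List[int]], float],
                  budget: int = 200, seed: int = 0) -> Tuple[List[int], float]:
    """Evaluate `budget` uniformly sampled configurations and keep the best."""
    rng = random.Random(seed)
    best: List[int] = []
    best_fit = float("inf")
    for _ in range(budget):
        candidate = [rng.choice(values) for values in options]
        score = fitness(candidate)  # lower is better (validation error)
        if score < best_fit:
            best, best_fit = candidate, score
    return best, best_fit
```

Every method in the comparison is limited in the same way, so differences in the final error reflect how well each method spends its evaluations rather than how many it is allowed.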
The results in Table 5 demonstrate the superiority of the proposed LABHO algorithm for hyperparameter optimization in the context of style classification. LABHO achieved an average accuracy of 88.65% and an F-Measure of 0.8439, outperforming all other tested methods. Bayesian Optimization was the next best performing method, with an accuracy of 80.50% and an F-Measure of 0.7523. With the same budget of 200 evaluations, both Grid Search and Random Search achieved poorer performance.
The enhanced performance of LABHO is due to its hybrid approach that integrates the exploration ability of the Black Hole Optimization algorithm and the adaptive learning mechanism of Learning Automata. This enables LABHO to better explore the complex hyperparameter search space and move closer towards more optimal configurations for the culture-specific CNNs. The capacity of LAs to learn and reinforce good hyperparameter choices, embedded in the BHO framework, seems especially beneficial for the difficult task of style classification, which includes a greater variety of patterns and more target categories than cultural context recognition.
This ablation study supports the main architectural and methodological decisions in our proposed system. The application of PReLU and hybrid pooling in Stage 1 CNN improves cultural context recognition, and the LABHO algorithm optimizes the culture-specific CNNs in Stage 2 for more accurate style classification.
Conclusion
In this paper, we have addressed the identification and classification of artistic styles with a focus on the impact of cultural contexts. The results show that the proposed two-stage classification system reduces the complexity of identifying artworks and significantly increases the accuracy of this identification. In particular, an accuracy of 96.95% and an F-measure of 0.9581 in identifying the cultural context of artworks indicate the high efficiency of this method. In identifying artistic styles, an accuracy of 88.65% and an F-measure of 0.8439 were obtained. These results indicate the superiority of our proposed method compared to other existing methods and confirm its positive impact on the accuracy and efficiency of artwork identification systems. Finally, by presenting an innovative approach, this research opens further directions for future work in the field of art identification and classification. Limitations and future work are as follows:
- Computational complexity: The use of multiple learning models, each configured individually by the LABHO algorithm, increases computational complexity. Although this technique increases the accuracy of the results, it also increases the computational burden of the system, which may prevent its effective use in some scenarios.
- Limited number of cultural contexts: Only three main cultural contexts were considered in the present study. In future research, the number of cultural contexts could be increased and the generality of the model could be evaluated under more realistic conditions.
- Subjectivity in Artistic Styles: Expert annotation, though helpful, is unable to remove the subjectivity that forms the basis of defining and labelling artistic styles. Consequently, the construction and testing of the model can be influenced, thereby influencing the uniformity and clarity of the identified styles.
- Transferability and Definition of Styles Across Cultures: There might be problems in the application of artistic style definitions in different cultures because stylistic ideas may differ greatly from culture to culture. This impacts the model's generalizability to cultures not included in the dataset and may introduce biases based on the cultural perspectives embedded in the training data.
Data availability
The Western art portion of the dataset was sourced from the public WikiArt collection [13]. The images for the Islamic and East Asian categories were compiled via web scraping. The final dataset has not been released publicly because of potential copyright restrictions on the sources of these images. However, the image URLs and their corresponding labels can be provided by the corresponding author upon reasonable request for academic, non-commercial research purposes. The source code for the model proposed in this research will be made publicly available in a GitHub repository upon publication of the manuscript.
References
Walker, H. M. & Kelemen, C. Computer science and the liberal arts: A philosophical examination. ACM Trans. Comput. Educ. (TOCE) 10 (1), 1–10 (2010).
Sherman, M., Hogan, A., O’Sullivan, J. & Schumacher, S. Practical machine learning for liberal arts undergraduates. J. Comput. Sci. Colleges 38 (8), 69–79 (2023).
Menis-Mastromichalakis, O., Sofou, N. & Stamou, G. Deep ensemble art style recognition. In 2020 International Joint Conference on Neural Networks (IJCNN) 1–8. IEEE (2020).
Stork, D. G. Pixels & Paintings: Foundations of computer-assisted Connoisseurship (Wiley, 2023).
Hafiz, A. M. & Bhat, G. M. A survey on instance segmentation: State of the art. Int. J. Multimedia Inform. Retr. 9(3), 171–189 (2020).
Ye, M. et al. Deep learning for person re-identification: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 2872–2893 (2021).
Lin, Z., Sun, J., Davis, A. & Snavely, N. Visual chirality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 12295–12303 (2020).
Mall, U., Matzen, K., Hariharan, B., Snavely, N. & Bala, K. Geostyle: Discovering fashion trends and events. In Proceedings of the IEEE/CVF International Conference on Computer Vision 411–420 (2019).
Matzen, K., Bala, K. & Snavely, N. Streetstyle: Exploring world-wide clothing styles from millions of photos. arXiv:1706.01869 (2017).
Crowley, E. & Zisserman, A. The state of the art: Object retrieval in paintings using discriminative regions. In Proceedings of the British Machine Vision Conference 2014. British Machine Vision Association (2014).
Lin, H., Van Zuijlen, M., Wijntjes, M. W., Pont, S. C. & Bala, K. Insights from a large-scale database of material depictions in paintings. In Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part III 531–545. (Springer, 2021).
Joshi, A., Agrawal, A. & Nair, S. Art style classification with self-trained ensemble of autoencoding transformations. arXiv:2012.03377 (2020).
Tan, W. R., Chan, C. S., Aguirre, H. E. & Tanaka, K. Improved ArtGAN for conditional synthesis of natural image and artwork. IEEE Trans. Image Process. 28(1), 394–409 (2018).
Yang, H. & Min, K. Classification of basic artistic media based on a deep convolutional approach. Vis. Comput. 36 (3), 559–578 (2020).
Liu, S., Yang, J., Agaian, S. S. & Yuan, C. Novel features for art movement classification of portrait paintings. Image Vis. Comput. 108, 104121 (2021).
Zhao, W., Jiang, W. & Qiu, X. Big transfer learning for fine art classification. Comput. Intell. Neurosci. 2022(1), 1764606 (2022).
Chen, B. Classification of artistic styles of Chinese art paintings based on the CNN model. Comput. Intell. Neurosci. 2022(1), 4520913 (2022).
Imran, S. et al. Artistic style recognition: Combining deep and shallow neural networks for painting classification. Mathematics 11 (22), 4564 (2023).
Varshney, S., Lakshmi, C. V. & Patvardhan, C. Madhubani art classification using transfer learning with deep feature fusion and decision fusion based techniques. Eng. Appl. Artif. Intell. 119, 105734 (2023).
Zhang, X. & Ding, T. Style classification of media painting images by integrating ResNet and attention mechanism. Heliyon 10(6) (2024).
Pérez, S. P. & Cozman, F. G. How to generate synthetic paintings to improve art style classification. In Intelligent Systems: 10th Brazilian Conference, BRACIS 2021, Virtual Event, November 29–December 3, 2021, Proceedings, Part II 238–253. (Springer, 2021).
Yang, Z. Classification of picture art style based on VGGNET. In Journal of Physics: Conference Series Vol. 1774, No. 1, 012043. IOP Publishing (2021).
Zhao, W., Zhou, D., Qiu, X. & Jiang, W. Compare the performance of the models in art classification. PLoS ONE 16(3), e0248414 (2021).
Iliadis, L. A., Nikolaidis, S., Sarigiannidis, P., Wan, S. & Goudos, S. K. Artwork style recognition using vision transformers and MLP mixer. Technologies 10 (1), 2 (2021).
Liu, Y., Bai, H. & Wang, J. Fine-art recognition using convolutional transformers. PeerJ Comput. Sci. 10, e2409 (2024).
Mohammadi, M. R. & Rustaee, F. Hierarchical classification of fine-art paintings using deep neural networks. Iran. J. Comput. Sci. 4, 59–66 (2021).
Banar, N., Daelemans, W. & Kestemont, M. Transfer learning for the visual arts: The multi-modal retrieval of iconclass codes. ACM J. Comput. Cult. Herit. 16(2), 1–16 (2023).
Gonthier, N., Gousseau, Y., Ladjal, S. & Bonfait, O. Weakly supervised object detection in artworks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018).
Huang, H., Zinnen, M., Liu, S., Maier, A. & Christlein, V. Scene classification on fine arts with style transfer. In Proceedings of the 6th Workshop on the Analysis, Understanding and Promotion of Heritage Contents 18–27 (2024).
Zinnen, M., Hussian, A., Maier, A. & Christlein, V. Recognizing sensory gestures in historical artworks. Multimedia Tools Appl. 1–29 (2024).
Bergstra, J. & Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13 (1), 281–305 (2012).
Wu, J. et al. Hyperparameter optimization for machine learning models based on bayesian optimization. J. Electron. Sci. Technol. 17 (1), 26–40 (2019).
Torkestani, J. A. & Meybodi, M. R. Learning automata-based algorithms for solving stochastic minimum spanning tree problem. Appl. Soft Comput. 11 (6), 4064–4077 (2011).
Hatamlou, A. Black hole: A new heuristic optimization approach for data clustering. Inf. Sci. 222, 175–184 (2013).
Ding, B., Qian, H. & Zhou, J. Activation functions and their characteristics in deep neural networks. In 2018 Chinese Control and Decision Conference (CCDC) 1836–1841. IEEE (2018).
Momeny, M., Jahanbakhshi, A., Jafarnezhad, K. & Zhang, Y. D. Accurate classification of cherry fruit using deep CNN based on hybrid pooling approach. Postharvest Biol. Technol. 166, 111204 (2020).
Author information
Contributions
Tianxue Zhao wrote the main manuscript text and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Funding
This work was supported by the Scientific Research Project of The Education Department of Jilin Province. Project name: Research on improving the teaching quality of humanities general courses based on AI artificial intelligence. Project number: JJKH20251798SK.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, T. A novel model for cultural-based classification of liberal arts using deep reinforcement learning. Sci Rep 15, 38522 (2025). https://doi.org/10.1038/s41598-025-16964-9