Introduction

Hyperparameter optimization is a difficult task with broad ramifications across industries, including driverless cars and medical applications. A deep learning model’s performance depends heavily on a number of hyperparameters, including learning rates, batch sizes, network topologies, and regularization parameters, all of which have a significant impact on the model’s effectiveness. Hyperparameter tuning is a difficult and time-consuming procedure due to the intricate structure, interdependencies, and large search space of these parameters1. Finding the ideal configuration frequently requires navigating this complex landscape through iterative trial and error, which demands considerable time and computing power. Furthermore, the optimal hyperparameter sets can vary greatly between datasets and classification problems. Addressing this challenge successfully is essential to maximizing the efficacy and accuracy of image classification models, enabling them to realize their full potential in real-world applications. However, there is no one-size-fits-all solution to this problem2.

The main goal of this research is to investigate cutting-edge methods and techniques for image classification through hyperparameter optimization. Improving the model’s performance is important, but so is reducing the human effort and computational labor needed to complete the task3. Handwritten digits have numerous applications, including handwriting recognition, postal zip code extraction, and processing bank checks. However, recognizing these digits is challenging due to their varied stroke types, sizes, and orientations4. Various methods have been attempted, including artificial neural networks (ANNs), support vector machines (SVMs), rule-based reasoning, and multi-column deep neural networks. User-independent online handwritten digit recognition faces challenges in categorizing strokes. The objective is to use a deep learning model to identify handwritten digit patterns and train a model that can categorize digits based on those patterns5. Recent studies show that deep hierarchical neural networks improve supervised pattern categorization through unsupervised pre-training. These networks, particularly deep convolutional neural networks (DCNNs), have shown potential on various datasets. However, training them on central processing units (CPUs) can be time-consuming and expensive6. Fast parallel neural network code for graphics cards has addressed this issue, allowing for faster image classification than CPU-based methods7.

The back-propagation training method iteratively adjusts weights and biases, optimizing the network’s capacity to identify intricate patterns and features in images8. Through this procedure, a CNN can accurately complete tasks such as segmentation, object identification, and image categorization9. The CNN is the key component of deep learning that enables complex image mapping and classification, and it has improved computer vision systems through its ability to automatically learn hierarchical representations. The CNN-based LeNet-5 architecture excels in image classification and related computer vision tasks10.

The problems of hyperparameter tuning in image classification are the main topic of this work, especially as they relate to CNNs. The work suggests a novel method to optimize hyperparameters and raise the accuracy of image classification by merging ensemble genetic algorithms (EGAs) with CNN models. The goal of this research is to address problems with model complexity, genetic algorithm (GA) optimization, processing power requirements, and dataset generalization. Preparing the dataset, training individual CNN models, optimizing genetic algorithms, adjusting hyperparameters, and evaluating the models are all part of the proposed technique. The difficulties and suggested method are described, emphasizing the expected advances in image recognition and computer vision technologies. With an emphasis on thorough experiments using industry-standard datasets like MNIST, this research seeks to offer insights useful for practical applications in a range of sectors.

  • This study explores the impact of hyperparameter tuning on ensemble model performance using an evolutionary algorithm for improved precision, generalization, and resilience in image classification applications.

  • A GA-based hyperparameter optimization approach for deep learning models, using the stacking ensemble technique and called ensemble genetic algorithm and CNN (EGACNN), is proposed for image categorization to enhance model performance.

  • Deep learning models such as CNN, recurrent neural network (RNN), AlexNet, residual neural network (ResNet), VGG, convolutional recurrent neural network (CRNN), and an ensemble of CNN and spiking neural network (CSNN) are used and combined with the GA to enhance dataset comprehension and decision-making in image classification. Ensemble learning enhances the flexibility of image classification systems by combining CNN architectures, especially when dataset differences cause individual models to struggle.

  • Experimental results highlight the superiority of the proposed ensemble model for image classification, evaluated through accuracy, precision, recall, and F1 score.

Although optimizing hyperparameters is essential for improving the performance of image classification models, current approaches frequently encounter considerable difficulties. Existing methods are computationally expensive and take substantial time even for simple tasks. Traditional grid search and random search strategies require a thorough investigation of the hyperparameter space without ensuring optimal outcomes11. Furthermore, these techniques are inefficient, particularly with complicated models like deep neural networks, because they fail to adaptively focus on the most promising areas of the hyperparameter space12. Although more successful, Bayesian optimization can struggle in high-dimensional hyperparameter spaces and may lose effectiveness when objective function evaluations are noisy or costly. Furthermore, a fundamental requirement of real-world applications with constrained resources is the capacity to balance the trade-off between accuracy and computing cost, something many optimization techniques fail to do well. These drawbacks emphasize the need for more sophisticated and flexible optimization methods, including those incorporating genetic algorithms, to quickly and accurately adjust hyperparameters for better model performance in image classification applications13.

The remainder of the paper is structured as follows: Section 2 presents the literature analysis of current systems and their limitations. Section 3 presents the methodology, describing the methods and techniques adopted to carry out the experiments and the structure of the approach. Section 4 presents the performance of the deep learning models in a comparative analysis. Section 5 concludes the research.

Related work

One of the most important tasks in computer vision is classifying images into predefined classes based on their visual information. CNNs have become the industry standard for image classification because they can learn complex feature relationships from raw data14. Deep learning models consist of many layers carrying out operations such as convolution and pooling. These layers extract and integrate data at different levels of abstraction. CNNs are trained on large datasets, which enables them to learn the relationships between input features and output classes by analyzing images and their corresponding labels15. During inference, a CNN uses the learned representations to analyze previously unseen images and predict their classes16.

A CNN model for small datasets that applies regularization techniques and model-average ensembling enhances generalization and classification accuracy in cloud categorization research17. Evaluation using the SWIMCAT dataset demonstrates perfect classification accuracy, highlighting the model’s robustness18. An MCUa dynamic deep CNN model classifies breast histology images using multilevel context-aware models and uncertainty quantification, achieving high accuracy by addressing categorization challenges caused by visual heterogeneity and the lack of contextual information in large digital data19. Researchers have examined the performance effects of hyperparameters and model optimization techniques on four DNN models; the findings indicate that hyperparameters affect different models and performance factors differently20. Moreover, the research advises practitioners to consider a variety of performance indicators and to be aware of the cumulative nature of optimization and hyperparameter tuning21.

CNN image classification using data augmentation and batch normalization enhances precision and effectiveness by normalizing inputs and creating fresh training samples from existing data22. The EnsNet ensemble learning method combines FCSNs with a base CNN, segmenting feature maps into subsets and training the FCSNs to predict class labels23. A majority vote across the networks determines the model’s output, aiming to improve object identification performance24. The CE-ResNet model was developed by combining a ResNet with a capsule neural network (CapsNet) technique; such CNNs are utilized as classifiers for fruit recognition and pricing in supermarkets25.

To improve image classification methods, this work combines the capabilities of CNNs and genetic algorithms. Inspired by the evolutionary processes found in nature, GAs are remarkably adept at exploring intricate solution spaces to find nearly optimal configurations26. These algorithms explore a wide range of options through iterative refinement, focusing on solutions that perform better. Meanwhile, CNNs are industry mainstays in image classification because of their intrinsic capacity to extract complex patterns and hierarchical features from raw pixel data. However, CNN performance is highly dependent on the fine-tuning of hyperparameters such as learning rates, network topologies, and regularization strategies. Adjusting these hyperparameters by hand is time-consuming and frequently fails to cover the range of possible combinations27.

This work aims to overcome the difficulties associated with hyperparameter optimization, model generalization, and robustness in image classification tasks by integrating GAs into CNN training. A new era of efficiency in this field is anticipated from the mutually beneficial combination of CNNs and GAs, which promises to improve the flexibility and robustness of image classification models in addition to streamlining the optimization process28. Researchers consider many hyperparameters and architectural choices that significantly affect the CNN model’s performance while fine-tuning it with GAs. These parameters include the number of layers, learning rates, batch and filter sizes, and the configuration of the convolutional and pooling operations. Each set of parameters represents a potential CNN architecture, generating a diverse set of options for the GA29.

The process begins with an assessment of each candidate CNN design using a validation dataset. Each architecture’s fitness is evaluated using performance metrics such as classification accuracy or loss function values. This initial evaluation serves as the foundation for further optimization phases and provides a benchmark for comparing different configurations. Through iterative evolution, GAs improve the population of CNN structures across multiple generations. In every cycle, genetic processes including crossover and mutation produce new individuals. Crossover produces offspring with a variety of features by combining traits from two parent architectures30. Mutation introduces small random modifications to particular structures, which promotes exploration of new solution areas. Individuals with greater fitness have a higher probability of producing progeny due to the selection mechanism. Like natural selection, this process results in advantageous traits being passed to the following generation. Configurations that perform better in terms of classification accuracy and loss minimization gradually dominate the population31.

GAs have strong potential for optimization, but they face a number of obstacles that limit their effectiveness. The high dimensionality of images, with each pixel representing a feature, is a significant obstacle because it creates a vast search space. Furthermore, the optimization landscape is complicated by the non-linear and non-convex relationship between image characteristics and class labels, which frequently causes GAs to struggle to converge to global optima32. In image classification tasks, where it can be difficult to discover an ideal trade-off, GAs become problematic. Moreover, to guarantee both efficacy and computational efficiency, CNN designs or hyperparameters must be represented and encoded in a way appropriate for GAs. Overcoming these obstacles requires novel algorithmic designs and hybrid strategies that combine GAs with other optimization methods or make use of parallel computing frameworks to increase GA efficiency in CNN architecture optimization for image classification applications33.

The genetic algorithm improves the CNN model’s hyperparameters using a population-based optimization technique. This technique compares new algorithms against different parameter settings to enhance classification performance on the MNIST dataset34. Evolutionary algorithms (EAs) optimize artificial neural network designs and parameters, automating hyperparameter tweaking by simulating natural evolution. One study utilizes a two-level genetic algorithm and neuroevolution to find CNN and neural network topologies, balancing search time and fitness integrity35. The method speeds up fitness evaluation and allows adaptable CNN structures to outperform previous techniques and reduce training time36. Wound treatment optimization (WTO), a distributed method inspired by biological processes, was used to train the learning parameters of a LeNet CNN model37. This method improved training time and accuracy on the MNIST dataset and can be applied in various fields, including robotics and multi-agent systems38.

The MR-DCAE model detects reconstruction problems and employs a deep convolutional autoencoder to identify unauthorized radio transmissions. To maintain manifold invariance, the model incorporates a similarity estimator and is optimized via entropy-stochastic gradient descent. MR-DCAE demonstrates state-of-the-art performance when tested on the AUBI2020 dataset, successfully identifying unauthorized signals in complex settings39. The Ms-RaT model, which uses multi-scale analysis to improve feature learning from radio signals, employs a dual-channel representation. Extensive simulations and ablation studies confirm that the model provides greater accuracy with equivalent or lower computing complexity than existing deep learning methods40.

The lightweight MobileViT neural network, which uses clustered constellation pictures from I/Q sequences for real-time modulation classification, was recently introduced in automatic modulation classification (AMC) work. On the RadioML 2016.10a dataset, this model, created for edge computing platforms, performs better than previous approaches and has proven resilient in a variety of scenarios. For using deep learning in real-time AMC on devices with limited resources, MobileViT is a trailblazing method41. For real-time AMC in drone communication systems, MobileRaT, a lightweight transformer model with pruning based on information entropy, has been presented. It achieves higher accuracy and efficiency on public datasets. This method shows flexibility across different communication scenarios by combining pruning, for the first time, with a transformer model for temporal signal processing42. To recognize partial discharge patterns in power transformers, a hybrid CNN-LSTM model that makes use of dual-channel pictures from PRPD and PRPS has been presented. This method outperforms both conventional and sophisticated deep learning techniques and is the first to leverage dual-channel spectrum inputs43.

There are clear benefits and drawbacks to using both genetic algorithms and CNNs for image classification, especially on the MNIST dataset. Because CNNs can automatically extract and learn features from the images, they perform exceptionally well and accurately on MNIST, making them highly useful for this purpose. They are preferred for image classification because they take advantage of robust feature learning and translation invariance, which allow them to extract hierarchical features. CNNs can be limited in resource-constrained contexts, though, as they need higher processing power and a lot of data to train well. On the other hand, evolutionary algorithms can be used to choose feature subsets or tune hyperparameters, which may enhance CNN performance; however, they are less common for direct image classification. Although they provide a flexible, population-based method of problem solving, their iterative nature can make them computationally expensive and less effective for direct classification problems. Due to their direct approach and high accuracy, CNNs typically perform better than genetic algorithms for image classification tasks in practical applications; genetic algorithms, on the other hand, may be more appropriate for problems linked to optimizing CNN configurations. Table 1 provides a critical summary of the discussed research works.

Table 1 Advantages and disadvantages of existing research works.

Methodology

This research increases the efficiency of image classification through hyperparameter optimization using a GA. The MNIST dataset, a well-known benchmark for image classification tasks, serves as the main source of data. The MNIST dataset is used in experiments with deep learning approaches because it contains a sizable collection of handwritten digits from 0 to 9 in grayscale images; it is obtained from the well-known dataset repository Kaggle. Normalization and one-hot encoding are applied to the MNIST images to improve model resilience. Image augmentation is then applied to scale the images to a uniform dimension by standardizing pixel values and to increase the number of dataset images. Data preparation is an essential first step to make sure the data is appropriate for training deep learning models. The MNIST dataset is used to train deep learning models such as CNN, RNN, AlexNet, ResNet, VGG, CSNN, and the proposed EGACNN. Figure 1 shows the methodological architecture of the proposed model. To obtain the best classification results and accuracy, the training process involves fine-tuning model parameters and optimizing hyperparameters. Each model’s performance is evaluated using measures including F1-score, recall, accuracy, and precision.

Fig. 1
figure 1

CNN and genetic algorithm-based methodological architecture.

A validation approach such as cross-validation is also used to ensure the model’s generalizability. Ensemble methods are used to improve classification performance, with bagging, boosting, and stacking affecting the accuracy of the model. The predictions of the deep learning base models are combined with genetic algorithms to construct ensemble models that produce more effective results. This research employs a strict experimental design that includes cross-validation and appropriate statistical testing to confirm the findings. The research determines the best models and ensemble procedures for MNIST digit classification and highlights each model’s weaknesses and strengths.

Dataset

The MNIST dataset, commonly used in computer vision, consists of 60,000 handwritten digit images divided into training and test sets of 50,000 and 10,000 samples respectively, as shown in Table 2 and Fig. 2.

Table 2 Training and testing images in the MNIST datasets.
Fig. 2
figure 2

Samples from the MNIST dataset.

Image normalization

Data normalization has a major impact on the convergence and performance of the deep learning models in the proposed image classification framework. Image normalization refers to the processes and techniques used to preprocess and normalize the input image data. The goal of data normalization is to guarantee that differences in the intensity and distribution of the input features do not obstruct the model’s training procedure. Pixel standardization and attention to problems like lighting or contrast variance result in more stable and convergent models. Understanding the subtleties of data normalization is critical to the success of the image classification system because it guarantees that the features in the input images can be efficiently learned and represented by the deep learning models.
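As an illustrative sketch of this preprocessing step, the snippet below normalizes the MNIST pixel intensities and one-hot encodes the labels. The use of the standard Keras loader is an assumption made for a self-contained example; the paper obtains the same digit images from Kaggle.

```python
import tensorflow as tf

# Load MNIST (the paper sources the data from Kaggle; the standard
# Keras loader is assumed here for convenience)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalization: scale pixel intensities from [0, 255] to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# One-hot encode the digit labels 0-9
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
```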

Genetic algorithm

The GA belongs to the class of heuristics that follow the concepts of genetics and natural selection. For hyperparameter optimization, a GA is used to determine the ideal collection of hyperparameters for a deep learning model such as a CNN. Hyperparameters are configurations that control a model’s performance and behavior; they are not learned from the data. The difficult part of hyperparameter optimization is finding the set of these parameters that maximizes model performance.

Population and individuals

The population in a GA for hyperparameter tuning is a group of candidate solutions, where each member represents a particular set of hyperparameters. An individual is organized as a vector, with each gene representing a certain hyperparameter. An individual could be represented as [0.001, 32, 64, 3x3, 0.5], where 0.001 stands for the learning rate, 32 for the batch size, 64 for the number of filters, 3x3 for the filter size, and 0.5 for the dropout rate. Collectively, these members of the population investigate various combinations of hyperparameters, guiding the optimization procedure toward the optimal model configuration.
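A minimal sketch of this encoding is shown below; the specific candidate values in SEARCH_SPACE are illustrative assumptions, not the exact ranges used in the experiments.

```python
import random

# Hypothetical discrete search space mirroring the example individual:
# [learning rate, batch size, number of filters, filter size, dropout]
SEARCH_SPACE = [
    [0.0001, 0.001, 0.01],  # learning rate
    [32, 64, 128],          # batch size
    [32, 64, 128],          # number of filters
    [3, 5],                 # filter size (3 -> 3x3, 5 -> 5x5)
    [0.2, 0.3, 0.5],        # dropout rate
]

def random_individual():
    # One gene per hyperparameter, e.g. [0.001, 32, 64, 3, 0.5]
    return [random.choice(options) for options in SEARCH_SPACE]

def initial_population(size=20):
    return [random_individual() for _ in range(size)]
```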

Crossover

The process of creating offspring by fusing the genetic material (hyperparameters) of two parent individuals is known as crossover. This mimics natural reproduction, allowing the children to inherit traits (hyperparameters) from both parents. The crossover procedure adds variety without compromising the integrity of previously discovered solutions. If parent 1 has hyperparameters \([0.001, 32, 64, 3\text {x}3, 0.5]\) and parent 2 has \([0.01, 64, 128, 5\text {x}5, 0.2]\), a crossover might produce offspring like \([0.001, 32, 128, 5\text {x}5, 0.2]\) and \([0.01, 64, 64, 3\text {x}3, 0.5]\). Let \(\theta _{p1}\) and \(\theta _{p2}\) be the parent vectors, and \(\theta _{o1}\) and \(\theta _{o2}\) be the offspring vectors. The crossover operation can be expressed as:

$$\begin{aligned} \theta _{o1}= & [ \theta _{p1}[1:k], \theta _{p2}[k+1:n] ] \end{aligned}$$
(1)
$$\begin{aligned} \theta _{o2}= & [ \theta _{p2}[1:k], \theta _{p1}[k+1:n] ] \end{aligned}$$
(2)

where k is a randomly chosen crossover point.
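A single-point crossover per Equations (1) and (2) can be sketched as follows, assuming individuals are encoded as plain Python lists as in the population example above.

```python
import random

def crossover(parent1, parent2):
    # Eqs. (1)-(2): pick a random cut point k and swap the tails
    k = random.randint(1, len(parent1) - 1)
    offspring1 = parent1[:k] + parent2[k:]
    offspring2 = parent2[:k] + parent1[k:]
    return offspring1, offspring2

# Crossing [0.001, 32, 64, 3, 0.5] with [0.01, 64, 128, 5, 0.2] at k = 2
# reproduces the offspring from the example in the text:
# [0.001, 32, 128, 5, 0.2] and [0.01, 64, 64, 3, 0.5]
```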

Mutation

A mutation modifies an individual’s CNN hyperparameters at random. By keeping the population’s genetic variety intact, this procedure keeps the algorithm from settling too rapidly on a local optimum. Mutation is usually applied with a small probability. For instance, in the individual \([0.001, 32, 64, 3\text {x}3, 0.5]\), a mutation might alter the learning rate to \(0.0001\), resulting in a new individual \([0.0001, 32, 64, 3\text {x}3, 0.5]\). Mutation introduces small random changes to the offspring’s hyperparameters to maintain diversity. The mutation operation can be expressed as:

$$\begin{aligned} \theta _i^{\text {mut}} = \theta _i + \delta \end{aligned}$$
(3)

where \(\delta\) is a small random perturbation applied to the hyperparameter \(\theta _i\), often sampled from a normal distribution.
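The Gaussian perturbation of Equation (3) can be sketched as below. The per-gene mutation rate of 0.1 is an assumption, and only continuous genes (learning rate, dropout) are perturbed here; discrete genes could instead be resampled from their allowed values.

```python
import random

def mutate(individual, rate=0.1, sigma=0.1):
    # Eq. (3): theta_i_mut = theta_i + delta, delta ~ N(0, sigma*|theta_i|)
    mutant = individual[:]
    for i, gene in enumerate(mutant):
        if isinstance(gene, float) and random.random() < rate:
            mutant[i] = gene + random.gauss(0.0, sigma * abs(gene))
    return mutant
```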

Environmental selection

After the offspring are generated, environmental selection decides which individuals form the next generation. This can be based on elitism, where the best-performing individuals are always carried over. The next generation is formed by selecting the top M individuals from the union of parents and offspring:

$$\begin{aligned} \text {Next Generation} = \text {Top } M \text { individuals from } \{ \text {Parents} \cup \text {Offspring} \} \end{aligned}$$
(4)

where M is the population size.
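Elitist environmental selection per Equation (4) reduces to ranking the combined pool by fitness, as in this sketch:

```python
def environmental_selection(parents, offspring, fitness_fn, M):
    # Eq. (4): keep the top-M individuals from parents plus offspring
    pool = parents + offspring
    ranked = sorted(pool, key=fitness_fn, reverse=True)
    return ranked[:M]
```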

Fitness function

Every member of the population is assessed for quality by the fitness function. For hyperparameter optimization, the fitness function is usually based on the model’s performance, such as validation loss, accuracy, or F1-score, after training with the hyperparameters that the individual represents. A higher fitness score indicates better model performance.

The fitness function evaluates the performance of a CNN model with a specific set of hyperparameters. Let \(\theta = [\theta _1, \theta _2, \ldots , \theta _n]\) represent the vector of hyperparameters for the CNN, where \(\theta _i\) is a specific hyperparameter (e.g., learning rate, batch size). The fitness function \(F(\theta )\) is typically based on the model’s performance on a validation set:

$$\begin{aligned} F(\theta ) = \text {Accuracy}(\theta ) \quad \text {or} \quad F(\theta ) = -\text {Loss}(\theta ) \end{aligned}$$
(5)
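A fitness evaluation in the spirit of Equation (5) is sketched below: a small CNN is built from the individual’s genes, trained briefly, and scored by validation accuracy. The one-epoch training budget and the exact layer stack are assumptions made to keep each evaluation cheap.

```python
import tensorflow as tf

def fitness(theta, x_train, y_train, x_val, y_val):
    # Eq. (5): F(theta) = validation accuracy of a CNN trained with
    # theta = [learning rate, batch size, n_filters, filter size, dropout]
    lr, batch, n_filters, fsize, drop = theta
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(n_filters, fsize, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(drop),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=batch, epochs=1, verbose=0)
    _, accuracy = model.evaluate(x_val, y_val, verbose=0)
    return accuracy
```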

Selection

The process of selecting members of the current population to produce the next generation’s offspring is known as selection. Individuals with higher fitness scores have a greater chance of being chosen since they are superior candidates.

Best solution

The set of hyperparameters with the highest fitness score, i.e., the best-performing individual, is chosen as the final hyperparameter configuration for the model once the GA terminates. The final model is then trained using this solution. Figure 3 shows the flow of this whole process.
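Tying the operators above together, a minimal generational loop might look as follows; fitness-proportional parent selection and the generation and population sizes are illustrative assumptions rather than the exact settings used in the experiments.

```python
import random

def run_ga(x_train, y_train, x_val, y_val, generations=10, pop_size=20):
    def score(ind):
        return fitness(ind, x_train, y_train, x_val, y_val)

    population = initial_population(pop_size)
    for _ in range(generations):
        fitnesses = [score(ind) for ind in population]
        # fitness-proportional selection of the parent pool
        parents = random.choices(population, weights=fitnesses, k=pop_size)
        offspring = []
        for i in range(0, pop_size - 1, 2):
            child1, child2 = crossover(parents[i], parents[i + 1])
            offspring += [mutate(child1), mutate(child2)]
        population = environmental_selection(population, offspring,
                                             score, pop_size)
    # best solution: the surviving individual with the highest fitness
    return max(population, key=score)
```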

Fig. 3
figure 3

Optimization of CNN hyperparameter with GA.

Deep learning models

Deep learning models have drastically changed the fields of computer vision and image classification. This research investigates the crucial significance of deep learning by combining the power of CNNs with the efficiency of GAs for image classification. The deep learning models are described in detail, providing clarity on the architectures, training protocols, and hyperparameter tuning strategies that underpin the proposed approach.

ResNet model

The input images are reshaped for the ResNet model, which scales the pixel values to an appropriate range. The ResNet design includes residual blocks, allowing the network to learn residual mappings rather than the desired mappings directly. A typical ResNet design contains numerous residual blocks, and each block has several convolutional layers with skip connections that skip one or more layers. Configuring the ResNet model includes setting the number of residual blocks, the number of filters in each convolutional layer, and other hyperparameters. The model is then compiled using an appropriate loss function and an optimizer such as Adam. The ResNet model is trained on the training data with the given labels, optimizing the parameters during training. Forward and backward propagation are used throughout the training phase to update the model’s weights. The test data is then used to evaluate the trained ResNet model, using standard measures to assess performance on unseen data. The fundamental units of the network, called residual blocks, are introduced by the ResNet concept. The equation below defines a residual block:

$$\begin{aligned} y = F(x, W_i) + x \end{aligned}$$
(6)

where \(x\) is the input to the residual block, typically the feature map output from a previous layer or block. \(W_i\) denotes the weights of the layers within the residual block, which could include convolutional layers, batch normalization, and activation functions. \(F(x, W_i)\) represents the transformation applied to the input \(x\) by these layers.

Instead of merely outputting \(F(x, W_i)\), a residual block adds the original input \(x\) to the transformed input \(F(x, W_i)\), creating a residual or skip connection. The result \(y\) is the sum of the original input \(x\) and the transformation \(F(x, W_i)\).

The main idea behind using residual blocks is to address the vanishing gradient problem, which can hinder the training of very deep neural networks. By incorporating the input \(x\) into the output, residual blocks facilitate the flow of gradients during backpropagation, enabling the training of deeper networks that perform better on complex tasks.
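A residual block following Equation (6) can be sketched in Keras as below; the two-convolution body with batch normalization is an assumption about \(F(x, W_i)\), and the skip connection requires the input and output channel counts to match.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # F(x, W_i): two 3x3 convolutions with batch normalization
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    # Eq. (6): y = F(x, W_i) + x -- the skip connection
    y = layers.Add()([f, x])
    return layers.Activation("relu")(y)

# Usage: stack blocks on a feature map whose channels match `filters`
inputs = layers.Input(shape=(28, 28, 64))
outputs = residual_block(residual_block(inputs, 64), 64)
model = tf.keras.Model(inputs, outputs)
```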

The ResNet model used in this study contains several layers. The first layer is a conv2d layer with a (None, 26, 26, 64) output shape and 640 parameters. It is followed by another convolutional layer, conv2d-1, with a (None, 24, 24, 64) output shape and 36,928 trainable parameters. Next is a max-pooling2d layer with an output shape of (None, 12, 12, 64), followed by a dropout layer with a (None, 12, 12, 64) output shape and a flatten layer with an output shape of (None, 9216); these three layers have zero trainable parameters. They are followed by a dense layer with a (None, 128) output shape and 1,179,776 trainable parameters. Finally, dropout-1 and dense-1 layers are placed, with output shapes of (None, 128) and (None, 10) respectively, and 0 and 1,290 trainable parameters.

Convolutional neural network

The CNN model used to classify images on the MNIST dataset is made up of the following layers: a Conv2D layer with 32 3x3 filters and a ReLU activation function, which accepts single-channel 28x28 input images; a MaxPooling2D layer with a 2x2 pool size, which selects the largest value within each pool to reduce the spatial dimensions; a Conv2D layer with 64 3x3 filters and a ReLU activation function; another MaxPooling2D layer with a 2x2 pool size; a flatten layer that converts the 2D feature maps into 1D vectors; a 64-unit dense layer with a ReLU activation function; and a final dense layer with a softmax activation function that produces class probabilities.

The model is built with the sparse categorical cross-entropy loss function and the Adam optimizer, and is trained with a batch size of 32 for one epoch. The test dataset is used to evaluate the model, and the model summary offers a thorough description of the architecture. A CNN model can be expressed as a series of operations. The convolutional layer involves the following operation

$$\begin{aligned} Z_1 = \phi _1(F_1 * X + b_1) \end{aligned}$$
(7)

where \(F_1\) represents the convolutional filter applied to the input \(X\), and \(b_1\) is the bias term added to the result. The function \(\phi _1\) denotes the activation function that is applied element-wise to the convolution result. The output of this layer, \(Z_1\), is a feature map that captures important features from the input data.

$$\begin{aligned} Z_2 = \phi _2(F_2 * Z_1 + b_2) \end{aligned}$$
(8)

where \(Z_1\) is the input to the second convolutional layer, \(F_2\) is the filter applied to \(Z_1\), and \(b_2\) is the bias term. The activation function \(\phi _2\) is applied to the result, producing the output feature map \(Z_2\).

The pooling layers involve:

$$\begin{aligned} P_1&= P_m(Z_1) \end{aligned}$$
(9)
$$\begin{aligned} P_2&= P_m(Z_2) \end{aligned}$$
(10)

where \(P_m\) represents the pooling function applied to the feature map \(Z_1\). This operation produces \(P_1\), which is the down-sampled version of \(Z_1\). Similarly, \(P_m\) is applied to the feature map \(Z_2\), resulting in the pooled feature map \(P_2\).

Fully connected layers can be represented as:

$$\begin{aligned} Y = \sigma (D_l(P_k) + c) \end{aligned}$$
(11)

where \(P_k\) is the input, \(D_l\) is the weight matrix, and \(c\) is the bias term. The activation function \(\sigma\) is applied to the result of the linear transformation \(D_l(P_k) + c\), producing the final output \(Y\) which can be used for tasks such as classification.

The CNN model designed for this study contains seven layers. The first layer is a conv2d layer with a (None, 26, 26, 32) output shape and 320 trainable parameters, followed by a max-pooling2d layer with a (None, 13, 13, 32) output shape. After that, another conv2d layer is placed with an output shape of (None, 11, 11, 64) and 18,496 trainable parameters. It is followed by a max-pooling2d-1 layer with a (None, 5, 5, 64) output shape. A flatten layer comes next, followed by a dense layer with 64 neurons and 102,464 trainable parameters. Finally, a dense-1 layer is placed with one neuron per class, i.e., 10.
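The summary above corresponds to the following Keras sketch; the listed parameter counts (320, 18,496, and 102,464) match this configuration exactly.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> (26, 26, 32)
    layers.MaxPooling2D((2, 2)),                   # -> (13, 13, 32)
    layers.Conv2D(64, (3, 3), activation="relu"),  # -> (11, 11, 64)
    layers.MaxPooling2D((2, 2)),                   # -> (5, 5, 64)
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # one unit per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```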

Recurrent neural network

RNNs are particularly well-suited for tasks involving sequential data due to their ability to analyze information sequentially. Although the MNIST dataset comprises static images, the RNN model analyzes the image pixels sequentially, treating each row or column of pixels as a time step. RNNs excel at capturing local dependencies within sequential data, making them adept at processing images where such dependencies exist. Their ability to handle variable-length sequences allows RNNs to accommodate images of different sizes effectively; although the MNIST collection only contains fixed-size images, this property helps when working with datasets that have varying image dimensions. After training on sequence-related tasks such as text or time-series analysis, RNN models can be refined or utilized as a starting point for image classification challenges, since pre-trained RNN models can capture high-level features or contextual information that proves beneficial. This computational effectiveness is particularly advantageous when operating under constraints of time or computing resources. A layer-wise model summary of the RNN is given in Table 3. The central equation of a basic RNN model is as follows

$$\begin{aligned} H_t = \sigma (W_{hx} \cdot X_t + W_{hh} \cdot H_{t-1} + b_h) \end{aligned}$$
(12)

where \(X_t\) represents the input at \(t\), and \(H_{t-1}\) is the hidden state. The weight matrix \(W_{hx}\) is for the input \(X_t\), while \(W_{hh}\) is for the previous hidden state \(H_{t-1}\). The bias term \(b_h\) is added to the result of the linear combination. The function \(\sigma\) denotes the activation function applied to the linear transformation, introducing non-linearity into the hidden state calculation.

$$\begin{aligned} Y_t&= \text {softmax}(W_{yh} \cdot H_t + b_y) \end{aligned}$$
(13)

In Equation 13, \(W_{yh}\) is the weight matrix for the hidden state \(H_t\). The bias term \(b_y\) is added to the result of the linear transformation. The softmax function is then applied to the result, converting the raw scores into probabilities suitable for classification tasks. The softmax makes \(Y_t\) a probability distribution over the possible classes.
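As a minimal sketch of the row-wise processing described above, Equations (12) and (13) map onto a SimpleRNN layer followed by a softmax layer; the 128 hidden units are an assumption, as Table 3 is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Each 28x28 image is read as 28 time steps of 28-pixel rows
model = models.Sequential([
    layers.Input(shape=(28, 28)),
    layers.SimpleRNN(128, activation="tanh"),  # Eq. (12): hidden state H_t
    layers.Dense(10, activation="softmax"),    # Eq. (13): class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```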

Table 3 RNN model layer summary.

AlexNet model

With the MNIST dataset, AlexNet can recognize handwritten digits with a high degree of accuracy. For tasks like digit identification, AlexNet’s design, which includes several convolutional and pooling layers, enables it to learn complicated features from images. Additionally, the MNIST dataset is a good starting point for learning about image classification and deep learning techniques because it is comparably small and straightforward relative to other image datasets. The convolutional and max-pooling layers of the AlexNet model work similarly to those of the CNN model. The convolutional layers of AlexNet are represented as:

$$\begin{aligned} Z_1&= \phi _1(F_1 * X + b_1) \end{aligned}$$
(14)
$$\begin{aligned} Z_2&= \phi _2(F_2 * Z_1 + b_2) \end{aligned}$$
(15)

where \(X\) represents the input to the first convolutional layer, where \(F_1\) is the convolutional filter applied to \(X\), and \(b_1\) is the bias term. The function \(\phi _1\) denotes the activation function. The output \(Z_1\) is a feature map that highlights important features from the input. The second convolutional layer operates on \(Z_1\) using a different filter \(F_2\) and bias \(b_2\), with \(\phi _2\) applied to the result, producing \(Z_2\) as the output feature map.

Max-pooling layers are represented as:

$$\begin{aligned} P_1&= P_m(Z_1) \end{aligned}$$
(16)
$$\begin{aligned} P_2&= P_m(Z_2) \end{aligned}$$
(17)

where \(P_m\) represents the pooling operation, applied to the features \(Z_1\) and \(Z_2\). Pooling reduces the spatial dimensions of the feature maps \(Z_1\) and \(Z_2\), resulting in \(P_1\) and \(P_2\), respectively. This reduction helps in decreasing the computational complexity and mitigating overfitting by preserving the most important features while discarding less significant details.

The AlexNet model used in this study is made of 11 layers to identify digits. The first layer is a conv2d layer with a (None, 26, 26, 32) output shape and 320 trainable parameters. It is followed by a batch normalization layer with a (None, 26, 26, 32) output shape and 128 parameters. Next comes the max-pooling2d layer, which has a (None, 13, 13, 32) output shape. Another conv2d layer is placed after this, with an output shape of (None, 11, 11, 64) and 18,496 trainable parameters. The batch-normalization-1 layer follows, with an output shape of (None, 11, 11, 64) and 256 trainable parameters. The max-pooling2d-1 layer has an output shape of (None, 5, 5, 64), followed by the conv2d-2 layer with a (None, 3, 3, 128) output shape and 73,856 trainable parameters. After that, the flatten and dense layers are placed, with the dense layer having 256 neurons. A dropout layer has an output shape of (None, 256), followed by the final dense layer with 10 neurons to predict the final class.

VGG model

The CNN architecture known as VGG is renowned for its deep and systematic design. It is widely recognized for its remarkable depth and is available in two primary variations: VGG16, which comprises 16 weight layers, and VGG19, which consists of 19 weight layers. These designs are increasingly prevalent due to their ability to extract intricate information from images, making them well-suited for various computer vision applications, including image categorization. Employing deeper CNN architectures like VGG on more complex datasets, such as those found in large-scale image recognition tasks or datasets containing numerous objects, intricate backgrounds, and fine features, could provide insights into their full capabilities.

For more complicated datasets with a large variety of objects and scenarios, the additional layers of VGG can help it learn hierarchical features and abstract representations of input images. When compared to a shallower architecture like AlexNet, however, the additional depth of VGG may not significantly enhance accuracy on datasets like MNIST, which primarily involves identifying handwritten digits. In conclusion, while VGG stands as a robust CNN architecture capable of learning complex features, its full potential may not be realized when applied to straightforward datasets like MNIST. Assessing the performance of neural networks on more challenging and intricate image classification tasks often yields greater insights into their functionality and advantages. The convolutional layers of VGG can be represented as

$$\begin{aligned} Z_{1,1}= & \phi _1(F_{1,1} * X + b_{1,1}) \end{aligned}$$
(18)
$$\begin{aligned} Z_{1,2}= & \phi _2(F_{1,2} * Z_{1,1} + b_{1,2}) \end{aligned}$$
(19)

where \(X\) represents the input image. In the first convolutional layer, \(F_{1,1}\) is the filter applied to \(X\), and \(b_{1,1}\) is the associated bias term. The activation function \(\phi _1\) introduces non-linearity to the convolutional output, producing \(Z_{1,1}\). The second convolutional layer takes \(Z_{1,1}\) as input, applies the filter \(F_{1,2}\) with bias \(b_{1,2}\), and applies the activation function \(\phi _2\) to yield \(Z_{1,2}\).

The max-pooling layer operates as follows

$$\begin{aligned} P_1= & \text {max-pool}(Z_{1,2}) \end{aligned}$$
(20)
$$\begin{aligned} P_2= & \text {max-pool}(Z_{2,2}) \end{aligned}$$
(21)

where \(\text {max-pool}\) denotes the max-pooling operation applied to \(Z_{1,2}\) and \(Z_{2,2}\). This operation reduces the spatial dimensions of the feature maps while retaining the most significant features.

In the end, the fully connected layer can be represented as

$$\begin{aligned} Y = \sigma (D_l(P_k) + c) \end{aligned}$$
(22)

where \(P_k\) represents the output from the pooling layers, which is flattened and passed through the fully connected layer. \(D_l\) is the weight matrix for this layer, and \(c\) is the bias term. The activation function \(\sigma\) (e.g., softmax for classification) is applied to produce the final output \(Y\), which represents the predicted class probabilities for the input image.

This study adopts a 14-layer VGG model comprising convolutional, max-pooling, flatten, and dense layers. The first and second layers are conv2d layers, each with an output shape of (None, 28, 28, 64), followed by a max-pooling2d layer with a (None, 14, 14, 64) output shape. After that, conv2d-3 and conv2d-4 layers are placed, each with an output shape of (None, 14, 14, 128). Another max-pooling layer with an output shape of (None, 7, 7, 128) follows these layers. After the second max-pooling layer, three conv2d layers are placed, each with the same output shape of (None, 7, 7, 256). Afterward, a max-pooling2d layer with a (None, 3, 3, 256) output shape is placed, followed by a flatten layer. Finally, three dense layers are placed with 4096, 4096, and 10 neurons.

CSNN model

A CSNN model is used to handle visual categorization tasks. The CNN extracts features from the input images, and the retrieved features are processed by the SNN in a spiking manner. To increase accuracy, multiple models are combined through stacking. On the MNIST dataset, this architecture has the advantage of being even more accurate than the individual networks or stacking alone. While stacking is effective for combining the capabilities of many models, the SNN is useful for jobs that require temporal processing, such as recognizing sequences of digits in a handwritten number. SNNs are also renowned for their energy efficiency and capacity for processing data in a more biologically plausible way. To fully benefit from the energy efficiency advantages of SNNs, however, the design can be more difficult to implement and may need specialized hardware. Nevertheless, it is a useful exercise to understand how several neural network types can be integrated to carry out challenging tasks like image categorization. Combining these layers produces the layered model equation:

$$\begin{aligned} Y = \text {SNN}(Z_{\text {CNN,last}}) \end{aligned}$$
(23)

Equation 23 represents the integration of Convolutional Neural Network (CNN) features with Spiking Neural Network (SNN) processing. In this equation, \(Z_{\text {CNN, last}}\) denotes the output feature map from the final convolutional layer of a CNN. This feature map encapsulates the high-level abstracted features extracted by the convolutional layers. The function \(\text {SNN}(\cdot )\) signifies the processing by the spiking neural network, which takes \(Z_{\text {CNN, last}}\) as its input. The SNN is designed to handle temporal aspects of data and can provide a more biologically plausible model of neural processing. The output \(Y\) represents the final classification or prediction result of the CSNN model, derived after the SNN has processed the features from the CNN. This integration allows the CSNN to leverage the spatial feature extraction capabilities of CNNs while benefiting from the temporal dynamics and spiking behavior of SNNs.

For the CSNN model, conv2d-4 has an output shape of (None, 26, 26, 32) with 320 parameters, followed by max-pooling2d-4 and flatten-4 layers with output shapes of (None, 13, 13, 32) and (None, 5408). Afterward, two dense layers are added with (None, 64) and (None, 10) shapes, followed by a conv2d-5 layer with a (None, 26, 26, 32) output shape. Another max-pooling layer is added with a (None, 13, 13, 32) shape, followed by a (None, 5408)-shaped flatten layer. Finally, four dense layers are added with shapes (None, 128), (None, 10), (None, 64), and (None, 10).

CNN-LSTM

The MNIST dataset is classified using the CNN-LSTM model, which combines convolutional and sequential learning techniques. The MNIST dataset, which consists of grayscale images of handwritten digits, is first preprocessed independently for the CNN and LSTM inputs. To extract high-level representations, the CNN component processes the spatial properties of the images via a succession of convolutional and max-pooling layers, followed by flattening and dense layers. By restructuring the images into sequences, the LSTM component captures temporal dependencies and handles the sequential character of the input. To obtain the classification results, the outputs from both models are concatenated and fed into a final dense layer with a softmax activation function. After compilation, the merged model is trained on the training set and validated on the test set. Accuracy and loss measures are used to evaluate the model’s performance, and epoch-wise accuracy trends are displayed to gauge the model’s capacity for generalization and learning. The summary of the model’s layers is given in Table 4.
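A two-branch sketch of this design is given below; the branch widths are assumptions, since Table 4 is not reproduced here. During training, each batch of digits is fed to both inputs, once as 28x28x1 images and once as 28-step row sequences.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# CNN branch: spatial features from 28x28x1 images
img_in = layers.Input(shape=(28, 28, 1))
c = layers.Conv2D(32, (3, 3), activation="relu")(img_in)
c = layers.MaxPooling2D((2, 2))(c)
c = layers.Flatten()(c)
c = layers.Dense(64, activation="relu")(c)

# LSTM branch: each 28-pixel row treated as one time step
seq_in = layers.Input(shape=(28, 28))
l = layers.LSTM(64)(seq_in)

# Concatenate both representations and classify
merged = layers.Concatenate()([c, l])
out = layers.Dense(10, activation="softmax")(merged)

model = models.Model(inputs=[img_in, seq_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```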

Table 4 CNN-LSTM model layer summary.

DCAE

Handwritten digits are classified using the deep convolutional autoencoder (Mr-DCAE) on the MNIST dataset. First, the images in the dataset are reshaped and normalized. The Mr-DCAE architecture is composed of an encoder that uses convolutional and max-pooling layers to compress the input images into lower-dimensional representations, and a decoder that uses convolutional and upsampling layers to reconstruct the images. A classification layer is additionally attached to the encoded representation, which enables the model to predict digit classes. Accuracy is the main evaluation parameter, and the model is trained using sparse categorical cross-entropy loss and the Adam optimizer. Performance is evaluated by monitoring training and validation accuracy across epochs and producing a classification report based on test predictions. Table 5 provides a summary of the layers.
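The encoder-decoder-plus-classifier layout can be sketched as follows; the filter counts are assumptions, since Table 5 is not reproduced here. The classifier shares the encoder with the reconstruction path and is trained with the sparse categorical cross-entropy loss, as described above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(28, 28, 1))
# Encoder: compress the image into a lower-dimensional representation
e = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inp)
e = layers.MaxPooling2D((2, 2), padding="same")(e)        # 28 -> 14
e = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(e)
encoded = layers.MaxPooling2D((2, 2), padding="same")(e)  # 14 -> 7

# Decoder: reconstruct the input from the encoding
d = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(encoded)
d = layers.UpSampling2D((2, 2))(d)                        # 7 -> 14
d = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(d)
d = layers.UpSampling2D((2, 2))(d)                        # 14 -> 28
decoded = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(d)

# Classification head attached to the encoded representation
cls = layers.Dense(10, activation="softmax")(layers.Flatten()(encoded))

autoencoder = models.Model(inp, decoded)  # reconstruction path
classifier = models.Model(inp, cls)       # digit classification path
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```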

Table 5 DCAE model layer summary.

EGACNN

A CNN and a GA are combined to solve image categorization tasks, as shown in Fig. 4. The evolutionary algorithm is used to optimize the CNN hyperparameters, while the CNN itself is utilized to extract features from the input images. On the MNIST dataset, this architecture has the advantage of automatically optimizing the CNN hyperparameters without requiring manual adjustment. Compared to manually tuning the hyperparameters, this can result in greater accuracy and quicker convergence.

Fig. 4
figure 4

Ensemble genetic and CNN model-based image classification approach.

While more complex architectures like the one described may be challenging to construct and require greater processing power compared to simpler designs like AlexNet or VGG, it is still a beneficial exercise to understand how various optimization methods can improve the efficiency of deep learning systems. The ensemble model contains different layers with various output shapes and parameters, which are given in Table 6.

Table 6 The layer-wise summary of the proposed EGACNN model.

Evaluation parameters

The proposed model is evaluated using accuracy, precision, recall, and F1 score; since these metrics have complementary qualities, it is important to employ them in conjunction. Accuracy, the percentage of correctly classified instances among all instances, offers a broad indication of overall correctness; nevertheless, in scenarios with class imbalance, accuracy may be deceptive. Precision measures the percentage of true positive predictions among all predicted positives, highlighting the model’s ability to prevent false positives, which can be expensive. Recall illustrates the model’s capacity to find all relevant occurrences by calculating the percentage of true positives out of all actual positives; this is crucial when missing positives is a substantial problem. As the harmonic mean of recall and precision, the F1 score strikes a balance between the two metrics and offers a single, all-encompassing performance score that is particularly helpful when datasets are unbalanced. Combined, these metrics provide a comprehensive and detailed assessment of the model’s performance from several perspectives.

Precision is a crucial metric used in evaluating the performance of classification models, particularly in scenarios where the balance between the number of true positive and false positive predictions is important. The precision of a model is defined by the formula:

$$\begin{aligned} \text {Precision} = \frac{TP}{TP + FP} \end{aligned}$$
(24)

In Equation 24, \(TP\) (true positives) denotes instances correctly classified as positive, while \(FP\) (false positives) denotes instances incorrectly classified as positive. Precision measures the ratio of correctly predicted positive samples. A high precision value shows the model’s capability to produce few false positives, which is desirable in applications where false positive errors are costly or critical. Conversely, low precision suggests that the model has a high rate of false positives, indicating that many of the positive predictions made by the model are incorrect. Recall, also known as sensitivity or true positive rate, is a performance metric used to assess the effectiveness of a classification model, especially in detecting positive instances. The recall of a model is defined by the formula:

$$\begin{aligned} \text {Recall} = \frac{TP}{TP + FN} \end{aligned}$$
(25)

Recall measures the proportion of actual positive instances that were correctly classified by the model, providing insight into the model’s ability to identify positive cases. A high recall value indicates that the model is effective at detecting most of the positive instances, which is particularly important in scenarios where missing a positive case has significant consequences, such as in medical diagnostics or fraud detection. Conversely, low recall signifies that the model is missing many of the positive instances, leading to a higher number of false negatives.

The F1-Score is a metric used to evaluate the performance of classification models, particularly in situations where both precision and recall are important. It provides a single measure that balances the trade-off between precision and recall, offering a comprehensive assessment of a model’s accuracy. The F1-Score is defined by the formula:

$$\begin{aligned} \text {F1-Score} = \frac{2 \cdot \text {Precision} \cdot \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(26)

The F1-Score is the harmonic mean of precision and recall, which ensures that both metrics contribute equally to the final score. A high F1-Score indicates that the model achieves a good balance between precision and recall, which is desirable in the case of imbalanced class distribution. Conversely, a low F1-Score suggests that there is a significant imbalance between precision and recall, highlighting the need for further model refinement.
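The metrics of Equations (24) through (26), together with accuracy, are available off the shelf in scikit-learn; the sketch below assumes hypothetical label arrays y_true and y_pred, and macro averaging so that each of the ten digit classes contributes equally.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical ground-truth and predicted digit labels
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))  # Eq. (24)
print("Recall   :", recall_score(y_true, y_pred, average="macro"))     # Eq. (25)
print("F1-score :", f1_score(y_true, y_pred, average="macro"))         # Eq. (26)
```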

Results and discussion

The proposed ensemble model combines a GA and a CNN, where the GA is used to enhance the model’s performance through hyperparameter tuning. We present the research findings and demonstrate how the proposed method works for image classification tasks. We systematically report the empirical results achieved through rigorous experimentation and explore the ramifications of the conclusions for improving image categorization system performance. Promising results have been obtained from the confluence of these two different paradigms: the representational strength of CNNs and the computational efficiency of genetic algorithms. In the end, we hope to add to the continuing conversation in the field of computer vision and image categorization by thoroughly examining these results and elucidating the benefits, drawbacks, and insights derived from our methodology.

Results for CNN model

Figure 5 shows the training and validation accuracy and loss of the CNN model. The model improves its accuracy with each epoch and reaches 99% accuracy.

Fig. 5
figure 5

CNN model’s training and validation graph.

Table 7 shows the performance results of the CNN model. The CNN model performs much better as the epoch count is raised for image classification using the MNIST dataset. The maximum accuracy of this model is 0.9921 at the maximum epoch of 30. The accuracy for epoch 5 is 0.9891; it rises to 0.9914 for epoch 10, 0.9916 for epoch 15, 0.9916 for epoch 20, and 0.9917 for epoch 25.

Table 7 Summary of CNN model’s performance.

Results of RNN model

Figure 6 presents the RNN model’s training and validation accuracy and loss. The model’s performance improves with each passing epoch, and it attains a stable accuracy once it reaches epoch 17.

Fig. 6
figure 6

RNN model’s training and validation graph.

In terms of image classification, the RNN model performs better as the epoch count is raised, and these improvements are substantial, as shown in Table 8. The maximum accuracy in this research is 0.9638 at the maximum epoch of 30. The accuracy of the model for the first 5 epochs is 0.9575; it is 0.9445 at epoch 10, 0.9661 at epoch 15, 0.9663 at epoch 20, and 0.9611 at epoch 25.

Table 8 Summary of RNN model’s performance.

Results using AlexNet model

The AlexNet model performs much better as the epoch count is raised when used with the MNIST dataset for image classification, as shown in Table 9. This model shows a maximum accuracy of 0.9919 at the maximum epoch of 30. The accuracy at epoch 5 is 0.9897; it increases to 0.9907 for epoch 10 and 0.9922 for epoch 15; by epoch 20 the accuracy is 0.9991, and it is 0.9933 for epoch 25.

Table 9 Summary of AlexNet model’s performance.

Results using ResNet model

Table 10 presents the recall, F1-score, accuracy, and precision of the model. When a ResNet model is used with the MNIST dataset, its image classification performance improves considerably with an increase in epochs. The maximum accuracy in this research is 0.9929 at the maximum epoch of 30. The accuracy of the model at epoch 5 is 0.9909; it increases to 0.9915 for epoch 10, 0.9922 for epoch 15, 0.9922 for epoch 20, and 0.9926 for epoch 25.

Table 10 Summary of ResNet model’s performance.

Results using VGG model

Table 11 displays the accuracy, precision, recall, and F1-score. Increasing the epochs of the VGG model on the MNIST dataset improves image classification performance dramatically. The maximum epoch and accuracy in this research are 30 and 0.9915 respectively. The accuracy for the 5-epoch model is 0.9904, with accuracies of 0.9968, 0.9912, and 0.9922 reported across epochs 10 through 25.

Table 11 VGG model’s performance summary.

Results using RSNN model

The model’s accuracy, precision, recall, and F1 score are shown in Table 12. Increasing the epochs greatly improves the performance of the RSNN model on the MNIST dataset in image classification. The highest accuracy achieved by the model is 0.9719 at the maximum epoch of 30. The accuracy of the 5-epoch model is 0.9742, which increases to 0.9744 for epoch 10, 0.9749 for epoch 15, 0.9748 for epoch 20, and 0.9760 for epoch 25.

Table 12 RSNN model’s performance summary.

Results using CRNN model

Table 13 shows the performance of the CRNN model used in this study. The CRNN model performs much better as the epoch count is raised for image classification using the MNIST dataset. The maximum epoch and accuracy of this model are 30 and 0.9911 respectively. The five-epoch model’s accuracy is 0.9868, which increases to 0.9876 for epoch 10, 0.9890 for epoch 15, 0.9898 for epoch 20, and 0.9904 for epoch 25.

Table 13 CRNN model’s performance summary.

Results using CSNN model

As shown in Table 14, the proposed CSNN model performs significantly better on MNIST image classification as the number of epochs increases. The maximum accuracy of 0.9931 is reached at the maximum epoch of 30. The accuracy at epoch 5 is 0.9893; it grows to 0.9917 at epoch 10, 0.9910 at epochs 15 and 20, and 0.9931 at epoch 25.

Table 14 CSNN model’s performance summary.

Results using CNN-LSTM model

The performance of the CNN-LSTM model on the MNIST dataset also improves as the number of epochs increases. In this case, the best accuracy of 0.9922 is obtained at epoch 20. The accuracy at epoch 5 is 0.9900; it grows to 0.9919 at epoch 10, dips to 0.9886 at epoch 15, and reaches 0.9909 at epoch 25 and 0.9899 at epoch 30. The training behavior of the model is plotted in Fig. 7, while its accuracy, precision, recall, and F1-score are listed in Table 15.

Fig. 7
figure 7

CNN-LSTM model graph.

Table 15 CNN-LSTM model performance summary.
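As a point of reference for how convolutional features can feed a recurrent layer, the following is a minimal sketch of a CNN-LSTM hybrid for MNIST; the specific layer sizes, and the reshape into 13 time steps, are illustrative assumptions rather than the architecture evaluated above.

```python
# Minimal CNN-LSTM sketch for MNIST: convolutional features are
# reshaped into a sequence that the LSTM reads step by step.
# All layer sizes are assumptions.
import tensorflow as tf

cnn_lstm = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),          # -> (13, 13, 32)
    tf.keras.layers.Reshape((13, 13 * 32)),  # 13 time steps of 416 features
    tf.keras.layers.LSTM(64),                # summarize the sequence
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn_lstm.compile(optimizer="adam",
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
cnn_lstm.summary()
```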

Results using DCAE model

The classification performance of the DCAE model on the MNIST dataset likewise improves with the number of epochs, with the best accuracy of 0.9914 obtained at epoch 20. The accuracy at epoch 5 is 0.9891; for the remaining checkpoints between epochs 10 and 30, the reported accuracies are 0.9902, 0.9901, 0.9897, and 0.9899. The training behavior of the model is plotted in Fig. 8, while its accuracy, precision, recall, and F1-score are listed in Table 16.

Fig. 8
figure 8

DCAE model graph.

Table 16 DCAE model performance summary.
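The following is a hedged sketch of how a deep convolutional autoencoder can be repurposed for classification: an encoder-decoder pair is trained for reconstruction, and the trained encoder then feeds a softmax head. All layer sizes are illustrative assumptions, not the DCAE configuration evaluated above.

```python
# Hypothetical DCAE-for-classification sketch: pretrain the encoder on
# reconstruction, then reuse it as a feature extractor for digits.
import tensorflow as tf

inp = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
x = tf.keras.layers.MaxPooling2D()(x)                    # -> 14x14x16
code = tf.keras.layers.Conv2D(8, 3, activation="relu", padding="same")(x)

x = tf.keras.layers.UpSampling2D()(code)                 # back to 28x28
out = tf.keras.layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=5)  # unsupervised reconstruction

# Reuse the trained encoder for supervised digit classification.
encoder = tf.keras.Model(inp, code)
classifier = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```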

Results using proposed EGACNN model

Figure 9 shows the training accuracy and training loss of the proposed EGACNN model across successive generations of the GA. With each generation, the training accuracy increases while the training loss decreases.

Fig. 9
figure 9

Training accuracy and training loss graphs for the proposed EGACNN model.

Table 17 shows the performance of the proposed approach in terms of accuracy, precision, and related metrics. The proposed EGACNN model shows a noticeable improvement for MNIST image classification. The accuracy at the maximum epoch of 30 is 0.9901, while the highest accuracy of 0.9991 is obtained after only 5 epochs; the accuracies at epochs 10, 15, 20, and 25 are 0.9905, 0.9912, 0.9901, and 0.9894 respectively.

Table 17 EGACNN model’s performance summary.
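To make the generation-wise behavior in Fig. 9 concrete, the following is a minimal, hypothetical sketch of a GA loop over CNN hyperparameters (learning rate, batch size, and dropout). The search ranges, population size, selection scheme, and the small CNN used as the fitness model are illustrative assumptions and not the exact EGACNN configuration.

```python
# Illustrative GA hyperparameter loop: fitness is validation accuracy of
# a short CNN training run; each generation applies truncation selection,
# crossover, and mutation. All settings here are assumptions.
import random
import tensorflow as tf

(x_tr, y_tr), _ = tf.keras.datasets.mnist.load_data()
x_tr = x_tr[:10000, ..., None] / 255.0   # small subset keeps fitness cheap
y_tr = y_tr[:10000]

def random_individual():
    return {"lr": 10 ** random.uniform(-4, -2),
            "batch": random.choice([32, 64, 128]),
            "dropout": random.uniform(0.1, 0.5)}

def fitness(ind):
    """Validation accuracy after a short training run."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(ind["dropout"]),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(tf.keras.optimizers.Adam(ind["lr"]),
                  "sparse_categorical_crossentropy", ["accuracy"])
    hist = model.fit(x_tr, y_tr, epochs=1, batch_size=ind["batch"],
                     validation_split=0.2, verbose=0)
    return hist.history["val_accuracy"][-1]

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(ind, rate=0.2):
    return {k: (random_individual()[k] if random.random() < rate else v)
            for k, v in ind.items()}

population = [random_individual() for _ in range(6)]
for gen in range(3):                      # a few generations for brevity
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:3]                  # truncation selection
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(3)]
    print(f"generation {gen}: best hyperparameters {parents[0]}")
```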

Analysis

The EGACNN model surpasses individual models such as CNN, RNN, AlexNet, ResNet, and VGG. The GA's iterative and adaptive optimization of critical hyperparameters, such as learning rates, batch sizes, and regularization techniques, greatly enhances the CNN's capacity to learn and generalize from the data. Compared to alternative models that do not make use of such thorough optimization, this leads to improved accuracy and faster model convergence. While the CNN and CSNN ensemble also performs strongly, it is not as hyperparameter-refined as EGACNN. This demonstrates the advantage of merging and optimizing several algorithms: by integrating CNN's strength in feature extraction with GA's optimization techniques, EGACNN performs better on image classification tasks.
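As an illustration of the stacking idea behind such ensembles, the following sketch concatenates the class-probability outputs of two trained base networks (the hypothetical base_a and base_b) and fits a logistic-regression meta-learner on them; the choice of meta-learner and the function names are assumptions.

```python
# Hedged stacking sketch: probability outputs of two base Keras models
# (hypothetical base_a, base_b) feed a logistic-regression meta-learner.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_predictions(base_a, base_b, x):
    """Concatenate the 10-class probability vectors of both bases."""
    return np.hstack([base_a.predict(x, verbose=0),
                      base_b.predict(x, verbose=0)])

def fit_meta(base_a, base_b, x_val, y_val):
    meta = LogisticRegression(max_iter=1000)
    meta.fit(stack_predictions(base_a, base_b, x_val), y_val)
    return meta

# usage: meta = fit_meta(cnn, csnn, x_val, y_val)
#        y_hat = meta.predict(stack_predictions(cnn, csnn, x_test))
```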

Performance of all deep learning models

A performance comparison of all deep learning models with the proposed EGACNN model provides insight into the gains of the proposed approach. Table 18 lists the best accuracy of each model together with the epoch at which that accuracy is obtained. The proposed model achieves an accuracy score of 0.9991, the highest among all the employed models, indicating its potential to produce better results with fewer epochs.

Table 18 Comparative analysis of deep learning models with the proposed EGACNN.

Ablation study

For further corroboration, 5-fold and 10-fold cross-validation are performed to see how the proposed approach behaves on smaller data folds, i.e., when trained on smaller dataset samples. In addition, different numbers of epochs are used within the cross-validation runs, providing a performance analysis of the proposed model from multiple aspects.
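For illustration, the following minimal sketch implements the k-fold protocol just described with stratified folds; the small CNN in build_model() and the data subsampling are illustrative assumptions to keep the example lightweight.

```python
# Hypothetical k-fold cross-validation sketch for the ablation study.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

(x, y), _ = tf.keras.datasets.mnist.load_data()
x, y = x[:10000, ..., None] / 255.0, y[:10000]  # subset keeps the demo fast

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for k in (5, 10):                          # the two settings in the tables
    scores = []
    for tr_idx, te_idx in StratifiedKFold(n_splits=k, shuffle=True,
                                          random_state=0).split(x, y):
        model = build_model()
        model.compile("adam", "sparse_categorical_crossentropy", ["accuracy"])
        model.fit(x[tr_idx], y[tr_idx], epochs=5, batch_size=128, verbose=0)
        _, acc = model.evaluate(x[te_idx], y[te_idx], verbose=0)
        scores.append(acc)
    print(f"{k}-fold mean accuracy: {np.mean(scores):.4f}")
```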

Table 19 Performance using 5 folds.

Table 19 shows the results of 5-fold cross-validation, where the dataset is divided into five folds, each containing 20% of the original dataset. The results suggest that the model performs best in the five-fold setting. Compared to the results reported earlier, the scores for the performance metrics vary only slightly under five-fold cross-validation, whereas the differences are larger under ten-fold cross-validation, as shown in Table 20. This is because, with ten folds, the sample size used for training is smaller, which affects the performance of the model. The best accuracy, 0.9909, is obtained with 20 epochs, which is lower than the previous best accuracy score of 0.9991. Even so, the performance remains strong and generalizable, demonstrating the robustness of the proposed approach.

Table 20 Performance using 10 folds.

Performance with existing approaches

The performance of the proposed approach is compared against existing approaches for image classification. For a fair comparison, we selected studies that used the MNIST dataset in their experiments; the comparison is provided in Table 21. Several recent state-of-the-art models are considered. For example,55 utilized an AlexNet model and achieved an accuracy score of 0.923. Similarly, the study56 used an AlexNet model on the MNIST dataset and reported an accuracy score of 0.9874. The authors of57 designed a custom CNN model and obtained a 0.98 accuracy score. The performance analysis indicates that the proposed ensemble model shows the best results among existing approaches, with an accuracy score of 0.9991.

Table 21 Comparative analysis of the proposed EGACNN model with existing models.

Future work

In this research, the ensemble of GA and CNN, EGACNN, has been proposed to tackle the problem of hyperparameter tuning in image classification. Through the integration of CNN and GA-based optimization, the EGACNN model achieves impressive accuracy, as demonstrated by its maximum reported score of 99.91% on the MNIST dataset. This ensemble method outperforms individual models such as CNN, RNN, AlexNet, ResNet, and VGG, showing that combining several approaches can improve image classification performance. The findings point to considerable promise for using ensemble methods to increase the effectiveness and prediction accuracy of image classification tasks, with important ramifications for future studies and applications in a variety of domains such as autonomous cars, agriculture, security, and surveillance. Currently, the ensemble model has a higher training time, which makes it unsuitable for applications requiring real-time responses; in the future, we intend to work on reducing its computational complexity.

Conclusion

Image classification is used in many fields and sectors, including autonomous vehicles, agriculture, security and surveillance, and medical imaging, and it is finding ever-wider applications as new use cases and technological advancements emerge. Accurate image classification is essential for a variety of tasks in each of these fields, from threat identification in security and surveillance to disease diagnosis in medical imaging. Hyperparameter tuning is essential to improving the performance of image classification models. The accuracy, convergence speed, and generalization ability of a model can all be greatly enhanced by carefully adjusting hyperparameters such as learning rates, batch sizes, and regularization strategies. This optimization process ensures that the model is precisely adjusted to the unique features and intricacies of the data it is meant to categorize.

This study proposes an ensemble model, EGACNN, that leverages the genetic algorithm and CNN to tackle the problem of hyperparameter tuning in image classification. The methodology uses stacking techniques on the MNIST dataset to combine the representational power of CNN with the optimization capabilities of a genetic algorithm, with the objective of improving the efficiency and prediction rates of image classification tasks. The EGACNN model yielded remarkably accurate results, with 99.91% accuracy. The results indicate that the ensemble method EGACNN is more effective than individual models such as CNN, RNN, AlexNet, ResNet, and VGG, suggesting that integrating multiple techniques can lead to enhanced image classification performance. Using the genetic algorithm for hyperparameter optimization reduces the effort of model optimization and produces better results with less training.