Introduction

Osteoarthritis (OA) is one of the most common and debilitating chronic illnesses and the fourth leading cause of disability worldwide1, with the knee being the most frequently affected joint. Pain is the defining symptom of knee OA, driving patients to seek medical care and reducing their quality of life2. Knee osteoarthritis (KOA) is a prevalent chronic condition, recognized as degenerative arthritis of the knee joint, that results from 'wear and tear' of the ligaments connecting the femur and tibia3,4.

The disease is frequently associated with gradual structural degradation of articular cartilage, leaving patients with permanent physical impairment. According to a recent literature review on the epidemiology of OA, knee OA has a high global prevalence5.

Older age, obesity6, and prior knee injury7 are established risk factors for OA, which causes pain that impairs function and lowers quality of life. Total knee replacement (TKR), the definitive treatment for OA, is costly, and the implant has a limited lifespan, particularly in obese patients8. Consequently, early recognition of knee OA is essential for starting therapies such as weight loss and exercise, which can effectively slow the progression of knee OA and delay TKR6,9. Furthermore, several studies have emphasized the negative economic impact of knee osteoarthritis in terms of GDP loss10, direct healthcare costs11, and annual productivity losses from missed employment12,13.

KOA affects approximately one in every three individuals14,15, and more than half of people aged 65 and older show evidence of osteoarthritis in at least one joint. According to the World Health Organization's (WHO) 2016 osteoarthritis report, 9.6% of men and 18.0% of women over the age of sixty have symptomatic osteoarthritis; among them, 80% have mobility limitations and 25% find it difficult to carry out their everyday activities16. According to the United Nations, 130 million people will suffer from KOA by 2050, with 40 million severely disabled by the condition. KOA is one of the top five causes of disability and poses a growing financial burden on society, mainly because of missed work hours and healthcare costs17. Figure 1 depicts a healthy knee joint and a knee joint with osteoarthritis. Clinically, it is critical to diagnose the affected joint and localize the damaged regions accurately. X-ray, MRI, and CT modalities are used to scan these regions to detect wear and tear and to plan interventions such as implants and total knee replacement.

Fig. 1

The normal knee joint and knee joint with osteoarthritis27.

Radiography (X-ray) imaging is preferred for assessing OA18 because of its accessibility, cost-effectiveness, and superior spatial resolution and contrast for bone and soft tissue. Several OA-related segmentation and classification techniques exist for evaluating the knee, broadly classified into classical approaches and deep learning (DL) approaches19,20,21. In current clinical practice, OA severity is typically assessed visually from radiographs, which is prone to inter-rater variability and time-consuming for large datasets22.

Deep learning (DL), a sophisticated form of artificial intelligence, has been successfully applied to various medical imaging tasks23. DL can potentially provide a new way to design OA risk-estimation algorithms that predict pain progression by extracting meaningful prognostic information from imaging scans in a timely and automated manner. Convolutional neural networks (CNNs) and other deep learning approaches automatically extract visual features through a sequence of transformations within the model architecture, enabling the learning of complex representations24,25. The CNN is a deep learning technique within the machine learning field of artificial intelligence (AI). CNNs are flexible, relatively simple, and efficient to train, since the network learns during the tuning procedure with fewer parameters26. A CNN's overall design consists of an input layer, hidden layers built from a sequence of image filters, feed-forward layers that apply those filters to the input image, and an output layer where the features are retrieved20,25. Integrating CNNs with transfer learning frameworks significantly improves image recognition for knee osteoarthritis.
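As a concrete illustration of the layered design described above (an input layer, convolutional filter layers, feed-forward layers, and an output layer), the following is a minimal sketch in Python using tensorflow.keras; the number of layers, filter counts, and hyper-parameters are illustrative assumptions, not the architecture used in this study.

```python
# Minimal CNN sketch mirroring the layered design described above.
# Layer sizes and hyper-parameters are illustrative assumptions only.
from tensorflow.keras import layers, models

def build_simple_cnn(input_shape=(224, 224, 1), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),                          # input layer
        layers.Conv2D(16, 3, padding="same", activation="relu"),  # learned image filters
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),                      # feed-forward layer
        layers.Dense(num_classes, activation="softmax"),          # output layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```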

This research aimed to create and test DL risk-evaluation algorithms for forecasting the development of pain in individuals who have, or are susceptible to, knee osteoarthritis. DL approaches outperform conventional approaches based on clinical, demographic, and radiographic risk factors in predicting pain progression. In this paper, the images were processed, enhanced, and normalized. The suggested CNN and additional pre-trained models were used for feature extraction, and a metaheuristic optimizer was employed to choose the best features among them. Lastly, the proposed deep neural network (DNN) architecture was applied to categorize these features.

The rest of the article is structured as follows: Section "Related works" reviews recent research efforts in KOA diagnosis; Section "Material and methods" describes the methodology of the suggested procedure and the feature selection models; Section "Evaluation criteria" presents the evaluation metrics and the feature extraction and selection results; Section "Classification results and discussion" presents the classification results and discussion; and Section "Conclusion" presents the study's conclusions and suggestions.

Research contribution

This work addresses the challenge of automatically classifying osteoarthritis in the knee using X-rays. This study presents the following key contributions:

  1. A novel system is proposed to assist medical specialists in diagnosing KOA and classifying its severity as needed.

  2. The classification models' accuracy is boosted by a pre-processing step that filters the images with a high-pass filter in the frequency domain, highlighting the texture of trabecular bone.

  3. The impact of the dataset's imbalanced distribution is minimized, and a rebalancing process is presented that dramatically increases classification accuracy.

  4. A DL model with the fewest misclassifications is proposed.

  5. The proposed CNN model is applied to extract features from the images in the dataset.

  6. The significant features are selected by the bGGO optimizer.

  7. The selected features are classified by K-nearest neighbor (K-NN), decision tree (DT), Multi-layer Perceptron (MLP), and convolutional neural network (CNN) classifiers.

  8. The CNN hyper-parameters are tuned with GGO.

  9. A deep neural network (DNN) model is proposed for identifying the KOA features accurately.

  10. The KOA recognition performance measures are evaluated against contemporary studies and pre-trained algorithms.

Related works

Several studies have presented methods for classifying knee osteoarthritis using various techniques, although the results are far from optimal. New OA classification algorithms continue to emerge as deep neural network architectures evolve.

In 2016, Antony et al.28 suggested an innovative technique that uses a deep convolutional neural network (DCNN) to categorize the severity of knee OA from radiographs. The results on X-ray images with KL-grade labels demonstrate a notable advancement over the state of the art. In place of template matching, they proposed using horizontal image gradients to train a linear SVM, which is quicker and more precise than template matching. The resulting classification accuracy was 59.6%.

In 2017, Antony et al.29 presented a technique that automatically localizes knee joints using a fully convolutional network (FCN). By optimizing a weighted ratio of two loss functions, namely categorical cross-entropy and mean-squared loss, they trained convolutional neural networks (CNNs) to evaluate the severity of knee osteoarthritis. They achieved a mean squared error of 0.898 and a multi-class classification accuracy of 60.3%.

In 2018, Tiulpin et al.30 suggested an innovative approach to identifying and classifying knee OA using standard radiographs. They used a deep Siamese network structure to classify OA; this architecture was originally designed to learn a similarity measure between image pairs. The entire network comprises two branches, one for each input image. A probability distribution of grades across images was used to assess the graded CAD system. They also tested a fine-tuned ResNet-34 network. The average multiclass accuracy was 66.71%.

In 2018, Suresha et al.31 fine-tuned a network pre-trained on ImageNet with a training approach alternating between object-classification and region-proposal network fine-tuning, as the features shared by both tasks were expected to increase prediction reliability. Manually labeled knee regions served as ground truth for training the region-proposal network. Their multiclass classification accuracy was 88.2%.

In 2019, Abedin et al.32 employed Elastic Net (EN) and Random Forests (RF) to develop predictive models using patient assessment data, together with a CNN trained only on an X-ray dataset. The within-subject association between the two knees was modeled using linear mixed-effects models (LMMs). The CNN, EN, and RF algorithms had root mean squared errors of 0.77, 0.97, and 0.94, respectively.

In 2019, Tiulpin et al.33 introduced an approach based on multimodal machine learning to forecast osteoarthritis progression that uses clinical examination findings, raw radiography data, and the patient's previous health information. This approach was validated using an independent test set of 3,918 knee images from 2,129 participants. It produced an average precision (AP) of 0.68 (0.66–0.70) and an area under the ROC curve (AUC) of 0.79 (0.78–0.81).

In 2019, Chen et al.34 deployed two deep convolutional neural networks for automated prediction of KOA and its severity. The underlying X-ray scans for this approach were obtained from the OAI. The suggested method begins by detecting the knee joints in the images using a customized YOLOv2 network. After fine-tuning DenseNet, VGG, ResNet, and InceptionV3, they categorized knee X-ray images into severity classes using the KL grading system. Their knee joint detection approach had a recall of 92.2% and a mean Jaccard index of 0.858, while their fine-tuned VGG-19 model detected knee osteoarthritis severity with 69.7% accuracy.

In 2019, PU Patravali et al.35 developed an approach to calculate cartilage area/thickness utilizing several form descriptors. The generated descriptors achieved an accuracy of 99.81% for the KNN classifier and 95.09% for the DT classifier.

In 2019, PU Patravali et al.36 introduced an innovative method to investigate several segmentation strategies for the early identification of OA. The experiment employed various segmentation techniques, such as Sobel and Prewitt edge segmentation, Otsu’s method of segmentation, and texture-based segmentation. The various statistical features were calculated, analyzed, and categorized. The achieved accuracies were 91.16% for the Sobel approach, 96.80% for Otsu’s approach, 94.92% for the texture approach, and 97.55% for the Prewitt approach.

In 2020, Thomas et al.37 sought to develop an automated system for diagnosing the severity of KOA from radiographs. Using a large dataset, the approach's effectiveness was assessed by comparing its results with the assessments of radiologists specializing in musculoskeletal disorders. The radiograph images were enhanced automatically and then fed into a CNN model. They achieved an F1 score of 70% and overall accuracy of 71% over the whole test dataset.

In 2020, Leung et al.38 introduced a deep-learning KOA classification algorithm trained on knee images of patients who underwent total knee replacement surgery, contrasted with individuals who did not have KOA. To discriminate between KL-based grade classes, a ResNet-34 model with cross-validation was employed. The study used a dataset of 4796 images obtained from the OAI. The suggested model had an accuracy of 72.7%. The restricted dataset size and the reliance on transfer learning hampered the system's ability to achieve higher accuracy.

In 2021, Javed et al.39 developed ResNet-14, a pre-trained residual network, to predict KL grades from radiographic data. A multicenter dataset was employed to validate the network's performance. The network obtained 98% accuracy and 98% AUC.

In 2021, Shivanand S. Gornale et al.40 proposed a novel method for detecting osteoarthritis by identifying the region of interest. A database of 1,173 knee X-rays was collected and manually graded by two independent medical specialists using the Kellgren and Lawrence grading system. Features were computed using the histogram of oriented gradients (HOG) method and the local binary pattern (LBP). The calculated features were categorized with a decision tree classifier. The proposed approach reached accuracies of 97.86% and 97.61%.

In 2022, Ribas et al.41 suggested an innovative technique for detecting early knee OA based on complex network modeling and statistical data. The proposed network technique allowed the primary properties of the X-ray images to be modeled while also increasing the separation between the control and OA groups. The suggested technique's accuracy was 81.69%.

In 2022, Teo et al.42 used pre-trained InceptionV3 and DenseNet201 networks to extract features from the OAI dataset, which is divided into five categories based on osteoarthritis severity. An SVM classifier was employed to classify the deep-learning features. The accuracy rate for DenseNet201-SVM was 71.33%.

In 2023, C. Guida et al.53 suggested a fusion approach that blends three distinct modalities, MRI, X-ray, and the patient's clinical data, into a single structure, increasing accuracy over the methods used independently. The fusion architecture was constructed from two systems from previous studies trained on a limited dataset, blending a conventional CNN for X-rays and a unique 3D MRI model. The study's conclusions indicated that the approach achieved a performance accuracy of 76%, which was inadequate and had to be improved.

In 2024, Anandh Sam Chandra Bose et al.54 utilized a CNN approach to extract features from clinical imaging data. They used sophisticated approaches such as PSO and the Genetic Bee Colony (GBC) to uncover significant features for improving ML models. Comparing models trained with optimized features to those trained with direct CNN features reveals significant improvements in accuracy, sensitivity, specificity, PPV, and NPV across various ML techniques, such as SVM, KNN, RF, and Linear Discriminant Analysis (LDA). Features chosen by GBC achieved 99.15% accuracy in binary classification tasks. In multiclass classification, GBC features paired with RF achieved an accuracy of 98.91%.

In 2024, Muhammed Yildirim and Hursit Mutlu43 created a hybrid model by extracting features using Darknet53, Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Neighborhood Component Analysis (NCA). The dataset included 1650 knee images divided into five categories: normal, doubtful, mild, moderate, and severe. The experimental investigations compared the suggested method's performance to eight distinct CNN models. The developed model had an accuracy of 83.6%.

Lately, deep learning algorithms have been used in medical imaging to increase the precision of disease diagnosis. CNNs have been used in several studies to reliably classify knee radiographs as normal or osteoarthritic.

The researchers achieved satisfactory outcomes with a variety of approaches and datasets. Every researcher aims to achieve the promised precision of X-ray image analysis for earlier KOA detection. Another consideration is that most existing studies were conducted on the Osteoarthritis Initiative (OAI) or MOST datasets, which have an imbalanced data distribution. This study differs from earlier work in that it combines a variety of approaches in a hybrid pipeline to achieve high accuracy and applies a data-balancing strategy. Because it is challenging to categorize KOA images correctly, this obstacle was overcome by extracting features with several deep neural models, selecting the best of them, and then classifying the selected features. Table 1 summarizes relevant studies concerning the diagnosis of knee osteoarthritis.

Table 1 An overview of relevant literature.

Material and methods

The proposed classification approach for knee OA diagnosis in this study consists of data gathering and preparation, feature extraction and selection, and recognition of image labels, as illustrated in Fig. 2. A dataset of knee X-ray images was downloaded and then subjected to the preprocessing procedures. Image enhancement techniques include frequency-domain filtering, histogram equalization, and sharpening. After the dataset was collected and preprocessed, four common deep-learning models, AlexNet44,45, VGG1946, ResNet-5047,48, and GoogleNet49,50, were trained, evaluated, and compared to choose the most effective one for detecting KOA cases. The chosen model received the processed images, and during training its parameters were adjusted to improve accuracy. Features are then extracted from the input images by the best-performing model. The optimal feature set is then found by processing the extracted features with the suggested feature selection procedure. Finally, an optimized CNN classifier is trained on the optimal feature set to determine the class of the input image. The following subsections explain the suggested framework's methodology on the KOA dataset in detail.

Fig. 2

The general scheme for the suggested framework.

Dataset description

The knee osteoarthritis graded dataset provided the knee X-ray images used in this study to train the proposed framework. The images are available on Kaggle27 and were collected through the Osteoarthritis Initiative (OAI). There are a total of 3835 knee images, separated into two grades. All images in the dataset were carefully assessed by competent clinicians as normal or osteoarthritic, with the distribution of each grade displayed in Table 2. The dataset's images were scaled down to a uniform size of 224 × 224 pixels for easier processing by the model.

Table 2 Distribution of knee osteoarthritis dataset.

Data preparation

The data preparation step is essential in image analysis owing to the increasing demand for high-quality, consistent data.

i. Data Augmentation and Balancing.

Figure 3 depicts the data augmentation methods applied to the images to increase the dataset's size while preventing overfitting. The expanded dataset enhanced the model's reliability and accuracy. A flipping approach was applied to the dataset. Data augmentation makes it possible to build a significantly more extensive and diverse dataset for training the deep learning algorithm, creating additional images with minimal changes to the originals. When these methods are applied, the model can learn the core characteristics of the images, since it is exposed to a broader range of variations. After data augmentation, the dataset contains 5132 images. Table 3 shows the final dataset used to train the network in this investigation, and Table 4 summarizes the distribution of all datasets. The knee joint images are divided into training, validation, and test sets.

Fig. 3

The implemented operation of data augmentation technique.

Table 3 Distribution of the Rebalanced Dataset.
Table 4 Distribution of all dataset.

The dataset has a highly uneven distribution: the normal-class data is significantly smaller than the osteoarthritis class in the training and validation sets, whereas the osteoarthritis-class data is much smaller than the normal class in the test set. To avoid biasing the training results, the suggested framework ensures data balance by randomly choosing an equal number of images for each category. This process is known as "data balancing": ideally, each class should have nearly the same detection rate. With a balanced dataset, a model can achieve higher detection rates, accuracy, and precision. To reduce the negative impact of imbalance on the results, the flipping technique was used to artificially rebalance the dataset.
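As a rough illustration of this flip-based rebalancing, the sketch below oversamples under-represented classes with horizontal flips until all classes reach the same count; the function name and the equal-count target are assumptions, not the authors' exact implementation.

```python
# Sketch of flip-based class rebalancing (illustrative, not the paper's exact code).
import numpy as np

def rebalance_with_flips(images, labels, seed=0):
    """Horizontally flip randomly chosen images of under-represented
    classes until every class reaches the size of the largest class."""
    rng = np.random.default_rng(seed)
    images, labels = list(images), list(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    for cls, count in zip(classes, counts):
        idx = [i for i, y in enumerate(labels) if y == cls]
        while count < target:
            i = rng.choice(idx)
            images.append(np.fliplr(images[i]))  # horizontal flip of an existing image
            labels.append(cls)
            count += 1
    return np.stack(images), np.array(labels)
```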

ii. Data Pre‐processing.

First, a frequency-domain filter is applied to the images and the histogram is normalized to enhance the trabecular bone texture and improve recognition accuracy. Second, image sharpening is applied in a customizable function to reduce noise and equalize histograms. Figure 4 depicts the workflow for the three primary processes: histogram normalization, frequency-domain filtering, and image sharpening.

Fig. 4

Image pre‐processing process. (a) The input images. (b) The pre-processed images by histogram equalization. (c) The pre-processed images by sharpening filter. (d) The pre-processed images by a frequency domain high‐pass filter.

The non-linear histogram normalization technique improves the filtered image's contrast, since most X-ray images in the OAI dataset have poor contrast. The image's intensity range is then restored52. Figure 5 depicts the pre-processing results for the images. Equation (1) gives the formula for performing histogram equalization, where r represents the input pixel value, s represents the output pixel value, and L denotes the number of intensity levels (so L − 1 is the maximum pixel value). Equation (2) expresses the probability of occurrence of intensity level rj, where nj is the number of pixels with intensity rj and MN is the total number of pixels in the image.

$$s_{k} = T(r_{k}) = (L - 1)\sum_{j=0}^{k} p_{r}(r_{j})$$
(1)
Fig. 5

Image preprocessing steps.

$$p_{r}(r_{j}) = \frac{n_{j}}{MN}$$
(2)
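The sketch below shows one way to chain the three pre-processing steps in Python with OpenCV, following the panel order of Fig. 4 (histogram equalization, sharpening, then a frequency-domain high-pass filter); the sharpening kernel and the cutoff radius are assumed values, not the exact settings used in this study.

```python
# Sketch of the pre-processing chain; kernel and cutoff radius are assumptions.
import cv2
import numpy as np

def preprocess(gray_uint8):
    # 1) Histogram equalization (Eqs. 1-2) to improve the poor X-ray contrast.
    eq = cv2.equalizeHist(gray_uint8)
    # 2) Sharpening with a simple Laplacian-style kernel.
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    sharp = cv2.filter2D(eq, -1, kernel)
    # 3) Frequency-domain high-pass filter to emphasize trabecular bone texture.
    f = np.fft.fftshift(np.fft.fft2(sharp.astype(np.float32)))
    rows, cols = sharp.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    mask = (y - cy) ** 2 + (x - cx) ** 2 > 30 ** 2      # assumed cutoff radius of 30 px
    high = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    return cv2.normalize(high, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```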

Feature extraction

The main pre-trained models used for feature extraction were AlexNet, GoogleNet, VGGNet, and ResNet-50. These models contain layers that combine linear and nonlinear operations whose parameters are learned jointly. Extracting features from a deep learning framework such as ResNet-50 can be divided into multiple steps. The data was fed into the ResNet-50 model, and backpropagation was used to train the network, adjusting the neurons' weights and biases to reduce the loss function. Features were then extracted from the images with high accuracy thanks to the strength and efficacy of deep learning models such as ResNet-50.
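The following sketch shows a typical way to pull deep features from a pre-trained ResNet-50 in Python (tensorflow.keras with ImageNet weights); the fine-tuning stage mentioned above is omitted for brevity, so this illustrates the extraction step only.

```python
# Sketch of ResNet-50 deep-feature extraction (fine-tuning omitted).
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(batch_rgb_224):
    """batch_rgb_224: float array of shape (N, 224, 224, 3)."""
    return extractor.predict(preprocess_input(np.copy(batch_rgb_224)))  # -> (N, 2048)
```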

Feature selection

The X-ray image features are reduced by applying a feature selection technique; removing irrelevant and redundant features improves classification accuracy. This study applies the Greylag Goose Optimization (GGO) algorithm to perform the feature selection task.

GGO algorithm

The Greylag Goose Optimization (GGO) algorithm is used for optimization in the present study. The GGO optimizer has several advantages: the colony functions independently of any higher authority (modularity), the task is generally completed effectively even if multiple agents fail (robustness), and network adjustments spread quickly (speed). However, it is difficult to predict collective behavior from the rules alone (behavior), it is impossible to understand how the colony functions without knowing how an agent functions (knowledge), and any departure from these fundamental rules changes the collective behavior (sensitivity). The GGO algorithm starts by creating a random population of individuals, each representing a potential solution to the problem. This population, called a gaggle, has size n and is represented by Xi (i = 1, 2, …, n). Each individual is assessed using a chosen objective function, Fn. The best solution, or leader, is found by computing the objective function of every individual (agent). The population is then dynamically divided into two categories by the GGO algorithm: an exploration group (n1) and an exploitation group (n2). Considering the best solution obtained, each iteration assigns different solutions to each group. Initially, the GGO algorithm splits the population equally, with 50% in the exploration group and 50% in the exploitation group. The number of agents in the exploitation group (n2) rises while the number of agents in the exploration group (n1) falls as the iterations continue. If the objective value of the best solution remains constant for three successive iterations, the algorithm raises the number of agents in the exploration group (n1) to find a better solution and escape local optima.

Exploration operation

Exploration is responsible for finding promising regions of the search space and avoiding stagnation in local optima by moving toward the best solution. Moving towards the best solution: with this strategy, the explorer geese look for promising new places near their present position. Exploration is performed by continually evaluating several neighboring candidates to determine the one with the best fitness. The vectors A and C are computed as A = 2a·r1 − a and C = 2·r2 throughout the iterations, with the parameter a decreasing linearly from 2 to 0. The GGO algorithm employs the following formula:

$$\mathbf{X}(t + 1) = \mathbf{X}^{*}(t) - \mathbf{A} \cdot \left| \mathbf{C} \cdot \mathbf{X}^{*}(t) - \mathbf{X}(t) \right|$$
(3)

where X(t) is an agent at iteration t, X*(t) is the position of the optimal solution (leader), and X(t + 1) is the updated position of the agent. The values r1 and r2 vary randomly within the range [0, 1]. The following formula is used to select three random search agents (paddlings), termed XPaddle1, XPaddle2, and XPaddle3, so that agents are not influenced by a single leader position and achieve greater exploration. The current search agent's location is adjusted in this way when |A| ≥ 1.

$$\mathbf{X}(t + 1) = w_{1}\,\mathbf{X}_{Paddle1} + z\, w_{2}\,(\mathbf{X}_{Paddle2} - \mathbf{X}_{Paddle3}) + (1 - z)\, w_{3}\,(\mathbf{X} - \mathbf{X}_{Paddle1})$$
(4)

where the values of w1, w2, and w3 are updated within [0, 2]. The following formula is used to calculate the parameter z, which decreases exponentially.

$$z = 1 - \left(\frac{t}{t_{\max}}\right)^{2}$$
(5)

where t is the iteration number and tmax is the maximum number of iterations. For r3 ≥ 0.5, the second updating procedure, in which the values of the a and A vectors are reduced, is as follows.

$$\mathbf{X}(t + 1) = w_{4}\,\left| \mathbf{X}^{*}(t) - \mathbf{X}(t) \right| \cdot e^{bl} \cdot \cos(2\pi l) + \left[ 2 w_{1}(r_{4} + r_{5}) \right]\mathbf{X}^{*}(t)$$
(6)

where l is a random value in [−1, 1] and b is a constant, while r4 and r5 are updated in [0, 1] and the parameter w4 is updated in [0, 2].

Exploitation operation

The task of enhancing the current solutions falls to the exploitation group. At the end of each cycle, the GGO determines which individual is the fittest and rewards it accordingly. The GGO uses two distinct tactics to accomplish its exploitation goal, explained below. Moving in the direction of the best solution: the optimal solution is approached using the following formulas. Three solutions (sentries), XSentry1, XSentry2, and XSentry3, direct the other individuals (XNonSentry) to adjust their positions in anticipation of the predicted position of the prey. The following formulas illustrate the position-update procedure.

$$\begin{gathered} \mathbf{X}_{1} = \mathbf{X}_{Sentry1} - \mathbf{A}_{1} \cdot \left| \mathbf{C}_{1} \cdot \mathbf{X}_{Sentry1} - \mathbf{X} \right| \hfill \\ \mathbf{X}_{2} = \mathbf{X}_{Sentry2} - \mathbf{A}_{2} \cdot \left| \mathbf{C}_{2} \cdot \mathbf{X}_{Sentry2} - \mathbf{X} \right| \hfill \\ \mathbf{X}_{3} = \mathbf{X}_{Sentry3} - \mathbf{A}_{3} \cdot \left| \mathbf{C}_{3} \cdot \mathbf{X}_{Sentry3} - \mathbf{X} \right| \hfill \\ \end{gathered}$$
(7)

where A1, A2, and A3 are derived from A = 2a·r1 − a, and C1, C2, and C3 are derived from C = 2·r2.

Searching the area around the optimal solution

While flying, the most promising region is located near the best solution (leader). This leads certain individuals to look for improvements by exploring areas near the best solution, denoted XFlock1. The GGO uses the following equation to carry out this procedure.

$$\mathbf{X}(t + 1) = \mathbf{X}(t) + \mathbf{D}\,(1 + z)\, w\,(\mathbf{X} - \mathbf{X}_{Flock1})$$
(8)

Selection of the best solution

The GGO has outstanding exploration capabilities since it uses a mutation operator and scans the members of the exploration group. The GGO's powerful exploration capability allows it to defer convergence. The GGO pseudo-code is given in Algorithm 1. We first supply the population size, mutation rate, and number of iterations to GGO. The GGO then divides the individuals into two groups: those that perform exploitation and those that perform exploration. Throughout the iterative process of identifying the optimal solution, the GGO approach adjusts each group's size dynamically. Each group uses two methods to complete its duties. The GGO randomly rearranges the solutions between iterations to offer diversity and in-depth search; a solution may move from the exploration group to the exploitation group within a single iteration, as shown below. The GGO's elitism method ensures that the leader is preserved throughout the operation. Figure 6 depicts each stage of the GGO algorithm used to update the positions of the exploration group (n1) and the exploitation group (n2). The parameter r1 is adjusted throughout the iterations, as expressed in Eq. (9).

$$r_{1} = c\left(1 - \frac{t}{t_{\max}}\right)$$
(9)

where c represents a constant, t denotes the current iteration, and tmax represents the maximum number of iterations. At the end of each iteration, GGO updates the agents in the search space, and their assignments to the exploration and exploitation groups are switched at random. In the last stage, GGO returns the optimal solution.
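To make the flow of the algorithm concrete, the sketch below implements a simplified GGO loop in Python/NumPy: a random gaggle, a leader, and a dynamic split between exploration and exploitation agents, using only Eqs. (3) and (5). The paddle and sentry update variants and the mutation step are omitted, so this is an illustration of the structure rather than the full algorithm.

```python
# Simplified GGO loop (Eqs. 3 and 5 only); not the full algorithm.
import numpy as np

def ggo_minimize(objective, dim, n_agents=20, t_max=100, lb=-1.0, ub=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_agents, dim))            # the gaggle
    leader = X[np.apply_along_axis(objective, 1, X).argmin()].copy()
    n1, stall = n_agents // 2, 0                        # exploration-group size, stall counter
    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max)                     # decreases linearly from 2 to 0
        z = 1.0 - (t / t_max) ** 2                      # Eq. (5)
        for i in range(n_agents):
            r1, r2 = rng.random(2)
            A, C = 2 * a * r1 - a, 2 * r2
            if i < n1:                                  # exploration group: Eq. (3)
                X[i] = leader - A * np.abs(C * leader - X[i])
            else:                                       # exploitation group: refine near leader
                X[i] = leader + z * rng.uniform(-1, 1, dim) * np.abs(leader - X[i])
            X[i] = np.clip(X[i], lb, ub)
        fitness = np.apply_along_axis(objective, 1, X)
        best = fitness.argmin()
        if fitness[best] < objective(leader):           # elitism: keep the best leader found
            leader, stall = X[best].copy(), 0
        else:
            stall += 1
        # shrink the exploration group over time, but re-grow it after 3 stalled iterations
        n1 = min(n_agents - 2, n1 + 1) if stall >= 3 else max(2, n1 - 1)
    return leader

# Example usage: best = ggo_minimize(lambda x: np.sum(x ** 2), dim=10)
```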

Fig. 6

Algorithm’s steps: exploration, exploitation, and dynamic groups.

Binary GGO algorithm

Feature selection is one of the most important steps in data analysis, as it aims to reduce the data's high dimensionality by removing irrelevant or redundant information. It has therefore been applied in a wide range of fields, since the fundamental goal of feature selection optimization is to identify the important features that minimize classification errors. Feature selection can be described mathematically as a minimization problem. For feature selection problems, the GGO algorithm's outputs must be binary, with values of 0 or 1. To facilitate the process of selecting features from the dataset, the continuous values of the suggested GGO method are transformed into binary values {0, 1}, as shown in the phases of Algorithm 2.


Algorithm 1: GGO Algorithm


Algorithm 2: bGGO Algorithm

Equation (10), used in this study, is based on the sigmoid function and is represented as follows:

$$x_{d}^{t+1} = \begin{cases} 1 & \text{if } Sigmoid(m) \ge 0.5 \\ 0 & \text{otherwise} \end{cases}, \qquad Sigmoid(m) = \frac{1}{1 + e^{-10(m - 0.5)}}$$
(10)

where \({\text{x}}_{\text{ d }}^{\text{t}+1}\) denotes the binary solution at iteration t and dimension d. The sigmoid function scales the resulting solutions to binary ones: the value becomes 1 if Sigmoid(m) is at least 0.5; otherwise, it stays 0. The parameter m represents the continuous value produced by the algorithm for the corresponding feature.
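A minimal sketch of this sigmoid-based binarization step in Python/NumPy, directly following Eq. (10):

```python
# Sigmoid transfer of Eq. (10): continuous positions -> binary feature mask.
import numpy as np

def to_binary(x):
    sigmoid = 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))
    return (sigmoid >= 0.5).astype(int)   # 1 = feature kept, 0 = feature dropped
```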

Algorithm 2 provides a full explanation of the binary GGO method. The GGO algorithm has a computational complexity of O(tmax × n), which becomes O(tmax × n × d) for d dimensions. The binary GGO algorithm uses the objective function Fn to evaluate the quality of a solution. Equation (11) expresses Fn in terms of the classifier's error rate, Err.

$$F_{n} = \alpha Err + \beta \frac{\left| s \right| }{{\left| S \right|}}$$
(11)

where s denotes the set of selected features, S represents the complete feature set, β = 1 − α, and α ∈ [0, 1] controls the relative importance of the classification error. The strategy is successful if it can offer a subset of features with a minimal classification error rate. The classifier used in this evaluation relies only on the distance between the training and query instances.
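A hedged sketch of the fitness evaluation of Eq. (11) with a K-NN wrapper in scikit-learn follows; the weight α = 0.99 and the 5-fold cross-validation are common choices in feature-selection studies and are assumptions here, not values reported in this paper.

```python
# Fitness of Eq. (11): weighted classifier error plus selected-feature ratio.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    if mask.sum() == 0:
        return 1.0                                     # penalize empty feature subsets
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, X[:, mask.astype(bool)], y, cv=5).mean()
    err = 1.0 - acc                                    # Err in Eq. (11)
    return alpha * err + (1.0 - alpha) * mask.sum() / mask.size
```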

Image classification

Finally, a classification approach is applied to the extracted and refined collection of features. Machine learning and deep learning classifiers are used for sorting KOA images, specifically convolutional neural network (CNN), decision tree (DT), K-nearest neighbor (K-NN), and multi-layer perceptron (MLP) classifiers.
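The sketch below shows how the selected features could be passed to the classical classifiers with scikit-learn (the optimized CNN branch is not shown); the split ratio and hyper-parameters are illustrative assumptions.

```python
# Comparing K-NN, DT, and MLP on the selected features (illustrative settings).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def compare_classifiers(features, labels, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    models = {
        "K-NN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(random_state=seed),
        "MLP": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=seed),
    }
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```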

Evaluation criteria

Performance metrics to the pre-trained model and classifier

Using the confusion matrix, measurements such as accuracy, precision, recall, and F1-score can be calculated by comparing the predicted labels against the true ones. The confusion matrix consists of four categories: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A TP is a case correctly predicted as knee OA, where both the actual and predicted classes are knee OA. A TN is a case where neither the actual nor the predicted class is knee OA. An FP occurs when the predicted class is knee OA but the actual class is not. An FN is a case where knee OA is the actual class but the predicted class differs. The best-performing model was determined to be the most reliable for detecting and classifying cases of knee osteoarthritis. Table 5 describes the evaluation metrics.
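For reference, the sketch below shows how the reported metrics follow from the four confusion-matrix counts:

```python
# Metrics derived from the confusion-matrix counts TP, TN, FP, FN.
def metrics_from_counts(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```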

Table 5 The description of the utilized evaluation metrics.

Performance metrics to the optimizers

The following metrics are used in the experiments to assess how well the suggested algorithm selects features (see Table 6). M denotes the number of runs, g* denotes the best solution, and N indicates the total number of points. L represents a point's class label, C represents the classifier's output, and Match indicates the degree of agreement between the two. \({g}_{j}^{*}\) denotes the best solution vector of run j (whose size is used in the selection-size metric), while D represents the dataset size.

Table 6 The evaluation metrics used in experiments to evaluate how well the suggested optimizers select features for assessment.

Feature extraction results

Metrics including F1-score, N-value, P-value, sensitivity, and accuracy are employed to assess the extracted features' efficacy. If the extracted features exhibit superior precision, sensitivity, specificity, and F1-score and a low P-value, then the extraction method has succeeded in identifying the most significant features for the classification task (see Table 7). The study's feature extraction used the ResNet-50 deep learning model, which produced an accuracy of 88.60%. The findings in the table show that the features extracted with ResNet-50 outperform those of the other deep neural networks. As a result, the subsequent phases of the suggested methodology use this network. This level of performance indicates that ResNet-50 can select and encode the most valuable features from the provided dataset, an essential capacity for addressing the image classification problem.

Table 7 Assessing the characteristics that were derived with CNN deep neural networks.

This method’s feature extraction of ResNet-50 suggests that further optimization and extension into other domains could yield substantially greater success in the future. As a performance indicator, this shows how deep learning technology is developing and how well-suited it is to handle different challenging issues. Thus, future directions for technological progress in machine learning and artificial intelligence require that models like ResNet-50 be essential for obtaining improved outcomes across various domains.

Feature selection results

Feature selection strategies are employed to refine the gathered features after the feature extraction procedure. A range of metrics is used to assess the effectiveness of the selected features, including best fitness, worst fitness, average error, average fitness, average selection size, and standard deviation of fitness; together these gauge the quality, complexity, stability, and robustness of the selection and provide insight into the classification technique's efficiency. Table 8 shows the evaluation results for the suggested feature selection strategy alongside a comparison with the other approaches: binary Greylag Goose Optimization (bGGO), binary Firefly Algorithm (bFA), binary Satin Bowerbird Optimizer (bSBO), binary Grey Wolf Optimization (bGWO), binary Particle Swarm Optimization (bPSO), binary Bat Algorithm (bBA), binary Genetic Algorithm (bGA), binary Multi-verse Optimization (bMVO), and binary Whale Optimization Algorithm (bWOA). It is evident from the results that the suggested feature selection strategy is superior to the feature selection techniques found in related works, demonstrating the superior performance and efficacy of the suggested approach in identifying the feature set required to categorize KOA cases.

Table 8 Evaluation of the suggested optimization algorithm against alternative optimization algorithms for the chosen set of features.

Classification results and discussion

Various classifiers are used in this study, namely K-nearest neighbor (K-NN), decision tree (DT), Multi-layer Perceptron (MLP), and convolutional neural network (CNN) classifiers. Several metrics, including time, F1-score, N-value, P-value, sensitivity, and specificity, are employed to evaluate the effectiveness of the optimized classifiers. According to these measures, if the selected features are discriminative and can reliably differentiate between the different KOA image classes, then the optimized classifiers can achieve high classification performance. The classification results before and after feature selection are shown in Table 9. This table makes it clear that the classification results with the suggested feature selection outperform those obtained before feature selection.

Table 9 The classification outcomes that were attained both using and without using the suggested feature selection technique.

First, convolutional neural network (CNN) classifiers are used. Table 10 shows the results obtained using the suggested strategy and alternative ways of optimizing the CNN with various optimizers. The GGO-CNN model outperformed the other state-of-the-art classifier models built with the CNN approach, as evidenced by its accuracy of 0.988692. With an accuracy of 0.974479, the GWO-CNN-based approach yielded the second-best classification results. It was followed by the PSO-CNN-based approach, which scored 0.969067, and the WOA-CNN-based model, which achieved 0.96545; the BBO-CNN-based approach produced the least accurate outcomes, with an accuracy of 0.9425.

Table 10 The Classification outcomes for various optimization algorithms based on CNN.

Since the outcomes of applying the suggested feature selection approach are promising, the chosen features are fed into the optimized classifiers. Figure 7 depicts the outcomes of the optimized CNN-based model after being fed the selected features. The attained accuracy is evaluated and displayed in this figure. The suggested methodology achieves an accuracy of 98.8692%, which is higher than the results of optimizing the CNN with the other optimization approaches. Table 11 presents the suggested system's classification outcomes using the K-nearest neighbor (K-NN), decision tree (DT), Multi-layer Perceptron (MLP), and parameter-optimized convolutional neural network (CNN) models. Figure 8 depicts box plots of model metrics for the suggested and compared algorithms, and Fig. 9 depicts a pair plot of the metrics.

Fig. 7

The results obtained by the CNN-based classifier as compared to the other optimization techniques when optimized with the proposed bGGO algorithm (a) Accuracy, (b) Sensitivity, (c) Specificity, (d) N-value, (e) P-value, and (f) F1-score.

Table 11 The categorization findings for the suggested system using K-nearest neighbor (K-NN), a decision tree (DT), Multi-layer Perceptron (MLP) and parameter optimization for the convolutional neural network (CNN) model.
Fig. 8

Box plots for model metrics to the suggested and compared algorithms.

Fig. 9

Pair plot of metrics.

Table 12 illustrates the ANOVA test findings for the offered bGGO + CNN approach against the comparable procedures. The ANOVA tests confirmed the bGGO + CNN procedure’s efficacy.

Table 12 The findings of the ANOVA for the proposed bGGO technique to categorize KOA.

The suggested technique was compared to recent related studies, as seen in Table 13, and was found to outperform the current studies despite its complex structure and use of multiple approaches.

Table 13 Comparisons between the proposed solution in this paper and related works.

Conclusion

A deep learning technique has been presented in this paper to classify knee joint osteoarthritis automatically. KOA categorization was performed using a unique CNN-based framework with the bGGO optimization algorithm. The appropriate collection of features is obtained using deep learning and a transfer learning approach. The most relevant features are extracted from the dataset's images using various DL pre-trained models, particularly ResNet-50. The extracted features were then reduced in number by the binary Greylag Goose Optimization (bGGO) algorithm to increase accuracy and remove unnecessary features. After applying various classifiers and optimization algorithms to the features chosen by GGO, classification metrics were computed; the suggested methodology attained an accuracy of 0.988692, a sensitivity of 0.980156, and a specificity of 0.990089. The simulated outcomes outperform those of similar work. These findings support the suggested system's use as an effective diagnostic tool for the early detection of KOA. In addition, a statistical analysis was carried out to demonstrate the validity of the suggested framework.