TL-PneuNet: a transfer learning-based pneumonia classification framework

Tripathy, Biswajit; Khan, Shakir; Bebortta, Sujit; Kamal, Ashraf; Tripathy, Subhranshu Sekhar; Fazil, Mohd; Albarrak, Abdullah M.

doi:10.1038/s41598-025-24180-8

Published: 17 November 2025

Biswajit Tripathy¹,
Shakir Khan²,
Sujit Bebortta¹,
Ashraf Kamal³,
Subhranshu Sekhar Tripathy⁴,
Mohd Fazil⁵ &
…
Abdullah M. Albarrak⁵

Scientific Reports volume 15, Article number: 40307 (2025)

This article has been updated

Abstract

Pneumonia is a severe respiratory ailment that may be caused by viruses, fungi, or bacteria. It causes water, purulent material, or other fluids to accumulate in the air sacs (alveoli) of the lungs, and a delay in its identification may be life-threatening. However, the limited radiation levels used for diagnostic imaging result in unreliable detection, which poses a significant obstacle to pneumonia detection in healthcare. Transfer Learning (TL) offers an effective method for differentiating between normal and pneumonia patients. We therefore propose a TL-based model for predicting pneumonia. Our experiment uses 5856 highly imbalanced chest X-ray images to fit three vision models, Xception, VGG16, and ResNet152V2, using a TL approach. After training, the models achieve accuracies of 80.45, 80.77, and 83.17% on the X-ray dataset, respectively. ResNet152V2 performs best among the three, with precision and recall scores of 79.87 and 97.69%, respectively. The results illustrate the potential of our framework to help pulmonologists and physicians make rapid diagnoses by providing accurate and fast pneumonia classification.

Introduction

Respiratory infections remain a major clinical concern globally. Any difficulty with the lungs disrupts the regular process of breathing, and the individual may face a range of severity from minor to life-threatening consequences. Since these organs are vital for the healthy functioning of the human body, any issue in the lungs must be detected early, as it prevents patients from developing life-threatening diseases^1,2. Pneumonia is one of the life-threatening respiratory infections, which affects a certain section of a population, especially children and elderly individuals with weak immune systems². To this end, artificial intelligence (AI) may play an important role in the early and precise identification of pneumonia cases along with the analysis of chest X-rays³. Several advanced AI models based on deep learning (DL) and transfer learning (TL) have emerged as a promising tool in the healthcare sector⁴. These models are capable of identifying specific patterns and abnormalities that are indicative of pneumonia. By leveraging deep convolutional neural networks (DCNN) with multiple layers, healthcare professionals can achieve precise and rapid identification of pneumonia, decide the line of treatment, and improve the diagnostic process⁴. Moreover, integration of these technologies into the healthcare sector has the potential to automate diagnostics, reduce errors, and provide a cost-efficient framework.

UNICEF reports that pneumonia claims the life of a child every 43 seconds⁵. The research states that pneumonia is the leading cause of mortality among children, accounting for 700,000 deaths each year (or around 2,000 per day, including 190,000 infants). That works out to one case of pneumonia in every seventy-one children. About 18 million more healthcare personnel will be required by 2030 to identify, prevent, and treat pneumonia. Individuals who are susceptible to pneumonia also include those who are 65 years of age or older and individuals with prior medical conditions, especially those undergoing chemotherapy.

Chest X-ray is a frequently used diagnostic tool for identifying pneumonia^6,7,8. For example, computer-aided detection and diagnosis (CAD) systems can assist radiologists by improving diagnostic accuracy¹. In addition to chest X-rays, Computed Tomography (CT) scans, sputum cultures, and biopsies are effective in identifying pneumonia^9,10. However, CT scans use higher levels of radiation and are associated with higher costs¹⁰, so their benefits must be weighed against the hazards of excessive radiation exposure: an increased risk of cancer is associated with longer-term use of higher-dose radiation or repeated CT scans. Sputum cultures are effective in identifying the specific microorganisms, but delays in culture findings may postpone therapeutic intervention. Biopsy or bronchoscopy allows accurate detection of pathogens and inflammatory patterns, which enables individualized treatment regimens and better patient outcomes, but lung biopsies are invasive and may cause infection and pneumothorax¹¹.

Nevertheless, the minimal radiation levels used for diagnostic purposes result in imprecise detection. Deep learning and transfer learning are essential in identifying patterns in images to accurately categorize them as either normal or abnormal. With each additional layer, the system identifies patterns that span the whole picture, including the boundaries, edges and individual dots. TL’s capacity to apply information from pre-trained models on big datasets to smaller datasets makes it promising for pneumonia detection in medical imaging. TL can be used to develop strong and effective pneumonia detection models on small annotated data by transferring learnt characteristics from datasets with copious annotated pictures. This strategy reduces the requirement for large labeled datasets, which are limited and expensive in medical imaging, and improves model generalization and performance by capturing complex data patterns and representations. Thus, TL provides a compelling answer to pneumonia diagnosis issues, improving healthcare outcomes and resource-efficient diagnostic operations.

The remainder of the paper is organized as follows. Section II provides a thorough overview of prior studies that have investigated the use of DL and TL techniques in the diagnosis of pneumonia. Section III explains each method and the dataset used in this study. Section IV describes the experimental setup. Section V presents the measures used to evaluate the method’s performance. Section VI presents the experimental results on chest X-ray data for pneumonia. Finally, Section VII concludes the manuscript.

Related work

When applied to the medical field, AI has the potential to revolutionize patient care by facilitating rapid and precise diagnosis of illness. To handle the differences within and across classes, Peng et al.¹² presented a multiscale residual network, where one channel receives the raw CT image, while the other receives the differential excitation component, allowing them to build a multi-scale residual network. They successfully manage inter-class variances by integrating multi-scale information from the CT scan picture. To further address intra-class variances, the model injects the multi-scale residual network with the CT scan image and differential excitation component. A remarkable 93.74% classification accuracy was attained by their method. The authors in¹ investigated the effectiveness of a Quaternion Residual Network trained on a publicly available large Chest X-Ray dataset from the Kaggle repository. The study reported a classification accuracy of 93.75% and an F-score of 0.94, highlighting the model’s potential in diagnosing chest-related conditions¹. A previous study examined the use of a hybrid deep learning model that combines quantum algorithms with neural networks to detect pneumonia from chest X-ray images. The proposed ZFNet-quantum neural network analyzed 5863 X-ray scans and achieved an accuracy of 96.5%, surpassing the traditional CNN model, which reached 94% accuracy, highlighting the potential of quantum-assisted deep learning in medical imaging².

Shi et al.¹³ proposed an attention-transfer-based classification model that uses radiographic imaging to distinguish COVID from pneumonia. The foundation of this model is a distillation network architecture, in which a student network learns from a teacher network. The teacher network uses a deformable attention module to build attention maps that focus on critical areas and extract global characteristics from the images. Finally, after an image fusion module is integrated with the teacher network’s attention information, the student network begins to receive input. A new DCNN and InceptionV3 model was presented by Rajeashwari and Arunesh¹⁴ to address the issue of accurate pneumonia prediction. They noticed that the model’s accuracy was relatively low. To raise it, they employed a Neighborhood Component Analysis (NCA) technique that relies on entropy normalization. They also proposed Ensemble-Modified Classifiers (EMC), which included Random Forest, Naïve Bayes, and XGBoost for the final classification, to further enhance the model’s accuracy to 98.5%.

In their study, Bhandari et al.¹⁵ focused on improving feature extraction to improve the model’s accuracy. They did so in two stages. First, they utilized convolutional feature extraction to find high-level object information. Next, they employed the Shapley Additive exPlanation (SHAP), Local Interpretable Model-agnostic Explanation (LIME), and Gradient-weighted Class Activation Mapping (Grad-CAM) approaches. This helped them achieve test and validation accuracies of 94.31 ± 1.01% and 94.54 ± 1.33%, respectively, using tenfold cross-validation. They used explainable AI to further validate the model.

When a CNN is trained on an unbalanced or noisy dataset, or when it overfits, problems such as significant differences between training and validation performance occur. Kaya¹⁶ suggested a feature fusion-based ensemble CNN model in which, after the input image is preprocessed, a hierarchical template-matching algorithm is applied to reduce noise and improve features. The presented model used pre-defined CNNs whose initial weights were assigned from different vision models via TL, combined through majority voting. Following these procedures, the model achieved a 98.94% accuracy and a 99.12% F1 score on a publicly accessible dataset.

To diagnose COVID-19 using an X-ray dataset and further distinguish it from pneumonia, Mohagheghi et al.¹⁷ suggested a DCNN model. They employed image retrieval techniques for distinction and a DCNN model for diagnosis. Also, they improved the accuracy using transfer learning and hashing methods, which resulted in a 97% accuracy. In a comparative investigation, Avola et al.¹⁸ applied 12 unique pre-trained ImageNet models to distinguish between healthy individuals and patients with pneumonia. Due to dataset scarcity, they pooled two datasets into a single one consisting of 6,330 images. They then trained the model using 100%, 50%, 20%, and 10% of the dataset, respectively, and documented the results.

An experiment was conducted with a dataset on pediatric pneumonia. Prakash et al.¹⁹ presented a DCNN architecture-based channel attention to extract image features. Next, they performed principal component analysis (PCA) on each feature to reduce dimensionality, and finally, they added all the features. Additionally, TL-based designs such as Xception, ResNet152V2, DenseNet169, and ResNet50V2 utilize the channel attention mechanism. For pneumonia prediction, an ensemble-based stacking method is used using traditional ML models, including Nu-SVC, XGBClassifier, K-Nearest Neighbour, Support Vector Classifier, and Logistic Regression. With scores of 96.15% for accuracy, 97.19% for precision, 95.00% for recall, 96.49% for AUC-score, and 96.24% for precision, the stacking mechanism produces outstanding results across the board.

The authors in³⁵ use machine learning techniques to identify fibrosis, malignant lymph, metastases, and normal lymphograms. Their comparative study demonstrates the capability of ML to enhance the accuracy of diagnostic tasks in complex medical imaging. The authors in³⁶ present CovXNet, a multi-dilation convolutional neural network with optimized multi-receptive fields that detects COVID-19 and pneumonia from chest X-ray data. The model is reported to be accurate and flexible across a variety of datasets in early pandemic diagnostics. The work in³⁷ examined how the Internet of Medical Things (IoMT) can be used in healthcare monitoring and decision-making. It highlights the importance of interlinked medical devices in enhancing patient care and remote healthcare provision. The authors in³⁸ suggest a hybrid deep neural network framework called RVCNet to diagnose lung diseases. Their findings demonstrate better robustness and diagnostic accuracy than conventional deep learning models. The paper³⁹ used densely connected convolutional networks in the diagnosis of infectious diseases. By rethinking dense connections, the model offers a more efficient and richer feature representation to enhance clinical diagnostics. Finally, the authors in⁴⁰ predict COVID-19 cases and ICU needs using predictive modeling, which provides practical information for hospital resource planning and pandemic management.

Proposed framework

The proposed framework is described in detail below, along with technical details of the TL used in²⁰. For a clear understanding, we will first understand the background details and then delve into the specifics of transfer learning techniques²¹. Our approach constructs a multi-step predictive tool that can accurately analyze the X-ray image and predict pneumonia with a faster response. Image data collection, data preparation and pre-processing, reconstruction of transfer learning architecture, and model refinement via the addition of layers like dense, batch normalization, and dropout layers are all part of the process^22,23. Figure 1 presents the design of the proposed paradigm. Below are the series of procedures used in the proposed approach:

Step 1: The analysis starts with a pneumonia dataset that consists of two classes: normal and pneumonia.

Step 2: During the pre-processing stage, we fix the image size at (512, 512) for uniformity, rescale the pixel values to standardize the images, and adjust the brightness so that objects in the lungs are clearly visible. We also apply a sharpening filter, which increases the contrast between adjacent pixels and enhances the visibility and definition of borders and features.

Step 3: We rebuild the CNN model by adding layers according to our needs. Once the architecture is ready for classifying pneumonia, the image augmentation layer is placed right after the input layer.

Step 4: Dropout, batch normalization, and dense layers are added to carry out the fine-tuning procedure. The purpose of this is to adapt the model to pneumonia classification.

Step 5: We include three vision models, namely Xception, VGG16, and ResNet152V2, for initial weight initialization and subsequent image classification.

Step 6: After training the models, we compare the performance of all the transfer learning models and choose the best one using metrics such as execution time, average memory utilization, accuracy, precision, recall, confusion matrix, and F1-score. Additionally, a thorough evaluation is conducted in relation to preexisting research.

Transfer learning framework

Xception: This convolutional neural network design, based on the Inception model, uses a succession of depth-wise separable convolution layers with residual connections²⁴. Its 36 convolutional layers are organized into 14 blocks, all of which have linear residual connections around them except for the first and last blocks²⁵. This structure captures image characteristics at a fine granularity while keeping the computational cost down. The goal of this approach is a parameter- and computation-efficient model that represents the spatial and channel-wise relationships in the data. The architecture is also easy to understand and modify: like VGG16, it can be defined in 30 to 40 lines of code using a high-level framework such as Keras or TensorFlow Slim. In comparison, designs such as Inception V2 or V3 are more difficult to understand.
VGG16: VGG16, created in 2014 by the Visual Geometry Group (VGG) at Oxford University, is a deep convolutional neural network architecture²⁶. It has 16 weight layers of uniform design: 13 convolutional layers and 3 fully connected layers. The convolutional layers use 3 × 3 filters with a stride of 1 and zero-padding, and max-pooling layers progressively decrease the spatial dimensions²⁴. Although VGG16 has a large number of parameters, it extracts features effectively. Because it learns generic characteristics from images well, this architecture is well suited as a foundational model for image classification and object recognition.
ResNet152V2 Architecture: ResNet is a deep neural network developed to address the vanishing gradient problem during training by allowing gradients to flow across the network²². An ensemble of ResNets achieved first place in the ILSVRC 2015 classification competition with a top-5 error rate of just 3.57%, demonstrating the effectiveness of this model²⁷. There are many variants that use the same basic idea with a different number of layers; ResNet50, for example, has 50 layers. To make the model more accurate, deep residual networks use residual blocks. “Skip connections” are the fundamental idea behind residual blocks, and the deeper variants build these blocks with a bottleneck architecture. On the ImageNet dataset, ResNet152V2, a variant of ResNet, outperforms earlier versions of ResNet. To improve the transfer of information between blocks, ResNet152V2 features an upgraded inter-block connection topology. With its optimized residual connections and “bottleneck” building blocks, this design improves both training stability and computational efficiency compared to its predecessor. The ResNet152V2 framework takes a tensor representation of an image as its input. The size of the input image and the number of channels determine the dimensions of this tensor.

ResNet152V2 is made up of several convolutional layers. The input picture is passed through a series of filters by each convolutional layer, which extracts features at various abstraction levels. A convolutional layer is mathematically expressed as²⁸:

$${H}_{i,j}^{l}= \delta \left(\sum_{n=1}^{N}\sum_{m=1}^{M}\sum_{k=1}^{K}{W}_{n,m,k}^{l}\times {X}_{i+n-1,j+m-1,k}^{l-1}+{b}_{i,j}^{l}\right)$$

(1)

In Eq. (1), δ denotes the activation function (such as ReLU), ${W}_{n,m,k}^{l}$ denotes the weights of the convolutional filter, ${X}_{i+n-1,j+m-1,k}^{l-1}$ denotes the input activation at position $\left(i+n-1,j+m-1,k\right)$ in the preceding layer $l-1$, and ${b}_{i,j}^{l}$ indicates the bias term. ${H}_{i,j}^{l}$ is the output activation at position $\left(i,j\right)$ in layer $l$.
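As an illustration of Eq. (1), the following minimal sketch computes one convolutional output map in plain Python. The function names `relu` and `conv2d_valid` are hypothetical, the code uses 0-based indices (so $i+n-1$ becomes `i + n`), it computes only positions where the filter fits entirely inside the input, and it assumes a single bias shared across positions, which is the usual convention rather than something stated in the text.

```python
def relu(value):
    """Activation function delta in Eq. (1)."""
    return max(0.0, value)


def conv2d_valid(x, w, b, activation=relu):
    """Apply one N x M x K filter `w` to an H x W x K input `x`.

    x[i][j][k] is the input activation, w[n][m][k] the filter weight,
    and b a scalar bias (assumed shared across positions).
    """
    n_rows, n_cols, n_ch = len(w), len(w[0]), len(w[0][0])
    out_h = len(x) - n_rows + 1
    out_w = len(x[0]) - n_cols + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = b
            # Triple sum over filter rows, filter columns, and channels.
            for n in range(n_rows):
                for m in range(n_cols):
                    for k in range(n_ch):
                        total += w[n][m][k] * x[i + n][j + m][k]
            row.append(activation(total))
        out.append(row)
    return out


# 3 x 3 single-channel input and a 2 x 2 filter of ones.
image = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
kernel = [[[1], [1]], [[1], [1]]]
feature_map = conv2d_valid(image, kernel, b=-13.0)
print(feature_map)  # [[0.0, 3.0], [11.0, 15.0]]
```

The top-left window sums to 12, so with a bias of -13 the pre-activation is negative and ReLU clips it to zero.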

In order to facilitate the flow of gradients during training, ResNet topologies make use of residual blocks that include skip connections, also called identity mappings. The bottleneck block, with its three convolutional layers that use batch normalization and ReLU activations, is the fundamental building component of ResNet152V2. The mathematical representation of a residual block’s output is stated in Eq. (2) as below:

$$\text{Output}= F\left(x\right)+x$$

(2)

Here, $x$ is the input to the block and $F$ is the residual function computed by the block’s internal layers:

$$F\left(x\right)= {w}_{3} * \delta \left({w}_{2} * \delta \left({w}_{1}*x+ {b}_{1}\right)+{b}_{2}\right)$$

(3)

In Eq. (3), the weight matrices of the convolutional layers are denoted as ${w}_{1}$, ${w}_{2}$, and ${w}_{3}$, and the bias terms are ${b}_{1}$ and ${b}_{2}$.
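The skip connection of Eqs. (2) and (3) can be sketched as follows. This is a simplified illustration, not the actual ResNet152V2 block: it treats the input as a flat vector and each weight ${w}_{1}$, ${w}_{2}$, ${w}_{3}$ as a square matrix (a stand-in for the convolutions), and it omits batch normalization. All function names are hypothetical.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def relu_vec(v):
    """Element-wise ReLU, the activation delta in Eq. (3)."""
    return [max(0.0, a) for a in v]


def residual_block(x, w1, b1, w2, b2, w3):
    """Return F(x) + x, with F as in Eq. (3)."""
    h1 = relu_vec([a + b for a, b in zip(matvec(w1, x), b1)])
    h2 = relu_vec([a + b for a, b in zip(matvec(w2, h1), b2)])
    f_x = matvec(w3, h2)
    # Skip connection of Eq. (2): add the unmodified input back.
    return [f + xi for f, xi in zip(f_x, x)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]

# With w3 = 0 the residual branch contributes nothing, so the block is an identity map.
print(residual_block([1.0, -2.0], identity, no_bias, identity, no_bias, zeros))     # [1.0, -2.0]
print(residual_block([1.0, -2.0], identity, no_bias, identity, no_bias, identity))  # [2.0, -2.0]
```

The first call shows why skip connections ease training: even when the residual branch outputs zero, the input still passes through the block unchanged.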

In the latter stages of the network, global average pooling reduces the spatial dimensions of the feature maps to a vector. It does so by averaging each feature map over all of its spatial positions, as given in Eq. (4) below:

$${y}_{k}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}{x}_{ij}^{k}$$

(4)

The activation at location $\left(i,j\right)$ in the ${k}^{th}$ feature map is represented as ${x}_{ij}^{k}$, while the feature maps’ height and breadth are denoted by $H$ and $W$, respectively.
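A minimal sketch of Eq. (4) in plain Python is given below; the function name `global_average_pool` and the nested-list layout (a list of $K$ feature maps, each a grid of $H \times W$ values) are assumptions made for illustration.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map to a single value, as in Eq. (4)."""
    pooled = []
    for fmap in feature_maps:
        height, width = len(fmap), len(fmap[0])
        total = sum(sum(row) for row in fmap)
        pooled.append(total / (width * height))
    return pooled


maps = [
    [[1.0, 2.0], [3.0, 4.0]],  # feature map k = 1
    [[0.0, 0.0], [0.0, 8.0]],  # feature map k = 2
]
print(global_average_pool(maps))  # [2.5, 2.0]
```

The output has one entry per feature map, so its length depends only on the number of channels and not on the image size.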

The class predictions are generated by a fully connected layer followed by a softmax activation. This layer first computes the logits $z$ by applying a linear transformation to the input, as in Eq. (5), and then applies the softmax function of Eq. (6):

$$z= {W}_{ijc}\times y+{b}_{ijc}$$

(5)

$$\text{Softmax}\left({z}_{i}\right)=\frac{{e}^{{z}_{i}}}{\sum_{j=1}^{C}{e}^{{z}_{j}}}$$

(6)

where ${W}_{ijc}$ and ${b}_{ijc}$ are the weights and biases of the fully connected layer, respectively, and $C$ is the number of classes. Exponentiating the logits and normalizing by their sum ensures that the outputs form a proper probability distribution, with values between 0 and 1 that sum to 1. The step-wise workflow of the proposed framework is depicted in Algorithm 1.
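Eqs. (5) and (6) can be sketched as below. The names `dense_logits` and `softmax` and the example weights are hypothetical. Subtracting the largest logit before exponentiating is a common numerical-stability step that is not part of Eq. (6) but leaves its result unchanged.

```python
import math


def dense_logits(weights, bias, y):
    """Linear transformation of Eq. (5): one logit per class."""
    return [sum(w * v for w, v in zip(row, y)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Softmax of Eq. (6), shifted by max(z) to avoid overflow."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [2.5, 2.0]                     # e.g. a pooled feature vector
weights = [[0.4, -0.2], [-0.4, 0.2]]    # illustrative values, one row per class
bias = [0.0, 0.0]

probs = softmax(dense_logits(weights, bias, pooled))
print(probs)       # two probabilities, one per class
print(sum(probs))  # approximately 1.0
```

With two classes, the class with the larger logit receives the larger probability, and equal logits give 0.5 each.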

System model

The TL-based models are large and complex, having been pre-trained on a substantial amount of ImageNet data²⁸. To use such models, we need to preprocess the data and fine-tune the model so that it performs efficiently on the intended task²⁹. We also need to reconstruct the output layer to suit the task. There are two important procedures that we need to follow to use TL:

Image augmentation

Image augmentation is used in ML to modify or enrich images in order to expand the volume and diversity of the training data¹⁴. It applies transformations such as rotation, cropping, shearing, flipping, scaling, and changes in contrast or brightness to the original images³⁰. These transformations help the model focus on the key portion of the image rather than on irrelevant parts. In our design, the augmentation layer is added as the first step, so that it is an integral part of the model. Augmentation then runs on the device together with the other layers and benefits from the GPU’s processing capability. Additionally, the preprocessing layers are kept together with the rest of the model when it is exported. Images are therefore standardized automatically according to the layer settings, which eliminates the need to re-implement this functionality on the server.

Adding layers for our use

After augmentation and image transformation, we use the original TL model unchanged except for the last layer, which we replace with layers suited to our task³¹. This selective retention makes it possible to incorporate more layers, which optimizes the design’s efficacy and enhances predictions in pneumonia classification^32,33,34. However, overfitting may occur if there are too many additional layers.

To tune the model, we extended its architecture with layers tailored to the pneumonia dataset, after which the model can take in image datasets and make predictions. We included a dropout layer to prevent overfitting: during training, dropout randomly deactivates a fraction of the units so that the model does not rely on any single feature. The additions to the base model are as follows. A GlobalAveragePooling2D layer integrates the base model’s output. Two dense layers, one with 512 neurons and the other with 64 neurons, are used in conjunction with batch normalization. Following the GlobalAveragePooling2D layer, we add a dense layer with 512 neurons, a dropout layer, and batch normalization. ReLU is the activation function, the kernel initializer is set to ‘glorot uniform’ with a seed value of 1377, and the bias initializer keeps its default of zeros. After that, we add another batch normalization layer. Next, we add the second dense layer with ReLU activation, again followed by batch normalization. The last dense layer has two neurons, one for each class, and produces the prediction. Because the two classes are mutually exclusive, the ‘softmax’ activation function is used. For this layer, the bias initializer is set to ‘zeros’ and the kernel initializer to ‘random uniform’.
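To give a sense of the size of the added classification head, the sketch below counts its parameters. It is a back-of-the-envelope illustration under stated assumptions: the head is taken to be Dense(512), BatchNormalization, Dense(64), BatchNormalization, Dense(2), dropout contributes no parameters, and the pooled backbone output has 2048 channels for ResNet152V2 and Xception and 512 for VGG16. The exact layer ordering in the authors' implementation may differ, and the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Scale and shift (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


def head_params(backbone_channels, hidden=(512, 64), n_classes=2):
    """Total parameters of the assumed head placed after global average pooling."""
    total = 0
    n_in = backbone_channels
    for units in hidden:
        total += dense_params(n_in, units) + batchnorm_params(units)
        n_in = units
    total += dense_params(n_in, n_classes)
    return total


print(head_params(2048))  # 1084354 for a 2048-channel backbone output
print(head_params(512))   # 297922 for a 512-channel backbone output
```

Under these assumptions, almost all of the head's parameters sit in the first dense layer, which is where dropout is applied.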

We keep the pre-trained weights trainable to exploit what the existing model has already learned. The architecture uses the ‘Adamax’ optimizer with a learning rate of 0.001, which determines the step size at each training iteration. We use the ‘sparse categorical cross-entropy’ loss function, which measures the discrepancy between the predicted probability distribution and the true class labels. To find the overall loss, it first determines the loss contribution of each class individually in each iteration and then adds these contributions together. Representing each class by a single integer instead of a whole vector saves memory and computation.
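For readers unfamiliar with Adamax, the sketch below shows one parameter update in plain Python. Only the learning rate of 0.001 comes from the text; the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-7` are commonly used defaults assumed here for illustration, and the function name `adamax_step` is hypothetical.

```python
def adamax_step(theta, grad, m, u, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adamax update for a list of parameters.

    m is the running mean of gradients, u the running infinity norm,
    and t the 1-based iteration counter.
    """
    m = [beta1 * m_i + (1 - beta1) * g_i for m_i, g_i in zip(m, grad)]
    u = [max(beta2 * u_i, abs(g_i)) for u_i, g_i in zip(u, grad)]
    step = lr / (1 - beta1 ** t)  # bias correction for the first moment
    theta = [th - step * m_i / (u_i + eps) for th, m_i, u_i in zip(theta, m, u)]
    return theta, m, u


theta, m, u = adamax_step(theta=[1.0], grad=[2.0], m=[0.0], u=[0.0], t=1)
print(theta)  # close to [0.999]: the first step has magnitude of about lr
```

Because the update divides by the running maximum of the gradient magnitude, the step size stays bounded by roughly the learning rate regardless of the gradient scale.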

Performance evaluation

The performance of an ML or DL model is evaluated with standard metrics such as accuracy, F1-score, confusion matrix (CM), precision, recall, execution time, and average memory consumption. A brief description of these metrics is given below:

Confusion matrix

The CM is the fundamental tool used to evaluate how many samples the model predicts correctly. It is a tabular representation that provides insight into the four distinct combinations of actual and predicted values: True Positive (${t}_{P}$), False Positive (${f}_{P}$), True Negative (${t}_{N}$), and False Negative (${f}_{N}$). The CM helps to evaluate the reliability of the model, and accuracy, recall, precision, and F1-score can all be derived from it.
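The four entries of the CM can be tallied directly from the true and predicted labels, as in the following sketch. The function name `confusion_counts`, the label encoding (1 for pneumonia, 0 for normal), and the example labels are assumptions for illustration and are not data from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tP": 0, "fP": 0, "tN": 0, "fN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["tP"] += 1
        elif predicted == positive:
            counts["fP"] += 1
        elif actual == positive:
            counts["fN"] += 1
        else:
            counts["tN"] += 1
    return counts


# Hypothetical labels: 1 = pneumonia, 0 = normal.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # {'tP': 3, 'fP': 1, 'tN': 1, 'fN': 1}
```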

Accuracy

Accuracy is the most basic statistical measure for any ML model: it is the ratio of correct predictions to the total number of observations. It is used to evaluate a model’s overall performance and dependability. The accuracy can be calculated as in Eq. (7) below:

$${f}_{\text{ACCURACY}}= \frac{{t}_{P}+{t}_{N}}{{t}_{P}+{f}_{P}+{f}_{N}+{t}_{N}}$$

(7)

Precision

In predictive modelling, precision is the ratio of correctly predicted positive cases to all cases predicted as positive. This ratio shows how reliable the model’s positive predictions are. The precision can be calculated using the expression in Eq. (8) as follows:

$${f}_{\text{PRECISION}}= \frac{{t}_{P}}{{t}_{P}+{f}_{P}}$$

(8)

Recall

Recall is the proportion of actual positive instances that the model correctly identifies. A high recall value indicates that the model does not overlook real positive instances, which minimizes the likelihood of false negatives. Pulmonologists and clinicians can rely on a model for diagnosis of the disease only if it has a high recall value. Mathematically, recall can be computed as in Eq. (9):

$${f}_{\text{RECALL}}= \frac{{t}_{P}}{{t}_{P}+{f}_{N}}$$

(9)

F1-score

F1-score is another reliable statistic used to evaluate the performance of a model. It is computed as the harmonic mean of the precision and recall scores. By combining precision and recall, it gives a more complete picture of how effectively a model detects positive examples with few false positives and false negatives. The F1-score can be calculated using Eq. (10) as follows:

$${f}_{f1-\text{score}}= 2\times \frac{{f}_{\text{RECALL}}\times {f}_{\text{PRECISION}}}{{f}_{\text{RECALL}}+{f}_{\text{PRECISION}}}$$

(10)
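Eqs. (7) to (10) can be evaluated together from the four CM counts, as in the sketch below. The function name `classification_metrics` and the counts in the example are hypothetical and are not the values reported in this study; the guards against division by zero are an added convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * recall * precision / (recall + precision)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=30, tn=70, fn=10)
print(metrics["accuracy"])   # 0.8
print(metrics["precision"])  # 0.75
print(metrics["recall"])     # 0.9
print(round(metrics["f1"], 4))  # 0.8182
```

In this example the many false positives pull precision down to 0.75 while recall stays at 0.9, and the harmonic mean places the F1-score between the two, closer to the lower value.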

Execution time

Results that are not available within the required time frame are of little practical use, so a model’s execution time is an indicator of its practical usefulness. Execution time is the total time a model requires to process and analyze the provided data, from initial data loading and training to final prediction or classification. The execution time ${T}_{EX}$ is computed as the difference between the end time ${T}_{E}$ and the start time ${T}_{S}$, as given in Eq. (11):

$${T}_{EX}=\left|{T}_{S}- {T}_{E}\right|$$

(11)
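One way to measure Eq. (11) in Python is sketched below with the standard-library clock `time.perf_counter`. The wrapper name `timed` and the stand-in workload are hypothetical; in practice the wrapped function would be the training or prediction routine.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, execution time in seconds) following Eq. (11)."""
    t_s = time.perf_counter()  # start time T_S
    result = fn(*args, **kwargs)
    t_e = time.perf_counter()  # end time T_E
    return result, abs(t_s - t_e)


# Stand-in workload instead of model training or inference.
result, t_ex = timed(sum, range(1000))
print(result)  # 499500
print(t_ex >= 0.0)  # True
```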

Average memory utilization

Average memory utilization measures how efficiently a model consumes memory during its execution. As given in Eq. (12), it is computed by dividing the total memory consumed, ${\text{Total}}_{\text{mem}}$, by the total number of data points or instances processed, $N$:

$${\text{Utilize}}_{\text{mem}}=\frac{{\text{Total}}_{\text{mem}}}{N}$$

(12)
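A rough way to estimate Eq. (12) for Python code is sketched below using the standard-library `tracemalloc` module. Taking the peak traced allocation as ${\text{Total}}_{\text{mem}}$ is an assumption made here; the text does not state how total memory was measured, and `tracemalloc` only sees allocations made through Python's allocator, not GPU or native library memory. The function name is hypothetical.

```python
import tracemalloc


def average_memory_per_instance(process, instances):
    """Estimate Eq. (12): traced peak memory in bytes divided by N instances."""
    if not instances:
        raise ValueError("at least one instance is required")
    tracemalloc.start()
    for item in instances:
        process(item)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak_bytes / len(instances)


# Stand-in workload: allocate a list whose size depends on the instance.
avg_bytes = average_memory_per_instance(lambda n: [0] * n, [1000, 2000, 3000])
print(avg_bytes > 0)  # True
```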

Loss

The DCNN is trained with a stochastic gradient-based optimization algorithm, and the weights of each node are updated using backpropagation. In each iteration, the model makes predictions with its current weights and compares them with the actual values to compute an error, and the weights are then updated by backpropagation. This error is called the loss, and the categorical cross-entropy (CCE) loss is given by Eq. (13):

$$\text{CCE}= -\sum_{j=1}^{NC}{T}_{j}\log\left({P}_{j}\right)$$

(13)

where $NC$ is the number of classes, and ${T}_{j}$ and ${P}_{j}$ are the true and predicted distributions, respectively.
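The sketch below evaluates Eq. (13) for a single sample and shows that the sparse variant, which takes an integer class label instead of a one-hot vector, gives the same value. The function names and the small `eps` guard against $\log(0)$ are assumptions added for illustration.

```python
import math


def categorical_cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Eq. (13): -sum_j T_j * log(P_j) for one sample."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, pred_dist))


def sparse_categorical_cross_entropy(label, pred_dist, eps=1e-12):
    """Same loss when the true class is given as an integer index."""
    return -math.log(max(pred_dist[label], eps))


predicted = [0.2, 0.8]   # predicted probabilities for (normal, pneumonia)
one_hot = [0.0, 1.0]     # true class is index 1

loss_dense = categorical_cross_entropy(one_hot, predicted)
loss_sparse = sparse_categorical_cross_entropy(1, predicted)
print(round(loss_dense, 4))   # 0.2231
print(round(loss_sparse, 4))  # 0.2231
```

Only the term for the true class is non-zero in the sum, which is why the integer-label form can skip the one-hot vector entirely.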

Result and discussions

Dataset description and pre-processing

The dataset used for the experiment contains 5,856 chest X-ray images (anterior-posterior) in two categories, labelled as normal or pneumonia. The dataset was obtained from Kaggle³². It was originally collected from a cohort of pediatric patients aged one to five years at the Guangzhou Women and Children’s Medical Center, Guangzhou, China. The X-rays were taken as part of the children’s routine clinical care. Not all of the images are limited to the chest; some also include regions well above or below it. All lungs infected by bacteria or viruses are labelled as pneumonia. The images were screened for quality control and verified by expert physicians. More detailed information about the data collection and preparation process can be found in the dataset paper by Daniel et al.³². A limitation of the current study is its restriction to pediatric patients, because the manifestation of diseases and physiological reactions may vary greatly among age groups. Although the findings provide important insights into patterns in pediatric settings, it is not yet clear whether they generalize to adult and elderly populations. Subsequent studies should therefore use large-scale, high-quality datasets that cover broader demographic cohorts, allow assessment of age-related variation, and verify the generalizability and robustness of the constructed models across the entire range of patients. Such an extension would not only confirm the relevance of the existing approach but also increase its clinical applicability in a variety of healthcare settings.

In the preprocessing phase, we set the batch size to 16, so that 16 images are processed in each training step. We then applied image transformations before passing the images to the model to improve accuracy. The raw pixel intensities of the X-ray images lie in the interval [0, 255], ranging from black to white; to keep the training and test phases consistent, we rescaled them to [0, 1]. Adjusting the brightness of the X-ray images simulates a range of exposure conditions, from darker to lighter. For this study, we specified the brightness range as [0.9, 1.3] to account for such variations. Some image information can be lost to compression, which may appear as blurring or noise. We therefore also applied a sharpening filter to recover some of the degraded detail, which can improve the model's predictions.
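The three transformations described above can be sketched in plain Python on a toy image. This is only an illustration under stated assumptions: the 3x3 sharpening kernel, the edge-replication padding, the multiplicative interpretation of the brightness factor, and the helper names `rescale`, `adjust_brightness`, and `sharpen` are hypothetical choices, since the text does not specify how the steps were implemented.

```python
import random

def rescale(image):
    """Map 8-bit intensities from [0, 255] to [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]

def adjust_brightness(image, factor):
    """Multiply intensities by `factor` and clip back into [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]

# A common 3x3 sharpening kernel whose weights sum to 1 (assumed, not from the paper)
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

def sharpen(image):
    """Convolve with SHARPEN using edge replication; clip the result to [0, 1]."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(h - 1, max(0, r + dr))  # replicate border rows
                    cc = min(w - 1, max(0, c + dc))  # replicate border columns
                    acc += SHARPEN[dr + 1][dc + 1] * image[rr][cc]
            row.append(min(1.0, max(0.0, acc)))
        out.append(row)
    return out

raw = [[0, 64, 128], [64, 128, 192], [128, 192, 255]]  # toy 3x3 "X-ray"
scaled = rescale(raw)
factor = random.Random(0).uniform(0.9, 1.3)  # brightness factor drawn from [0.9, 1.3]
brighter = adjust_brightness(scaled, factor)
sharpened = sharpen(scaled)
flat = sharpen([[0.5] * 3 for _ in range(3)])  # a uniform image stays uniform
```

Because the kernel weights sum to one, uniform regions are left unchanged while intensity differences between neighbouring pixels are amplified.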

Experimental set-up

The experiments were run on a computer with a 12th-generation Intel Core i7 processor clocked at 2.10 GHz and 16 GB of RAM. The simulation was carried out in Python 3.10.9, using the TensorFlow, Keras, Pandas, NumPy, and Matplotlib libraries for preprocessing, for implementing the proposed transfer learning method, and for performance evaluation. Figure 2 shows sample chest X-ray images for normal and pneumonia patients. We compared the performance of all the vision models in our analysis using the widely used confusion matrix (CM), as depicted in Fig. 3; Fig. 3a–c shows the CM for Xception, VGG16, and ResNet152V2, respectively. The CMs show that ResNet152V2 produced a higher number of correct predictions than the other two models.

Table 1 shows the performance metrics for the three TL models used in our study: Xception, VGG16, and ResNet152V2. ResNet152V2 achieved the highest accuracy, precision, and F1-score, at 83.17%, 79.87%, and 80%, respectively, whereas VGG16 leads on recall with 99.23%. Overall, the comparison indicates that ResNet152V2 delivers the strongest performance across the metrics considered, with recall as the one exception.
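For reference, the four metrics reported in Table 1 follow directly from the entries of a binary confusion matrix. The sketch below uses hypothetical counts chosen only to make the arithmetic easy to follow; they are not the values behind Table 1 or Fig. 3, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts with pneumonia as the positive class (NOT the paper's data)
metrics = classification_metrics(tp=90, fp=30, fn=10, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With many false positives and few false negatives, recall is high while precision is lower, the same pattern the models in this study exhibit on the imbalanced dataset.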

Table 1 Performance evaluation results for Xception, VGG16, and ResNet152V2.

We measured the models' performance after 50 epochs of training using the following metrics: training accuracy, precision, loss, recall, and F1-score. For every metric except loss, we tracked how quickly the curve converges towards one; a model is considered to perform well if its curve reaches or approaches one. Figure 4a shows that, in terms of training accuracy, ResNet152V2 converges to a local maximum faster than the other models and surpasses both Xception and VGG16. For model assessment, the loss curve and the loss value are among the most important indicators, where loss is a numerical measure of the difference between the model's predictions and the actual observations. As the number of epochs increases, the ResNet152V2 loss declines faster than that of the other models, as depicted in Fig. 4b. These results indicate that ResNet152V2 improves its predictive ability as training progresses.

Precision and recall curves are used to evaluate how accurately a model identifies the positive cases. Figure 5a, b compares the precision and recall curves for Xception, VGG16, and ResNet152V2. Both graphs clearly show that ResNet152V2 converges to a local maximum more quickly than the other models and outperforms Xception and VGG16 as the number of epochs grows. Because the image data used in our study are extremely imbalanced, it was also necessary to consider the F1-score. The comparison of the F1-score for all the models is shown in Fig. 5c. The results demonstrate that ResNet152V2 outperforms Xception and VGG16 with respect to the F1-score.

After 50 epochs, we measured the validation performance of each model using the following metrics: validation accuracy, precision, loss, recall, and F1-score. As before, for every metric except loss we tracked how quickly the curve converges towards one, and we regard a model as good if its curve reaches one or approaches it rapidly. Figure 6a shows that, in terms of validation accuracy, ResNet152V2 converges to a local maximum more quickly than the other models and surpasses both Xception and VGG16. As the number of epochs increases, the ResNet152V2 loss declines much faster than that of the other models, as depicted in Fig. 6b. This result indicates that the ResNet152V2 model improves its predictive ability as training progresses.

Figure 7a, b compares the validation precision and recall curves for Xception, VGG16, and ResNet152V2. Both graphs clearly show that ResNet152V2 converges to a local maximum more quickly than the other models and outperforms Xception and VGG16 as the number of epochs grows. The comparison of the validation F1-score for all the models is shown in Fig. 7c. The results demonstrate that ResNet152V2 outperforms Xception and VGG16 with respect to the F1-score.

Following these observations, we considered the overall precision, accuracy, recall, and F1-score for Xception, VGG16, and ResNet152V2, which gives a clear picture of the performance of each model. Figure 8a shows that ResNet152V2 outperforms the other models on the performance metrics we considered. The execution time for building Xception, VGG16, and ResNet152V2 is shown in Fig. 8c: Xception takes the least time on the chest X-ray dataset, followed by ResNet152V2 and VGG16. Figure 8b presents the average memory utilization of the three models: Xception consumes the least memory, followed by ResNet152V2 and VGG16. As the time and space consumed by ResNet152V2 are quite high, additional preprocessing techniques could be considered to reduce the space and time consumption of this model. Table 2 provides a detailed summary of the space and time values for each vision model.
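The text does not state how execution time and memory utilization were measured. One possible approach, sketched here under that caveat, uses only the Python standard library; `profile` and `build_toy_model` are hypothetical names, and the toy workload merely stands in for building and training a model.

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Return (result, elapsed_seconds, peak_bytes) for a single call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, elapsed, peak

def build_toy_model(n):
    # Stand-in workload (assumption): allocate a list of n floats
    return [float(i) for i in range(n)]

model, seconds, peak_bytes = profile(build_toy_model, 100_000)
print(f"time: {seconds:.4f} s, peak memory: {peak_bytes / 1e6:.2f} MB")
```

Note that `tracemalloc` only tracks allocations made through Python's memory allocator, so memory held by native deep learning back ends or by a GPU would need a different tool.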

Table 2 Comparison of space and execution times for the considered models.

Table 3 shows the results of the pairwise t-tests between the AUC values of the three models, Xception, VGG16, and ResNet152V2, together with 95% confidence intervals (CI). The results indicate statistically significant differences between the models. The comparison between Xception and VGG16 yields the largest t-statistic, 10.9032, with a p-value < 0.0001; the CI of the difference between the means, 0.0370 to 0.0557, does not include zero, which indicates that Xception is significantly better than VGG16. The comparison between Xception and ResNet152V2 also shows a significant difference, with a t-statistic of 4.4611 and a p-value of 0.0004; the CI is positive, ranging from 0.0079 to 0.0264, indicating that Xception has higher AUC values than ResNet152V2. Finally, the mean difference between VGG16 and ResNet152V2 is negative, with a t-statistic of −6.1725 and a p-value of 0.0001; the CI is also negative, ranging from −0.0352 to −0.0232, meaning that ResNet152V2 is significantly better than VGG16.
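To make the quantities in this comparison concrete, the sketch below computes a paired t-statistic and a 95% CI for the mean AUC difference between two models. It assumes a paired design over repeated runs, which the text does not confirm, and the AUC values are hypothetical rather than those behind Table 3. The critical value is hard-coded for 4 degrees of freedom to keep the example dependency-free; the p-value, which requires the t-distribution CDF, is left out.

```python
import math

def paired_t_test(a, b, t_crit):
    """Paired t-statistic, mean difference, and CI for samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                              # standard error of the mean
    t_stat = mean / se
    return t_stat, mean, (mean - t_crit * se, mean + t_crit * se)

# Hypothetical AUC values from five runs of two models (NOT the paper's data)
auc_model_a = [0.97, 0.96, 0.98, 0.97, 0.96]
auc_model_b = [0.92, 0.93, 0.93, 0.91, 0.92]

T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t, 4 degrees of freedom
t_stat, mean_diff, ci = paired_t_test(auc_model_a, auc_model_b, T_CRIT_DF4)
print(f"t = {t_stat:.4f}, mean diff = {mean_diff:.4f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

A CI that lies entirely above zero, as in this toy case, corresponds to the situation described for Xception versus VGG16: the first model's AUC is significantly higher.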

Table 3 Comparison for different statistical values.

Figure 9 shows the ROC curves and AUC values of the three transfer learning models, Xception, VGG16, and ResNet152V2, applied to the same classification task. The ROC curve of the Xception model is shown in Fig. 9a; its AUC of 0.9700 indicates high discriminative performance, that is, a strong ability to separate the positive and negative classes. Figure 9b shows the VGG16 model, whose AUC of 0.9200 corresponds to good classification performance, although lower than that of Xception, with a moderate trade-off between sensitivity and specificity. Figure 9c shows the ROC curve of ResNet152V2, whose AUC of 0.9400 is higher than that of VGG16 but lower than that of Xception, demonstrating strong classification capacity.
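The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following minimal sketch computes it through this pairwise formulation, which is equivalent to the area under the ROC curve; the labels and scores are hypothetical, and the function name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = pneumonia) and predicted pneumonia probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(auc, perfect)
```

In the first toy case, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75, whereas a classifier that separates the classes perfectly reaches 1.0.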

Conclusion

Our study evaluated TL-based models for pneumonia classification using Xception, VGG16, and ResNet152V2. The presented model comprises a preprocessing layer, a TL-based framework, and fine-tuning, and it was trained on a significantly imbalanced dataset of 5,856 chest X-ray images. We evaluated it using precision, recall, F1-score, accuracy, loss, the confusion matrix, and time and space complexity. The tested models reached accuracies of 83.17% for ResNet152V2, 80.77% for VGG16, and 80.45% for Xception. After analysing these results, we found that ResNet152V2 performed better than the other models considered. The experiment demonstrates the potential of TL for addressing the difficulties of pneumonia detection. Improving the early identification and classification of pneumonia patients can contribute to better patient outcomes and more effective healthcare delivery. By identifying cases of pneumonia reliably and quickly, the ResNet152V2 model may help pulmonologists and doctors reach correct diagnoses sooner, and it has the potential to categorize individuals correctly as either normal or pneumonia patients.

Future works

Our study highlights the importance of TL for classifying pneumonia from chest X-ray images. However, relying solely on a theoretical prediction framework may not resolve the issues associated with appropriate diagnosis and treatment planning for pneumonia, which leaves healthcare providers with practical difficulties. One possible direction for future research is therefore the design of an advanced healthcare monitoring platform that tracks patients' ailments and their prescribed medications under the supervision of healthcare professionals. The accuracy of the transfer learning model may be improved with strategies such as attention mechanisms and adversarial training, which concentrate on the relevant regions of the chest X-ray instead of the whole image. Furthermore, explainable AI techniques can make the deep learning model more transparent and easier to understand, giving both doctors and patients greater confidence in the diagnosis process. In light of ResNet152V2's experimental performance, future work may focus on further optimization and fine-tuning of the model, on data preprocessing, and on the use of relevant regions within chest X-ray images, in order to achieve higher performance, improve diagnostic reliability, and reduce memory use and response time.

Data availability

The dataset used in the current study was collected at the Guangzhou Medical Center, Guangzhou, China, and is available in the Kaggle repository. It was accessed using the following link: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed on 5 March 2025).

Change history

14 December 2025
The original online version of this Article was revised: The original version of this Article erroneously assigned Dr Shakir Khan Affiliation 5: College of Computer and information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia. Additionally, the original version of the Article erroneously assigned Dr Mohd Fazil and Dr Abdullah M. Albarrak Affiliation 2: University Centre for Research and Development, Chandigarh University, Mohali, India, when they are associated only with Affiliation 5: College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia.

References

Singh, S. & Tripathi, B. K. Pneumonia classification using quaternion deep learning. Multimedia Tools Appl. 81(2), 1743–1764 (2022).
Shahwar, T., Mallek, F., Rehman, A. U., Sadiq, M. T. & Hamam, H. Classification of pneumonia via a hybrid ZFNet-quantum neural network using a chest X-ray dataset. Curr. Med. Imaging 20(1), e15734056317489 (2024).
Chang, T. H. et al. Clinical characteristics of hospitalized children with community-acquired pneumonia and respiratory infections: Using machine learning approaches to support pathogen prediction at admission. J. Microbiol. Immunol. Infect. 56(4), 772–781 (2023).
Liu, Y. N. et al. Infection and co-infection patterns of community-acquired pneumonia in patients of different ages in China from 2009 to 2020: a national surveillance study. Lancet Microbe 4(5), e330–e339 (2023).
https://data.unicef.org/topic/child-health/pneumonia/
Ippolito, D. et al. Artificial Intelligence Applied to Chest X-ray: A reliable tool to assess the differential diagnosis of lung pneumonia in the emergency department. Diseases 11(4), 171 (2023).
Sheu, R. K. et al. Interpretable classification of pneumonia infection using eXplainable AI (XAI-ICP). IEEE Access 11, 28896–28919 (2023).
Akbulut, Y. Automated pneumonia based lung diseases classification with robust technique based on a customized deep learning approach. Diagnostics 13(2), 260 (2023).
Wang, Y. et al. Prediction of viral pneumonia based on machine learning models analyzing pulmonary inflammation index scores. Comput. Biol. Med. 169, 107905 (2024).
Wang, W., Zhao, X., Jia, Y. & Xu, J. The communication of artificial intelligence and deep learning in computer tomography image recognition of epidemic pulmonary infectious diseases. PLoS One 19(2), e0297578 (2024).
Dash, A. K., Mohapatra, P. & Ray, N. K. Pneumonia detection in children from chest X-ray images by executing network surgery of deep neural networks. SN Comput. Sci. 5(2), 1–3 (2024).
Peng, L. et al. Classification and quantification of emphysema using a multi-scale residual network. IEEE J. Biomed. Health Inform. 23(6), 2526–2536 (2019).
Shi, W., Tong, L., Zhu, Y. & Wang, M. D. COVID-19 automatic diagnosis with radiographic imaging: explainable attention transfer deep neural networks. IEEE J. Biomed. Health Inform. 25(7), 2376–2387 (2021).
Rajeashwari, S. & Arunesh, K. Enhancing pneumonia diagnosis with ensemble-modified classifier and transfer learning in deep-CNN based classification of chest radiographs. Biomed. Signal Process. Control 93, 106130 (2024).
Bhandari, M., Shahi, T. B., Siku, B. & Neupane, A. Explanatory classification of CXR images into COVID-19, pneumonia and tuberculosis using deep learning and XAI. Comput. Biol. Med. 150, 106156 (2022).
Kaya, M. Feature fusion-based ensemble CNN learning optimization for automated detection of pediatric pneumonia. Biomed. Signal Process. Control 87, 105472 (2024).
Mohagheghi, S. et al. Integration of CNN, CBMIR, and visualization techniques for diagnosis and quantification of covid-19 disease. IEEE J. Biomed. Health Inform. 25(6), 1873–1880 (2021).
Avola, D. et al. Study on transfer learning capabilities for pneumonia classification in chest-X-rays images. Comput. Methods Programs Biomed. 221, 106833 (2022).
Asswin, C. R. et al. Transfer learning approach for pediatric pneumonia diagnosis using channel attention deep CNN architectures. Eng. Appl. Artif. Intell. 123, 106416 (2023).
Ansari, N., Faizabadi, A., Motakabber, S. & Ibrahimy, M. Effective pneumonia detection using ResNet based transfer learning. Test Eng. Manag. 82, 15146–15153 (2020).
Madhavan, M. V. et al. Res-CovNet: an internet of medical health things driven COVID-19 framework using transfer learning. Neural Comput. Appl. 35(19), 13907–13920 (2023).
Kumar, N., Gupta, M., Gupta, D. & Tiwari, S. Novel deep transfer learning model for COVID-19 patient detection using X-ray chest images. J. Ambient Intell. Humaniz. Comput. 14(1), 469–478 (2023).
Pham, T. A. & Hoang, V. D. Chest X-ray image classification using transfer learning and hyperparameter customization for lung disease diagnosis. J. Inform. Telecommun. 5, 1–5 (2024).
Luján-García, C., Villuendas-Rey, Y. & Camacho-Nieto, O. A transfer learning method for pneumonia classification and visualization. Appl. Sci. 10(8), 2908 (2020).
Hashmi, M. F., Katiyar, S., Keskar, A. G., Bokde, N. D. & Geem, Z. W. Efficient pneumonia detection in chest Xray images using deep transfer learning. Diagnostics 10(6), 417 (2020).
Jain, R., Nagrath, P., Kataria, G., Kaushik, V. S. & Hemanth, D. J. Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement 165, 108046 (2020).
Elshennawy, N. M. & Ibrahim, D. M. Deep-pneumonia framework using deep learning models based on chest X-ray images. Diagnostics 10(9), 649 (2020).
Mohamed, C., Mwangi, R. W. & Kihoro, J. M. Enhancing Pneumonia Detection in Pediatric Chest X-Rays Using CGAN-Augmented Datasets and Lightweight Deep Transfer Learning Models. J. Data Anal. Inform. Process. 12(1), 1–23 (2024).
Koul, A., Bawa, R. K. & Kumar, Y. An analysis of deep transfer learning-based approaches for prediction and prognosis of multiple respiratory diseases using pulmonary images. Arch. Comput. Methods Eng. 31(2), 1023–1049 (2024).
Gu, C. & Lee, M. D. Transfer learning using real-world image features for medical image classification, with a case study on pneumonia X-ray images. Bioengineering 11(4), 406 (2024).
Yoon, T. & Kang, D. Enhancing pediatric pneumonia diagnosis through masked autoencoders. Sci. Rep. 14(1), 150 (2024).
Daniel, S. K., Michael, G., Wenjia, C. & Anthony, L. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2018).
Abou-Kreisha, M. T. et al. ElDahshan. Multisource smart computer-aided system for mining COVID-19 infection data. Healthcare 10, 109 (2022).
Fathy, K. A., Humam, K., Yaseen, Mohammad, T., Abou-Kreisha & Kamal, A. ElDahshan. A novel meta-heuristic optimization algorithm in white blood cells classification. Comput. Mater. Continua 75, 1527–1545 (2023).
Bharati, S., Robel, M. R. A. & Podder, M. A. and Niketa Gandhi. Comparative performance exploration and prediction of fibrosis, malign lymph, metastases, normal lymphogram using machine learning method. In: International Conference on Innovations in Bio-Inspired Computing and Applications, pp. 66–77. Cham: Springer International Publishing (2019).
Mahmud, T. & Shaikh Anowarul, F. Md Awsafur Rahman, and CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization. Comput. Biol. Med. 122, 103869 (2020).
Mahmud, T. Applications for the Internet of medical things. Int. J. Sci. Res. Arch. 10, 1247–1254 (2023).
Alam, F., Binte, Podder, Rubaiyat Hossain, M. & Mondal Prajoyand RVCNet: A hybrid deep neural network framework for the diagnosis of lung diseases. PLoS One 18, e0293125 (2023).
Podder, Prajoy, F. B., Alam, M. R. H. & Mondal Md Junayed Hasan, Ali Rohan, and Subrato Bharati. Rethinking densely connected convolutional networks for diagnosing infectious diseases. Computers 12, 95 (2023).
Podder, P., Khamparia, A. & Mondal, M. R. H. Mohammad Atikur Rahman, and Subrato Bharati. Forecasting the spread of COVID-19 and ICU Requirements. (2021).

Acknowledgements

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2501).

Funding

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2501).

Author information

Authors and Affiliations

Department of Computer Science, Ravenshaw University, Cuttack, 753003, India
Biswajit Tripathy & Sujit Bebortta
University Centre for Research and Development, Chandigarh University, Mohali, India
Shakir Khan
Privacy Machine Learning, PayPal, Chennai, India
Ashraf Kamal
School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, 751024, India
Subhranshu Sekhar Tripathy
College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
Mohd Fazil & Abdullah M. Albarrak

Contributions

Conceptualization, Biswajit Tripathy (B.T.), Shakir Khan (S.K.), Sujit Bebortta (S.B.), Ashraf Kamal (A.K.), and Subhranshu Sekhar Tripathy (S.S.T.); software, A.K. and Abdullah M. Albarrak (A.M.A.); validation, B.T., S.B., S.S.T., and Mohd Fazil (M.F.); data curation, S.K., S.B., and A.K.; writing (original draft preparation), B.T., A.K., and A.M.A.; writing (review and editing), S.K., S.B., S.S.T., and M.F.; visualization, B.T., S.B., S.S.T., A.K., and S.K.; supervision, B.T.; project administration, S.K.; funding acquisition, S.K. and M.F. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Ashraf Kamal or Mohd Fazil.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tripathy, B., Khan, S., Bebortta, S. et al. TL-PneuNet: a transfer learning-based pneumonia classification framework. Sci Rep 15, 40307 (2025). https://doi.org/10.1038/s41598-025-24180-8

Received: 03 August 2025
Accepted: 10 October 2025
Published: 17 November 2025
Version of record: 17 November 2025
DOI: https://doi.org/10.1038/s41598-025-24180-8
