An optimized EfficientNetB0 framework with CLAHE-based preprocessing for accurate multi-class chest X-ray classification

Hegazy, Nagwa Yaseen; Sawah, Mohamed S.

doi:10.1038/s41598-026-42492-1


Article
Open access
Published: 28 March 2026


Nagwa Yaseen Hegazy¹ &
Mohamed S. Sawah²,³

Scientific Reports volume 16, Article number: 10811 (2026)

Abstract

Chest radiography remains an essential diagnostic tool for thoracic diseases, yet interpreting overlapping anatomical structures is particularly challenging when multiple pathologies co-occur, a common clinical scenario that deep learning approaches often oversimplify. This study presents an optimized EfficientNetB0 framework designed explicitly for multi-label classification of chest X-rays using the NIH dataset, integrating CLAHE-based contrast enhancement, strategic class balancing, and a comparative transfer learning strategy that preserves the dataset’s inherent multi-label complexity. The proposed model achieved superior diagnostic performance with a macro-average AUC of 0.906 and recall of 0.824, outperforming DenseNet121 and MobileNetV2, and demonstrated strong per-class discrimination, especially for Pneumonia (AUC = 0.950) and Cardiomegaly (AUC = 0.946). These results confirm that the framework effectively balances learning capacity and generalization in a realistic multi-label clinical setting, offering a robust, interpretable solution suitable for computer-aided diagnosis where accurate detection of co-occurring thoracic pathologies is critical.


Introduction

Chest radiography (chest X-ray, or CXR) is an economical and user-friendly medical imaging technique. It is the most frequently employed diagnostic instrument in medical practice and plays a critical role in the diagnosis of lung disease.

Chest X-rays are employed by radiologists who have received adequate training to identify a variety of maladies, including pneumonia, tuberculosis, interstitial lung disease, and early lung cancer¹. The minimal cost and ease of operation of chest X-rays are among their greatest advantages. Modern digital radiography (DR) devices are exceedingly cost-effective, even in regions that are underdeveloped. As a result, chest radiographs are frequently employed to diagnose and detect lung diseases, including interstitial lung disease, tuberculosis, and pulmonary nodules. A significant quantity of information regarding a patient’s health is contained in chest radiography. Nevertheless, the physician consistently faces a significant obstacle in accurately interpreting the information. The interpretation is significantly complicated by the overlapping of the tissue structures in the thorax X-ray².

The first attempt to establish a computer-aided detection system was in the 1960s³, and studies have shown that detection accuracy for chest disease improves when an X-ray CAD system is used as an assistant. Many commercial products have been developed for clinical applications, including CAD4TB, Riverain, and Delft imaging systems⁴. However, because of the complexity of chest X-rays, the automatic detection of diseases remains unresolved, and most existing CAD systems are aimed at the early detection of lung cancer. A relatively small number of studies are devoted to the automatic detection of other types of pathologies⁵. The analysis and interpretation of medical imaging are being transformed by artificial intelligence (AI), which is revolutionizing radiology. Over the past decade, the AI industry in radiology has experienced exponential growth, with the U.S. Food and Drug Administration (FDA) approving over 100 companies and nearly 400 algorithms. This growth reflects the transformative potential of AI in addressing diagnostic challenges, such as identifying subtle abnormalities or characterizing irregular structures, where it frequently exceeds human capabilities⁶.

The recent surge in deep learning techniques has significantly increased the use of computer-aided diagnosis (CAD) in medical imaging. Among the numerous applications of artificial intelligence (AI) in diagnostic imaging, commercial AI solutions for chest radiographs built on deep learning (DL) algorithms have garnered attention and demonstrated exceptional performance in detecting malignant pulmonary nodules, tuberculosis, and other abnormalities in experimental datasets^7,8,9. In these studies the AI solution’s diagnostic accuracy is superior to that of clinicians; however, experimentally collected datasets may have enriched disease prevalence, so the results may not generalize across disease domains. Consequently, to verify an AI solution’s efficacy in real-world clinical practice, cross-sectional studies should be conducted in carefully selected cohorts^10,11.

AI algorithms have demonstrated high sensitivity and negative predictive value in diagnosing pulmonary embolism on CT pulmonary angiograms, complementing radiologists’ expertise¹². For pulmonary nodule detection and classification, deep learning techniques have exhibited excellent performance, with one study reporting 94% sensitivity and 83% specificity for malignant nodule detection on chest radiographs¹³. AI applications extend to various lung conditions, including interstitial lung disease and chronic obstructive pulmonary disease¹⁴. Despite these advancements, challenges remain in implementing AI systems in clinical practice. The integration of AI is expected to enhance radiologists’ diagnostic confidence and efficiency rather than replace them¹⁵. Continued research and development in this field aim to improve early lung cancer detection and reduce associated morbidity and mortality.

This study aims to develop an optimized deep learning framework that improves the accuracy of chest X-ray classification by integrating advanced preprocessing techniques and transfer learning with EfficientNetB0.

The primary objectives of this study are to: (i) systematically preprocess the publicly available chest X-ray dataset¹⁶ through grayscale conversion, CLAHE-based contrast enhancement, brightness adjustment, and strategic resizing (224 × 224), while addressing class imbalance via undersampling/oversampling; (ii) implement rigorous data filtering by selecting target pathology classes and omitting multi-label images to ensure dataset integrity; (iii) design and optimize a transfer learning pipeline using EfficientNetB0, comparing feature extraction versus fine-tuning approaches for classification efficacy; and (iv) comprehensively evaluate model performance through robust training/testing protocols and multi-metric validation to ensure clinical applicability.

The primary contributions of this work are:

(1)
We introduce a systematic, reproducible, and clinically-informed preprocessing pipeline specifically designed for chest X-ray images. This pipeline integrates grayscale preservation, optimized CLAHE-based contrast enhancement (clipLimit = 2.0, tileGridSize = 8 × 8), brightness adjustment, and standardized resizing. The parameters are selected based on radiological best practices to enhance subtle thoracic structures and standardize inputs for deep learning, while remaining adaptable to other medical imaging modalities.
(2)
We propose a rigorous data balancing strategy that preserves the NIH dataset’s inherent multi-label complexity, addressing severe class imbalance without discarding clinically relevant multi-label samples. Our per-class strategy employs strategic oversampling and undersampling, ensuring robust learning in a setting that reflects the real-world co-occurrence of thoracic pathologies.
(3)
We design and validate an enhanced EfficientNetB0 framework optimized for multi-label thoracic classification, integrating a custom Squeeze-Excitation attention block for improved feature recalibration and employing Focal Loss to handle class imbalance and label noise. The framework implements a two-phase transfer learning strategy (feature extraction followed by full fine-tuning) evaluated through comprehensive metrics, providing empirical guidance on optimal training protocols for multi-label medical image analysis.

The remainder of this paper is organized as follows. Section “Related work” critically examines existing deep learning approaches for chest X-ray classification, emphasizing gaps in integrated preprocessing pipelines and transfer learning strategies. Section “Proposed methodology” details the data preprocessing workflow, including grayscale conversion, CLAHE-based contrast enhancement, brightness adjustment, resizing (224 × 224), and class balancing (undersampling/oversampling), followed by the data filtering protocols for class selection and multi-label omission and the EfficientNetB0 implementation for feature extraction and fine-tuning. Section “Results and discussion” presents quantitative performance analysis across training, validation, and testing phases. Section “Conclusion” synthesizes the contributions of the optimized framework, discusses clinical implications, and proposes future research directions for real-world deployment. Finally, Section “Limitations and future directions” describes the limitations and future directions.

Related work

In this literature section, we review key research on the application of deep learning to chest X-ray analysis. We highlight various approaches, models, and outcomes, which are also briefly summarized in Table 1. This review helps to identify gaps and opportunities for further investigation in our study.

A deep learning framework tailored for the multi-class diagnosis of lung diseases from chest X-ray images, covering fibrosis, opacity, tuberculosis, viral pneumonia, COVID-19 pneumonia, and normal cases, is introduced in¹⁷. The framework leverages a custom convolutional neural network (CNN) architecture designed to extract discriminative features effectively. The study addresses challenges like dataset imbalance by employing data augmentation techniques. Extensive experiments demonstrate superior performance, achieving an accuracy of 98.88% and strong performance metrics such as an F1-score of 0.9887 and an AUC of 0.9939. This study highlights the potential of deep learning in enhancing diagnostic accuracy and efficiency in medical imaging, contributing significantly to automated lung disease detection and management.

The study in¹⁸ proposed a biphasic majority voting-based system for the automated diagnosis of COVID-19, normal, and pneumonia cases using chest X-ray images. Their method utilized six classifiers, selecting the five best-performing ones in two phases. Features were extracted with the Bag of Features method and classifiers like KNN, Linear Discriminant, Logistic Regression, and SVM. The approach achieved high accuracy rates of 99.86% (Phase-1) and 99.28% (Phase-2), with an overall accuracy of 99.63%. Notably, the system also demonstrated excellent performance across metrics like specificity, precision, recall, and F1-score. The study emphasized the importance of usability, integrating a graphical user interface (GUI) for accessibility by non-experts. Their results showed superior performance compared to similar models, highlighting the reliability of the biphasic majority voting technique.

A high-precision multiclass classification model is proposed in¹⁹, MobileLungNetV2, for diagnosing lung diseases using the ChestX-ray14 dataset. The model, fine-tuned from MobileNetV2, achieved 96.97% classification accuracy, outperforming other models like InceptionV3 and VGG19. Image pre-processing techniques such as CLAHE and Gaussian filtering were used to improve data quality. The model demonstrated high precision (96.71%), recall (96.83%), and specificity (99.78%), and utilized Grad-CAM for visualizing disease detection areas. The study highlights the effectiveness of MobileNetV2 in automated lung disease classification¹⁹.

The study in²⁰ proposed two deep learning approaches for classifying and localizing lung abnormalities, including COVID-19, on chest X-rays. The study utilized multi-classification and object detection models trained on a large chest X-ray dataset. By combining multiple object detection models, the approach outperformed single object models in both classification and localization tasks. The method achieved promising results and has the potential to assist radiologists in diagnosing chest X-ray abnormalities more accurately and efficiently, improving patient outcomes and reducing healthcare system burdens.

The CXR-LT challenge is presented in²¹, focusing on the long-tailed, multi-label disease classification problem in chest X-ray (CXR) imaging. Medical image recognition, particularly for chest radiography, is often long-tailed, with a few common findings overshadowed by many rare conditions. The challenge addressed both label imbalance and co-occurrence of multiple diseases in patients. They released a large-scale dataset of over 350,000 CXRs with 26 clinical findings, emphasizing the need for specialized techniques to tackle these challenges. The study also discussed top-performing solutions and proposed using vision-language foundation models for few and zero-shot disease classification, offering practical recommendations for future research in this area.

An optimized ensemble framework for multi-label classification on long-tailed chest X-ray data is introduced in²², addressing the challenge of diagnosing multiple diseases from chest X-ray images. The paper highlights the complexities of multi-label classification in medical imaging, where patients often present with multiple overlapping diseases. This challenge is compounded by the long-tailed distribution of diseases, where common conditions are overshadowed by rare ones, leading to biased predictions. The authors focus on the MIMIC-CXR-LT dataset, which is designed to address these issues in multi-label long-tailed classification. Their optimized ensemble approach, which involves experimentation with architecture design and data augmentation, improved classification performance on imbalanced medical images. The proposed framework ranked highly in the CXR-LT competition, demonstrating its effectiveness in tackling the long-tailed distribution and multi-label classification problems in medical imaging.

An Artificial Intelligence (AI)-based classification system is proposed in²³ to differentiate between COVID-19 and other infectious diseases using chest X-ray images. The study addresses the global health crisis caused by the COVID-19 pandemic, focusing on the need for faster diagnostic methods to supplement or replace RT-PCR tests. The authors utilized publicly available PA chest X-ray images of adult COVID-19 patients to train deep learning models for rapid screening. To enhance the dataset and improve model generalization, they performed 25 types of image augmentation. The models were trained using a transfer learning approach, and the combination of two best-performing models yielded the highest prediction accuracy for categories including normal, COVID-19, non-COVID-19 pneumonia, and tuberculosis. The results suggest that their AI-based method outperforms previous models in terms of efficiency and accuracy, offering promising advancements for biomedical imaging in COVID-19 diagnostics.

The study in²⁴ proposed a multichannel deep learning approach for detecting lung diseases from chest X-ray images. The model uses EfficientNetB0, EfficientNetB1, and EfficientNetB2 pretrained models to extract features, which are then fused together and passed through non-linear fully connected layers. The fused features are further processed using a stacked ensemble learning classifier, which combines random forest, support vector machine (SVM), and logistic regression for lung disease detection. The method was tested on several lung diseases, including pneumonia, tuberculosis (TB), and COVID-19, achieving impressive performance with 98% detection accuracy for pediatric pneumonia, 99% for TB, and 98% for COVID-19. The proposed method outperformed similar techniques, demonstrating robust performance on unseen data and offering potential as a reliable tool for point-of-care diagnosis by radiologists. The feature optimization was also visualized using t-SNE for further validation of the model’s efficiency.

The author in²⁵ proposed a deep learning method using transfer learning to classify lung diseases from chest X-ray (CXR) images. Their method employs an end-to-end learning approach where raw CXR images are directly inputted into the EfficientNet v2-M model to extract meaningful features for disease classification. The study was tested on two datasets: the U.S. National Institutes of Health (NIH) dataset and the Cheonan Soonchunhyang University Hospital (SCH) dataset. For the NIH dataset, which included normal, pneumonia, and pneumothorax classes, the method achieved a validation accuracy of 82.15%, with 81.40% sensitivity and 91.65% specificity. For the SCH dataset, which added tuberculosis as a fourth class, the method achieved a validation accuracy of 82.20% and 94.48% specificity. The study demonstrates the potential of transfer learning to enhance the efficiency and accuracy of computer-aided diagnostic systems (CADs) for lung disease classification.

A multi-classification deep learning model named CDC-Net is introduced in²⁶ for detecting COVID-19, lung cancer (LC), pneumothorax, tuberculosis (TB), and pneumonia from chest X-ray images. The model incorporates residual networks and dilated convolution techniques, aiming to improve early diagnosis of these diseases, especially considering their similar symptoms that can mislead clinical professionals. The study used publicly available benchmark datasets to train and test the model, which outperformed several pre-trained CNN models, including VGG-19, ResNet-50, and Inception v3. The CDC-Net achieved an impressive accuracy of 99.39%, recall of 98.13%, and precision of 99.42%, with an AUC of 0.9953 for multi-disease classification. In comparison, the pre-trained models achieved lower accuracies: 95.61%, 96.15%, and 95.16%, respectively. Statistical tests (McNemar’s and ANOVA) confirmed the robustness of CDC-Net, suggesting its high potential for reliable and accurate chest disease diagnosis.

The study in²⁷ proposed a deep learning architecture for multi-class lung disease classification using chest X-ray (CXR) images. This model aimed to classify several lung conditions, including COVID-19, pneumonia, lung cancer, tuberculosis (TB), and lung opacity, which often share similar symptoms, making accurate and early diagnosis challenging. The dataset consisted of 3615 COVID-19 images, 6012 lung opacity, 5870 pneumonia, 20,000 lung cancer, 1400 TB, and 10,192 normal images. The model employed the pre-trained VGG19 architecture, followed by three blocks of convolutional neural networks (CNN) for feature extraction and a fully connected network for classification. The proposed approach achieved remarkable performance with 96.48% accuracy, 93.75% recall, 97.56% precision, 95.62% F1 score, and an AUC of 99.82%, outperforming existing methods. This high performance could significantly assist healthcare practitioners by enabling faster and more accurate diagnoses.

Table 1 Comparison of lung disease classification models using chest X-ray images.


Proposed methodology

Overview

The proposed framework was implemented as shown in Fig. 1, following a structured pipeline comprising data preparation, preprocessing, model architecture design, training configuration, and comprehensive evaluation. The methodology preserves the inherent multi-label nature of chest X-ray classification to maintain clinical relevance.

Dataset description and preparation

We utilized the NIH ChestX-ray dataset released by the NIH Clinical Center, comprising 112,120 frontal-view chest radiographs from 30,805 unique patients. We used the official 2017 release (including the accompanying metadata file Data_Entry_2017.csv) with labels mined from the associated radiology reports using natural language processing (NLP), as originally described in²⁹. The dataset was downloaded from the official NIH repository¹⁶. The dataset is publicly available and de-identified. We used the data under the dataset’s stated data-use and attribution terms and did not attempt to re-identify any individuals. For clinical relevance, we focused on five target classes: No Finding and four thoracic pathologies (Pneumonia, Pneumothorax, Effusion, and Cardiomegaly). The dataset was structured for multi-label classification, where each image can have zero, one, or multiple positive labels simultaneously.

Class balancing strategy:

Medical datasets typically exhibit severe class imbalance. To address this, we implemented a per-class balancing approach with the following target sample counts based on clinical prevalence and learning requirements. For each class, we applied either oversampling with replacement (for underrepresented classes) or undersampling without replacement (for overrepresented classes). This approach maintained the dataset’s multi-label nature while ensuring balanced representation during training.
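
The per-class balancing described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the function name `balance_per_class`, the class order, and the target counts are hypothetical, since the actual per-class targets are not reproduced in this text. Each class pool is oversampled with replacement when it is short of its target and undersampled without replacement otherwise, and every selected row keeps its full label vector.

```python
import numpy as np

# Assumed class order; the source lists these five classes but not an index order.
CLASSES = ["No Finding", "Pneumonia", "Pneumothorax", "Effusion", "Cardiomegaly"]


def balance_per_class(labels, targets, seed=0):
    """Return row indices resampled so each class pool contributes targets[c] rows.

    labels  : (n_samples, n_classes) binary multi-label matrix
    targets : per-class target counts (hypothetical values, not the paper's)
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for c, target in enumerate(targets):
        positives = np.flatnonzero(labels[:, c] == 1)
        if positives.size == 0:
            continue  # no positive rows to sample for this class
        # Oversample with replacement when the class is short of its target,
        # otherwise undersample without replacement.
        replace = positives.size < target
        chosen.append(rng.choice(positives, size=target, replace=replace))
    return np.concatenate(chosen)
```

Because a multi-label row can be drawn from more than one class pool, the positive count of a class in the resampled set is at least its target rather than exactly equal to it.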

Preprocessing pipeline

Our preprocessing pipeline is designed to enhance radiographic features while maintaining clinical relevance:

1.
Grayscale Preservation: Images were maintained in grayscale format, consistent with diagnostic imaging standards.
2.
CLAHE Enhancement: We applied Contrast Limited Adaptive Histogram Equalization (CLAHE) with clinically optimized parameters (clipLimit = 2.0, tileGridSize = (8, 8)). This locally adaptive contrast enhancement highlights subtle radiodensity variations crucial for pathology detection while avoiding over-amplification of noise.
3.
Standardized Resizing: All images were resized to 224 × 224 pixels to ensure compatibility with the EfficientNetB0 architecture while preserving aspect ratio through proper interpolation.
4.
Brightness Adjustment: A uniform brightness adjustment was applied to improve visibility of anatomical structures.
5.
Normalization: Pixel values were scaled to the [0, 255] range for compatibility with EfficientNet’s preprocessing requirements.

Data Augmentation:

To prevent overfitting and improve generalization, we applied the following real-time augmentations during training:

Horizontal flipping (50% probability).
Random rotation within ± 7 degrees.
Random brightness (± 12%) and contrast (± 15%) adjustments.
Small-scale zoom variations (± 10%).
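
As a rough illustration of three of these augmentations, the NumPy sketch below applies a random horizontal flip plus brightness and contrast jitter. It is not the authors' augmentation code: the function name `augment` is hypothetical, the percentages are interpreted as multiplicative factors (an assumption), and rotation and zoom are left out because they need an image-warping library.

```python
import numpy as np


def augment(image, rng):
    """Flip, brightness and contrast jitter for an image with values in [0, 255]."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip with 50% probability
    # Brightness: scale all pixels by a factor in [0.88, 1.12] (assumed reading of +/-12%).
    out = out * (1.0 + rng.uniform(-0.12, 0.12))
    # Contrast: stretch or shrink deviations from the mean by up to +/-15%.
    mean = out.mean()
    out = (out - mean) * (1.0 + rng.uniform(-0.15, 0.15)) + mean
    return np.clip(out, 0.0, 255.0)
```

Passing a seeded `np.random.default_rng` makes the augmentation reproducible, which is convenient when debugging a data pipeline.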

The preprocessing pipeline is not dataset-specific and can be adapted to other medical imaging modalities with similar characteristics (grayscale images, class imbalance, multi-label annotations).

Model Architecture

The core of our framework is an enhanced EfficientNetB0 architecture optimized for multi-label chest X-ray classification:

Backbone:

EfficientNetB0 pretrained on ImageNet served as the feature extractor, providing a robust foundation for medical image analysis while maintaining computational efficiency.

Architectural Enhancements:

1.
Squeeze-Excitation Attention Block: We incorporated a custom Squeeze-Excitation (SE) block after the base model to enhance discriminative feature learning. This attention mechanism adaptively recalibrates channel-wise feature responses, improving sensitivity to subtle pathological features.
2.
Multi-Label Output Layer: The model employs five independent sigmoid activation units, each corresponding to one pathology class. This design enables true multi-label classification by allowing simultaneous prediction of multiple conditions with independent probability estimates.
3.
Regularization Components:
- Global Average Pooling to reduce parameter count while preserving spatial hierarchy.
- Batch Normalization layers to stabilize training and accelerate convergence.
- Dropout (0.4 rate) to prevent co-adaptation of neurons.
- L2 regularization to constrain model complexity.
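
To make the data flow of the attention block and the output layer concrete, here is a small NumPy forward pass. It is only a sketch with random weights: the reduction ratio of 16, the omission of bias terms in the SE block, and the 7 × 7 × 1280 feature-map shape (what EfficientNetB0 typically produces for a 224 × 224 input) are assumptions, and batch normalization, dropout, and L2 regularization are left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(features, w1, w2):
    """Channel recalibration for features of shape (batch, H, W, C)."""
    squeezed = features.mean(axis=(1, 2))        # squeeze: (batch, C)
    hidden = np.maximum(squeezed @ w1, 0.0)      # excitation, ReLU: (batch, C // r)
    gates = sigmoid(hidden @ w2)                 # per-channel gates in (0, 1)
    return features * gates[:, None, None, :]    # rescale every channel


def multilabel_head(features, w_out, b_out):
    """Global average pooling followed by independent sigmoid units."""
    pooled = features.mean(axis=(1, 2))          # (batch, C)
    return sigmoid(pooled @ w_out + b_out)       # (batch, n_classes)


rng = np.random.default_rng(0)
channels, ratio, n_classes = 1280, 16, 5         # ratio is an assumed value
feats = rng.normal(size=(2, 7, 7, channels))
w1 = rng.normal(scale=0.05, size=(channels, channels // ratio))
w2 = rng.normal(scale=0.05, size=(channels // ratio, channels))
w_out = rng.normal(scale=0.05, size=(channels, n_classes))
probs = multilabel_head(squeeze_excite(feats, w1, w2), w_out, np.zeros(n_classes))
```

Each row of `probs` holds five independent probabilities that need not sum to one, which is what allows several conditions to be predicted for the same image.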

Loss Function:

We employed Focal Loss (γ = 2.0, α = 0.25). This advanced loss function down-weights well-classified examples and focuses training on challenging, misclassified cases, which is particularly effective for handling class imbalance and label noise.
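
For reference, a per-label binary focal loss in its commonly used form, FL = −α_t (1 − p_t)^γ log(p_t), can be sketched in NumPy as below. The exact variant used in the study is not shown in the text, so the α weighting convention (α for positives, 1 − α for negatives) and the averaging over all labels are assumptions.

```python
import numpy as np


def binary_focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean focal loss over all samples and labels.

    y_true : binary targets, any shape
    y_prob : predicted sigmoid probabilities, same shape
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    # p_t is the probability assigned to the true label.
    p_t = np.where(y_true == 1, p, 1.0 - p)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the contribution of well-classified examples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check for any implementation.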

Training Strategy

Our training protocol employed a sophisticated two-phase transfer learning approach:

Phase 1 (Feature Extraction):

Epochs: 5 (or until convergence).
Learning Rate: 1e−4.
Configuration: EfficientNetB0 backbone frozen, only classification head trainable.
Objective: Learn dataset-specific feature representations while leveraging ImageNet knowledge.

Phase 2 (Fine-Tuning):

Epochs: Up to 20 (total 25 epochs maximum).
Learning Rate: 5e−5 (reduced for stable fine-tuning).
Configuration: Entire model unfrozen for end-to-end optimization.
Objective: Refine both feature extraction and classification for the specific thoracic pathology domain.

Training Optimization:

Optimizer: Adam with adaptive learning rate scheduling.
Batch Size: 16 (balanced between computational efficiency and gradient stability).
Early Stopping: Monitored validation loss with patience of 5–6 epochs.
Model Checkpointing: Saved best-performing weights based on validation metrics.
Learning Rate Reduction: Activated when validation loss plateaued (factor = 0.3, patience = 2–3).
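
How checkpointing, learning-rate reduction, and early stopping interact can be illustrated with a plain-Python replay of a validation-loss history. This is a simplified simulation, not the Keras callback implementation: the function name `simulate_schedule` is hypothetical, the patience values are picked from the stated ranges, and the loss values in the usage note are illustrative rather than measured.

```python
def simulate_schedule(val_losses, lr=5e-5, factor=0.3, lr_patience=2, stop_patience=5):
    """Replay validation losses; return (best_epoch, stop_epoch, [(epoch, lr), ...])."""
    best = float("inf")
    best_epoch = None
    since_best = 0        # epochs since the best loss (early-stopping counter)
    since_lr_change = 0   # non-improving epochs since the last LR reduction
    history = []
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch  # a checkpoint would be saved here
            since_best = 0
            since_lr_change = 0
        else:
            since_best += 1
            since_lr_change += 1
            if since_lr_change >= lr_patience:
                lr *= factor                # reduce the learning rate on a plateau
                since_lr_change = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break                           # early stopping
    return best_epoch, epoch, history
```

For a made-up history such as `[0.040, 0.033, 0.030, 0.031, 0.032, 0.033, 0.034, 0.035, 0.036]`, the best checkpoint is epoch 3, the learning rate is reduced at epochs 5 and 7, and training stops at epoch 8.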

Cross-Validation:

We employed rigorous threefold cross-validation to ensure robust performance estimation and minimize bias from data partitioning.
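
A threefold split can be generated with a few lines of NumPy, as in the sketch below. The text does not say whether folds were formed per image or per patient, nor whether they were stratified; this hypothetical `kfold_indices` helper simply shuffles image indices and is meant only to show the mechanics.

```python
import numpy as np


def kfold_indices(n_samples, k=3, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)      # k nearly equal, disjoint folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Each sample lands in exactly one validation fold, so the per-fold metrics can be averaged and their standard deviation reported.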

Evaluation framework

Comprehensive evaluation was conducted using multiple metrics²⁸ appropriate for multi-label classification:

AUC (Area Under ROC Curve): Primary measure of discriminative ability.
Precision, Recall, F1-Score: Balanced assessment of classification performance.
Confusion Matrices: Visualization of classification patterns and error types.
Standard Deviation: Measure of performance stability across folds.
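
The macro-averaged metrics listed above can be computed per class and then averaged, as in this NumPy sketch. The helper names are hypothetical, the 0.5 decision threshold is an assumption (the text does not state one), and the AUC is computed with the rank-sum formulation rather than any particular library routine.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) statistic, ties get average ranks."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=np.float64)
    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(len(y_score), dtype=np.float64)
    i = 0
    while i < len(sorted_scores):
        j = i
        while j + 1 < len(sorted_scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def precision_recall_f1(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_report(Y_true, Y_prob, threshold=0.5):
    """Return macro-averaged (precision, recall, F1, AUC) over the label columns."""
    rows = []
    for c in range(Y_true.shape[1]):
        y_pred = (Y_prob[:, c] >= threshold).astype(int)
        rows.append(precision_recall_f1(Y_true[:, c], y_pred) + (roc_auc(Y_true[:, c], Y_prob[:, c]),))
    return tuple(np.mean(rows, axis=0))
```

Running `macro_report` once per fold and taking the standard deviation of each metric across folds gives the stability measure mentioned in the list.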

The evaluation framework ensures clinically relevant assessment, with particular emphasis on sensitivity (recall) for critical conditions like pneumothorax and pneumonia, where false negatives have significant clinical consequences.

Implementation details

The complete pipeline was implemented in Python using TensorFlow/Keras, with all preprocessing, augmentation, training, and evaluation steps integrated into a reproducible workflow. The code is structured to facilitate adaptation to other medical imaging tasks and datasets, with comprehensive configuration options for key hyperparameters.

Results and discussion

The threefold cross-validation results for DenseNet121 are presented in Table 2. The model achieved a consistent average validation accuracy of 89% across all folds, with minimal variability as indicated by low standard deviations in performance metrics (e.g., F1-score standard deviations ranged from 0.0046 to 0.0175). Class-wise analysis revealed that Cardiomegaly achieved the highest average AUC (0.94), while Pneumothorax exhibited the lowest average F1-score (0.64). The stability across folds suggests DenseNet121 provides reliable but moderate performance for thoracic pathology classification.

Table 2 Results of DenseNet121.


Table 3 summarizes MobileNetV2’s performance through threefold cross-validation, yielding an average validation accuracy of 88%. The model demonstrated particularly strong performance for Pneumonia, achieving the highest average F1-score (0.83) and AUC (0.95) among all classes. However, it showed relative weakness in classifying Pneumothorax, with the lowest average F1-score (0.61) and precision (0.50). The moderate standard deviations across folds (e.g., F1-score SDs: 0.0055–0.0193) indicate reasonable consistency, though slightly higher variability compared to DenseNet121.

Table 3 Results of MobileNetV2.


Table 4 shows EfficientNetB0’s cross-validation results, with an average validation accuracy of 89%. The model achieved the highest average AUC values across multiple classes, particularly for Cardiomegaly (0.95) and Pneumonia (0.94). While Pneumothorax classification remained challenging (average F1-score: 0.63), EfficientNetB0 showed improved recall for this class (0.81) compared to other architectures. The model exhibited moderate variability across folds, with standard deviations for F1-scores ranging from 0.0041 to 0.0254.

Table 4 Results of EfficientNetB0.


Table 5 provides a comprehensive comparison of the three architectures’ performance metrics averaged across threefold cross-validation. DenseNet121 and EfficientNetB0 both achieved the highest average validation accuracy (89.00%), while MobileNetV2 showed slightly lower performance (88.00%). MobileNetV2 exhibited the highest macro-average precision (0.654) and F1-score (0.720), indicating strong classification consistency. However, EfficientNetB0 demonstrated superior sensitivity with the highest macro-average recall (0.824) and discriminative ability with the highest macro-average AUC (0.906), suggesting better identification of positive cases across all pathologies. DenseNet121 showed the most stable performance with the lowest average standard deviation in F1-scores (0.0115), indicating minimal variability across folds. This comparative analysis reveals that while MobileNetV2 offers strong precision-based performance, EfficientNetB0 provides a better balance between sensitivity (recall) and discriminative power (AUC), making it particularly suitable for clinical applications where false negatives carry significant consequences.

Table 5 Comparative Analysis of model performance across threefold cross-validation.


The comprehensive analysis reveals EfficientNetB0 as the optimal architecture for thoracic pathology classification based on three key factors: (1) It achieved the highest discriminative ability with a macro-average AUC of 0.906, indicating superior separation between pathological and normal cases; (2) It demonstrated the best sensitivity with a macro-average recall of 0.824, crucial for minimizing false negatives in critical conditions like pneumonia and pneumothorax; (3) While MobileNetV2 showed marginally better precision (0.654 vs. 0.640) and F1-score (0.720 vs. 0.716), EfficientNetB0’s superior AUC and recall metrics better align with clinical priorities in thoracic screening.

Among the three folds of EfficientNetB0, Fold 1 emerges as the most representative configuration for generating final evaluation figures, based on the following quantitative analysis. Fold 1 achieved the highest validation accuracy (89%) among all EfficientNetB0 folds, matching the model’s overall average performance. More importantly, Fold 1 showed exceptional performance for critical conditions: it attained the highest F1-score for Pneumonia (0.79) among all EfficientNetB0 folds and demonstrated strong performance for Cardiomegaly (F1: 0.75, AUC: 0.95). Although Fold 1 showed lower precision for Pneumothorax (0.48), it maintained a clinically favorable high recall (0.83) for this critical condition, minimizing the risk of missed diagnoses. The consistent performance across multiple metrics without significant outliers makes Fold 1 an ideal candidate for comprehensive visualization and analysis.

We recommend presenting Fold 1 of EfficientNetB0 as the primary configuration, with accompanying figures (Loss evolution, Accuracy evolution, Multi-label confusion matrix, ROC curves, and prediction examples) generated from this fold. This selection ensures that readers observe performance characteristics that are both representative of the model’s overall capability and optimized for critical clinical applications.

Figure 2 shows the loss evolution for EfficientNetB0 (Fold 1) over 20 epochs, highlighting both learning progress and emerging overfitting. Training loss (blue) declines consistently from approximately 0.04 at initialization to around 0.02 by epoch 20, indicating effective gradient-based optimization and feature assimilation. Validation loss (orange) initially decreases in tandem, reaching its minimum of approximately 0.03 at epoch 3. Beyond this point it plateaus and rises slightly, settling around 0.03–0.035 by epoch 20 while training loss continues to decrease. This subtle but definite divergence, from a near-zero gap at epoch 3 to a gap of approximately 0.01–0.015 by epoch 20, signals the onset of overfitting, where the model becomes increasingly specialized to the training data at the expense of generalization. The optimal stopping point for this fold is therefore epoch 3, where validation loss is minimized, balancing model learning against generalization before overfitting begins to degrade performance.
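The stopping logic described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' training code: the validation-loss values are hypothetical numbers shaped like the curve described for Figure 2, and the `patience` and `min_delta` parameters are illustrative choices that the text does not specify.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which patience-based early stopping halts.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss by more than `min_delta`. Both are assumed settings.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Hypothetical curve: minimum near 0.03 at epoch 3, then a slow upward drift.
val_loss = [0.040, 0.033, 0.030, 0.031, 0.032, 0.032, 0.033, 0.034]

print(best_epoch(val_loss))        # 3
print(early_stop_epoch(val_loss))  # 6
```

With a patience of three, training would run to epoch 6 before halting, and the weights from epoch 3 would then be restored as the checkpoint with the lowest validation loss.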

Figure 3 shows the accuracy evolution of the EfficientNetB0 model over 20 epochs, illustrating its learning progress and generalization behavior. Training accuracy (train_acc) demonstrates robust learning, rising from approximately 0.90 at initialization and approaching near-perfect convergence (~0.99) by epoch 20. Validation accuracy (val_acc) follows a distinct trajectory: it increases rapidly from ~0.70 to ~0.90 within the first 3–4 epochs, after which it plateaus with only minor oscillations, stabilizing around 0.90 for the remainder of training. This results in a progressively widening gap between training and validation accuracy, reaching roughly 0.09 by epoch 20, which corroborates the overfitting trend observed in the loss evolution. The early stabilization of validation accuracy, combined with the absence of late-stage decline, indicates that the model achieves its maximal generalizable performance early in training, further justifying the early stopping criterion identified at epoch 3.

Figure 4 shows the receiver operating characteristic (ROC) curves that demonstrate the model’s discriminative ability across the five thoracic pathology classes. All classes achieve AUC values above 0.85, confirming strong diagnostic performance. Pneumonia exhibits the highest AUC (0.950), followed closely by Cardiomegaly (0.946), indicating excellent separation of these critical conditions from other classes. Pneumothorax (AUC = 0.898) and Effusion (AUC = 0.882) show robust but slightly lower performance, likely reflecting the subtle and overlapping radiographic presentations of pleural pathologies. The No Finding class (AUC = 0.874), while still demonstrating good discriminative capacity, presents the greatest diagnostic challenge, consistent with the clinical difficulty of confidently classifying normal studies in the presence of potential subtle abnormalities. The overall high AUC values across all categories validate the model’s utility as a screening aid, particularly for high-stakes conditions such as pneumonia and cardiomegaly.
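As a reminder of what these AUC values measure, the sketch below computes AUC in its rank-based form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The toy labels and scores are hypothetical, and the pairwise loop is written for clarity rather than speed. The final lines simply average the five per-class AUCs quoted above.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with hypothetical labels and scores for one class.
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.8]
print(round(roc_auc(y_true, y_score), 3))  # 0.833

# Per-class AUCs as quoted in the text for Figure 4.
reported_auc = {
    "Pneumonia": 0.950,
    "Cardiomegaly": 0.946,
    "Pneumothorax": 0.898,
    "Effusion": 0.882,
    "No Finding": 0.874,
}
macro_auc = sum(reported_auc.values()) / len(reported_auc)
print(round(macro_auc, 3))  # 0.91
```

The unweighted mean of the five quoted values is 0.910, slightly above the 0.906 macro-average reported for EfficientNetB0. The latter is averaged across the three folds, so the two figures need not coincide.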

Figure 5 shows the per-class confusion matrices, which reveal distinct diagnostic patterns and challenges across the thoracic pathology classes. For the critical No Finding category, the model demonstrates strong specificity but shows notable false positives, with a portion of normal cases being misclassified as pathological. This cautious behavior may enhance sensitivity for disease detection but could increase unnecessary follow-up in clinical settings. Conversely, the Cardiomegaly confusion matrix indicates robust true positive identification but reveals specific misclassification patterns, particularly with conditions exhibiting overlapping cardiac silhouette characteristics. These matrices collectively highlight the model’s strengths in detecting clear pathologies while exposing areas for refinement in distinguishing subtle normal variants from early disease states and resolving inter-class confusion among overlapping thoracic abnormalities.
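In a multi-label setting there is no single square confusion matrix; each label is usually treated as its own binary problem, giving one 2 × 2 matrix per class. The following is a minimal sketch of that bookkeeping. The class subset, the multi-hot labels, and the already-binarized predictions are hypothetical, and the decision threshold that would produce such predictions is not taken from the article.

```python
def multilabel_confusion(y_true, y_pred, class_names):
    """Count TN, FP, FN and TP per class, one binary problem per label."""
    counts = {name: {"tn": 0, "fp": 0, "fn": 0, "tp": 0} for name in class_names}
    for true_row, pred_row in zip(y_true, y_pred):
        for name, t, p in zip(class_names, true_row, pred_row):
            # "t"/"f": prediction agrees or disagrees with the label;
            # "p"/"n": the prediction itself is positive or negative.
            key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
            counts[name][key] += 1
    return counts

# Hypothetical multi-hot labels and binarized predictions for four images.
classes = ["No Finding", "Cardiomegaly", "Effusion"]
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]

for name, cm in multilabel_confusion(y_true, y_pred, classes).items():
    print(name, cm)
```

Per-class precision, recall, and specificity follow directly from these four counts, which is how a class can combine high recall with a noticeable false-positive rate.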

Figure 6 shows examples of correct and challenging predictions with their corresponding saliency maps, providing interpretability into the model’s decision-making process. The saliency maps demonstrate that the model appropriately focuses on clinically relevant regions, highlighting pulmonary consolidation areas in correctly classified pneumonia cases, pleural margins in pneumothorax predictions, and cardiac silhouette contours in cardiomegaly assessments. However, the examples also reveal critical error patterns, particularly in cases of effusion-pneumothorax confusion, where the model attends to overlapping pleural regions, and in the misclassification of normal studies as pathological, where it may be responding to benign anatomical variants or imaging artifacts. These visual explanations not only validate the model’s alignment with radiological reasoning but also pinpoint specific diagnostic ambiguities that mirror real-world clinical challenges, offering transparent insights for both model refinement and potential clinical deployment.

Conclusion

This study established an optimized multi-label EfficientNetB0 framework for thoracic pathology classification that preserves the clinical reality of co-occurring conditions in chest X-ray analysis. The proposed approach integrates a systematic preprocessing pipeline (CLAHE enhancement with radiological parameter optimization), a per-class balancing strategy that retains multi-label samples, and an enhanced architecture incorporating a Squeeze-Excitation attention block and Focal Loss to address class imbalance and label noise. Through rigorous three-fold cross-validation, the framework demonstrated robust diagnostic performance, achieving a macro-average AUC of 0.906 and superior recall (0.824), with particularly strong discrimination for Pneumonia (AUC = 0.950) and Cardiomegaly (AUC = 0.946). The two-phase transfer-learning strategy (feature extraction followed by fine-tuning) proved effective in adapting EfficientNetB0 to the thoracic imaging domain while maintaining generalization. The results confirm that the framework successfully balances learning capacity and clinical applicability in a realistic multi-label setting, a critical advancement over conventional single-label simplifications. While the pipeline has been validated on the NIH dataset, its modular design (preprocessing, balancing, and architecture components) is readily adaptable to other chest X-ray collections and imaging modalities.
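For readers unfamiliar with Focal Loss, the sketch below shows its usual binary form applied independently to each label, FL = -α_t (1 - p_t)^γ log(p_t). It is an illustration only: the values γ = 2 and α = 0.25 are the defaults from the original focal loss paper and are assumptions here, since this section does not state the settings the authors used, and the function names are hypothetical.

```python
import math

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one label.

    gamma down-weights easy examples; alpha balances positives and negatives.
    Both defaults are assumed values, not settings taken from the article.
    """
    p = min(max(p_pred, eps), 1.0 - eps)         # keep log() finite
    p_t = p if y_true == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def multilabel_focal_loss(y_row, p_row, **kwargs):
    """Mean focal loss over the labels of one image (independent sigmoids)."""
    losses = [binary_focal_loss(y, p, **kwargs) for y, p in zip(y_row, p_row)]
    return sum(losses) / len(losses)

# A confident correct prediction contributes far less than a confident miss.
print(binary_focal_loss(1, 0.9))
print(binary_focal_loss(1, 0.1))
```

Setting γ = 0 and α = 0.5 reduces the expression to half the ordinary binary cross-entropy, which makes the role of the modulating factor (1 - p_t)^γ easy to check.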

Limitations and future directions

While this study demonstrates an effective multi-label classification framework on the NIH Chest X-ray dataset, several limitations should be acknowledged. First, although we preserved the multi-label nature of the dataset, allowing the model to learn from co-occurring pathologies, the NIH labels are derived from automated NLP extraction of radiology reports and may contain residual label noise. Rather than discarding potentially valuable multi-label samples, we addressed noise indirectly through robust training techniques (focal loss, augmentation, regularization); however, dedicated noisy-label learning strategies such as co-teaching, label correction, or curriculum learning could further improve robustness and will be explored in future work. Second, validation on a single publicly available dataset limits the assessment of generalizability across different patient populations, imaging protocols, and institutional settings. Third, while the model shows strong performance on the five selected pathologies, extending it to a larger set of thoracic findings (including rare or subtle conditions) would increase clinical utility. Future work will therefore focus on: (1) implementing advanced noisy-label learning techniques to handle label uncertainty more explicitly; (2) multi-dataset and multi-center validation using external CXR collections (e.g., CheXpert, MIMIC-CXR); (3) expanding the classification to include additional pathologies and integrating clinical metadata to enhance context-awareness; and (4) prospective clinical testing to evaluate real-world diagnostic impact and workflow integration.

Data availability

The NIH ChestX-ray dataset used in this study is publicly available and de-identified. We accessed the dataset from the official NIH Clinical Center repository (https://nihcc.app.box.com/v/ChestXray-NIHCC; accessed 2026 Feb 24) and used it under its stated data-use and attribution terms, including citation of the original dataset publication²⁹. For transparency, we note that a Kaggle mirror exists (https://www.kaggle.com/datasets/nih-chest-xrays/data). The code used for data preprocessing, model training, and evaluation is publicly available via Zenodo at https://doi.org/10.5281/zenodo.18762869.

References

RadiologyInfo.org. X-ray (radiography): chest. https://www.radiologyinfo.org/en/info.cfm?pg=chestrad. Accessed 2025 Jun 9.
Qin, C., Yao, D., Shi, Y. & Song, Z. Computer-aided detection in chest radiography based on artificial intelligence: A survey. Biomed. Eng. Online 17, 1–23 (2018).
Lodwick, G. S., Keats, T. E. & Dorst, J. P. The coding of roentgen images for computer analysis as applied to lung cancer. Radiology 81, 185–200 (1963).
Zakirov, A. N., Kuleev, R. F., Timoshenko, A. S. & Vladimirov, A. V. Advanced approaches to computer-aided detection of thoracic diseases on chest X-rays. Appl. Math. Sci. 9(88), 4361–9 (2015). https://doi.org/10.12988/ams.2015.54348
van Ginneken, B., Hogeweg, L. & Prokop, M. Computer-aided diagnosis in chest radiography: Beyond nodules. Eur. J. Radiol. 72 (2), 226–230 (2009).
Zalewa, K. et al. Application of artificial intelligence in radiological image analysis for pulmonary disease diagnosis: A review of current methods and challenges. J. Educ. Health Sport 77, 56893 (2025).
Nam, J. G. et al. Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology 290, 218–228 (2019).
Hwang, E. J. et al. Development and validation of a deep learning-based automatic detection algorithm for active pulmonary tuberculosis on chest radiographs. Clin. Infect. Dis. 69, 739–747 (2019).
Hwang, E. J. et al. Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw. Open. 2, e191095 (2019).
Park, S. H. & Han, K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286, 800–809 (2018).
Park, S. H. Diagnostic case-control versus diagnostic cohort studies for clinical validation of artificial intelligence algorithm performance. Radiology 290, 272–273 (2019).
Cheikh, A. B. et al. How artificial intelligence improves radiological interpretation in suspected pulmonary embolism. Eur. Radiol. 32(9), 5831–5842 (2022).
Yoo, H., Kim, K. H., Singh, R., Digumarthy, S. R. & Kalra, M. K. Validation of a deep learning algorithm for the detection of malignant pulmonary nodules in chest radiographs. JAMA Netw. Open. 3(9), e2017135 (2020).
Ather, S., Kadir, T. & Gleeson, F. Artificial intelligence and radiomics in pulmonary nodule management: Current status and future applications. Clin. Radiol. 75(1), 13–19 (2020).
Tandon, Y. K., Bartholmai, B. J. & Koo, C. W. Putting artificial intelligence (AI) on the spot: Machine learning evaluation of pulmonary nodules. J. Thorac. Dis. 12(11), 6954 (2020).
NIH Clinical Center. ChestXray-NIHCC dataset. https://nihcc.app.box.com/v/ChestXray-NIHCC. Accessed 2026 Feb 24.
Sanida, M. V., Sanida, T., Sideris, A. & Dasygenis, M. An advanced deep learning framework for multi-class diagnosis from chest X-ray images. J. 7(1), 48–71 (2024).
Sunnetci, K. M. & Alkan, A. Biphasic majority voting-based comparative COVID-19 diagnosis using chest X-ray images. Expert Syst. Appl. 216, 119430. https://doi.org/10.1016/j.eswa.2022.119430 (2023).
Shamrat, F. M. J. M. et al. High-precision multiclass classification of lung disease through customized MobileNetV2 from chest X-ray images. Comput. Biol. Med. 155, 106646. https://doi.org/10.1016/j.compbiomed.2023.106646 (2023).
Elhanashi, A., Saponara, S. & Zheng, Q. Classification and localization of multi-type abnormalities on chest X-ray images. IEEE Access. 11, 112345–112357. https://doi.org/10.1109/ACCESS.2023.3302180 (2023).
Holste, G. et al. Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge. Med. Image Anal. 97, 103224. https://doi.org/10.1016/j.media.2024.103224 (2024).
Jeong, J., Jeoun, B., Park, Y. & Han, B. An optimized ensemble framework for multi-label classification on long-tailed chest X-ray data. KT Res. Dev. Cent. KT Corp (2023).
Sharma, A., Rani, S. & Gupta, D. Artificial intelligence-based classification of chest X-ray images into COVID-19 and other infectious diseases. Int. J. Biomed. Imaging 2020(1), 8889023. https://doi.org/10.1155/2020/8889023 (2020).
Ravi, V., Acharya, V. & Alazab, M. A multichannel EfficientNet deep learning-based stacking ensemble approach for lung disease detection using chest X-ray images. Cluster Comput. 26, 1181–1203. https://doi.org/10.1007/s10586-022-03664-6 (2023).
Kim, S. et al. Deep learning in multi-class lung diseases’ classification on chest X-ray images. Diagnostics (Basel) 12(4), 915. https://doi.org/10.3390/diagnostics12040915 (2022).
Malik, H. et al. CDC_Net: Multi-classification convolutional neural network model for detection of COVID-19, pneumothorax, pneumonia, lung cancer, and tuberculosis using chest X-rays. Multimed. Tools Appl. 82, 13855–13880. https://doi.org/10.1007/s11042-022-13843-7 (2023).
Alshmrani, G. M. M., Ni, Q., Jiang, R., Pervaiz, H. & Elshennawy, N. M. A deep learning architecture for multi-class lung diseases classification using chest X-ray (CXR) images. Alexand. Eng. J. 64, 923–935. https://doi.org/10.1016/j.aej.2022.10.053 (2023).
Tawfik, M., Fathi, I. S., Nimbhore, S. S., Alsmadi, I. M. & Sawah, M. S. E-RespiNet: An LLM-ELECTRA driven triple-stream CNN with feature fusion for asthma classification. PLoS One 20(11), e0334528. https://doi.org/10.1371/journal.pone.0334528 (2025).
Wang, X. et al. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). 2097–2106. https://doi.org/10.1109/CVPR.2017.369 (2017).


Acknowledgements

Not applicable.

Funding

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Authors and Affiliations

Information Systems Department, Faculty of Information Systems and Computer Science, October 6 University, Giza, 12585, Egypt
Nagwa Yaseen Hegazy
Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Ajloun National University, P.O.43, Ajloun, 26810, Jordan
Mohamed S. Sawah
Department of Information Systems, Al-Alson Higher Institute, Cairo, Egypt
Mohamed S. Sawah

Authors

Nagwa Yaseen Hegazy
Mohamed S. Sawah

Contributions

(1) We introduce a systematic, reproducible, and clinically-informed preprocessing pipeline specifically designed for chest X-ray images. This pipeline integrates grayscale preservation, optimized CLAHE-based contrast enhancement (clipLimit=2.0, tileGridSize=8 × 8), brightness adjustment, and standardized resizing. The parameters are selected based on radiological best practices to enhance subtle thoracic structures and standardize inputs for deep learning, while remaining adaptable to other medical imaging modalities.

(2) We propose a rigorous data balancing strategy that preserves the NIH dataset’s inherent multi-label complexity, addressing severe class imbalance without discarding clinically relevant multi-label samples. Our per-class strategy employs strategic oversampling and undersampling, ensuring robust learning in a setting that reflects the real-world co-occurrence of thoracic pathologies.

(3) We design and validate an enhanced EfficientNetB0 framework optimized for multi-label thoracic classification, integrating a custom Squeeze-Excitation attention block for improved feature recalibration and employing Focal Loss to handle class imbalance and label noise. The framework implements a two-phase transfer learning strategy (feature extraction followed by full fine-tuning) evaluated through comprehensive metrics, providing empirical guidance on optimal training protocols for multi-label medical image analysis.
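The role of the clip limit in CLAHE can be illustrated on a single tile. The sketch below is a simplified, pure-Python version of the clip-and-redistribute step followed by histogram equalization; it is not the authors' pipeline. Full CLAHE additionally divides the image into a grid of tiles (8 × 8 in the configuration above) and interpolates between neighboring tile mappings, which is omitted here. The function name and the toy tile are hypothetical.

```python
def clip_limited_equalize(tile, clip_limit=2.0, n_bins=256):
    """Equalize one tile of integer pixels using a clipped histogram."""
    pixels = [v for row in tile for v in row]
    n_pixels = len(pixels)
    hist = [0] * n_bins
    for v in pixels:
        hist[v] += 1

    # Bin ceiling: clip_limit times the average bin height, at least 1.
    ceiling = max(int(clip_limit * n_pixels / n_bins), 1)
    excess = 0
    for i in range(n_bins):
        if hist[i] > ceiling:
            excess += hist[i] - ceiling
            hist[i] = ceiling

    # Spread clipped counts evenly; the remainder goes to evenly spaced bins.
    share, leftover = divmod(excess, n_bins)
    hist = [h + share for h in hist]
    if leftover:
        step = max(n_bins // leftover, 1)
        for i in range(0, n_bins, step):
            if leftover == 0:
                break
            hist[i] += 1
            leftover -= 1

    # Cumulative histogram scaled to the output range gives the lookup table.
    scale = (n_bins - 1) / n_pixels
    lut, running = [], 0
    for h in hist:
        running += h
        lut.append(min(int(running * scale + 0.5), n_bins - 1))
    return [[lut[v] for v in row] for row in tile]

# Hypothetical low-contrast 8 x 8 tile with intensities between 100 and 107.
tile = [[100 + (r + c) % 8 for c in range(8)] for r in range(8)]
out = clip_limited_equalize(tile)
flat = [v for row in out for v in row]
print(min(flat), max(flat))
```

Clipping caps how steep the cumulative mapping can become, so contrast is stretched without amplifying noise in nearly uniform regions. In practice a library routine would be used instead of hand-written code; the parameter names in the text match OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))`, although whether the authors used OpenCV is an inference from those names.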

Corresponding author

Correspondence to Nagwa Yaseen Hegazy.

Ethics declarations

Competing interests

The authors declare no competing interests.

Consent for publication

Not applicable.

Ethics approval

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Hegazy, N.Y., Sawah, M.S. An optimized EfficientNetB0 framework with CLAHE-based preprocessing for accurate multi-class chest X-ray classification. Sci Rep 16, 10811 (2026). https://doi.org/10.1038/s41598-026-42492-1


Received: 20 September 2025
Accepted: 26 February 2026
Published: 28 March 2026
Version of record: 31 March 2026
DOI: https://doi.org/10.1038/s41598-026-42492-1
