Introduction

Premature delivery poses a significant global health concern. Infants weighing less than 1500 g at birth constitute over 3.2% of births in China, with this proportion showing recent increases1. Lung dysplasia disorders, such as bronchopulmonary dysplasia and neonatal respiratory distress syndrome (NRDS), are prevalent among premature infants. NRDS, a leading critical condition among newborns globally, presents a substantial threat to infant survival. It is caused by a deficiency in pulmonary surfactant (PS), resulting in the rapid onset of dyspnea, cyanosis, and respiratory failure within hours of birth2. It is also known as hyaline membrane disease (HMD) due to the presence of hyaline membranes in the lungs3,4. Compared to Europe and America, China exhibits a lower NRDS incidence, approximately 7%. Bedside X-ray radiography has traditionally been the preferred method for diagnosing NRDS, given the high radiation dose associated with computed tomography (CT) and the impracticality of performing CT at the bedside.

Radiomics is a rapidly developing field in clinical research. In recent years, it has increasingly been used for diagnosis and prognosis based on medical imaging, enabling the extraction and analysis of quantitative features from medical images5,6. Many radiomics studies have helped radiologists uncover information hidden within images, which can assist in clinical diagnosis, treatment, and prediction of prognosis7,8,9,10.

Some researchers have proposed using radiomic features combined with contrastive learning to detect pneumonia in CXR images, demonstrating good performance and interpretability on the RSNA pneumonia detection challenge dataset7. By analyzing patients’ CXR images, some scholars have found that radiomic features have high accuracy in identifying COVID-19, which has positive implications for supporting clinical decision-making for physicians8,11,12. Researchers have developed a scoring system using radiomics, which analyzes and calculates the radiomic score (RadScore) of patients’ CXR images, the clinical score (TBScore), and the radiological score (RLscore). By examining the correlation between the changes in these three scores, the system aims to enable quantitative monitoring of the treatment response to tuberculosis (TB)9.

However, owing to the two-dimensional (2D) nature of X-rays, chest radiographs often exhibit overlapping images of ribs, soft tissues (such as bronchi, pulmonary arteries, pulmonary veins, etc.), and artifacts. These factors may impede the radiologist’s accuracy in lesion observation and disease diagnosis10,13,14. Deep learning techniques have demonstrated promising advancements in medical image processing. Convolutional neural network (CNN) models, such as MPR-CNN, DudeNet, and CCNN, have been employed in medical image preprocessing to improve imaging quality15,16,17,18.

Based on this background, the objective of this study is to develop and validate machine learning models based on radiomic features to distinguish between CXR images of NRDS patients and non-NRDS patients. We removed the rib images from the CXR images using a deep learning approach and used them as input for the machine learning model, comparing the performance of the model before and after rib suppression.

Methods

This study was reviewed and approved by the Institutional Review Board of Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University (No.2024-001). The methods conducted in this study adhered to relevant guidelines and regulations. Informed consent was obtained from all subjects and/or their legal guardians after the study’s purpose, procedures, risks, and benefits were thoroughly explained.

Study design

Our study used six machine learning algorithms to develop radiomics models based on radiomic features extracted from neonatal chest X-rays both before and after rib suppression. We evaluated and validated each model. The study workflow is illustrated in Fig. 1.

Fig. 1

Workflow of necessary steps in the current study. LASSO: Least Absolute Shrinkage and Selection Operator, ROC: Receiver Operating Characteristic.

Patients

A total of 138 inpatients (including 101 neonates diagnosed with NRDS) with CXR imaging data obtained within hours after birth at Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University between January 2022 and March 2023 were included in this study. The patients were divided into a training set (n = 96) and a validation set (n = 42) based on admission time. The details of the data are shown in Table 1.

Participants meeting the following criteria were identified from patients’ medical records: (a) gestational age ≤ 37 weeks; (b) manifestation of progressive irregular respiration, dyspnea, cyanosis, and respiratory failure within a few hours of birth; (c) evidence of reduced lung transparency on X-ray, characterized by uniformly distributed fine granular shadows, with severe cases presenting a “white lung” phenomenon. All neonates meeting these criteria underwent CXR imaging within hours of birth. Participants were excluded if they had: (a) severe congenital malformations; (b) genetic metabolic diseases or chromosomal disorders; (c) liver, kidney, or thyroid diseases in either the preterm infant or their mother; (d) discontinued treatment or incomplete case data. Refer to Fig. 2 for the flowchart depicting the study population selection process.

Table 1 Training and validation data description.
Fig. 2

The flowchart of the selection process of the study population. NICU: neonatal intensive care unit, NRDS: neonatal respiratory distress syndrome.

Chest x-ray imaging acquisition

Bedside X-ray radiography is a standard procedure in the NICU, particularly when NRDS is suspected. The data used in this study were acquired with the uDR 370i digital radiography system (United Imaging Healthcare Co., Ltd, Shanghai, China). Imaging was conducted in the supine anteroposterior position, with a fixed source-to-image receiver distance (SID) of 100 cm. Neonatal exposure parameters included a tube voltage of 65 kV and an exposure of 2.8 mAs, and a lead apron was used for radiation protection. Following parameter selection, the grid size was adjusted and the exposure was initiated. After exposure, each image was checked for correctness and standardization; if it did not meet these requirements, the exposure was repeated.

Bone suppression

The release of the image database with and without lung cancer nodules by the Japanese Society of Radiological Technology (JSRT) opened the way for many research groups around the world19. The Chest Diagnostic System Research Group (Budapest, Hungary) provided a bone shadow eliminated (BSE) version of the JSRT database (BSE-JSRT)20. The BSE-JSRT database contains 247 images from the JSRT dataset in which the shadows of ribs and clavicles were removed from the CXR images by dedicated algorithms. We used the CXR images and the corresponding bone-suppressed images (BSIs) from this dataset, along with partially enhanced imaging data, to train and validate the rib suppression model. We then applied the model trained on these public data to the neonatal chest X-rays collected in our study. Because our self-built dataset did not include BSIs, the rib suppression effect on these images was assessed by two senior radiologists.

Initially, we used publicly available code to construct the U-Net for our research21. This architecture, known for its “encoder-decoder” design, facilitates the extraction of contextual information from input data, enhancing output accuracy. We then modified the model’s structure: the revised model incorporates a self-attention mechanism22 to learn spatial and feature information, diverging from the traditional U-Net framework reliant on cross-entropy loss. Detailed descriptions of the model, including its architecture and parameters, are provided in the supplementary materials.

The dataset was partitioned into a training set and a validation set at a 7:3 ratio. Cross-entropy loss, structural similarity index measure (SSIM), and mean absolute error (MAE) were used to evaluate the rib suppression model and to quantify differences between predicted and ground-truth images. The experimental images were preprocessed by resizing them to 512 × 512 pixels. The CNN model described above was implemented in Python 3.6 using the TensorFlow 2.3 framework. All deep learning computations were executed on a system featuring an NVIDIA Quadro P2000 (with 5 GB of graphics memory) and an Intel(R) Xeon(R) CPU E5-1620 v4.
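To make the two image-quality metrics concrete, the following is a minimal NumPy sketch of MAE and a simplified SSIM. It is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the images are synthetic, pixel values are assumed to lie in [0, 1], and the SSIM shown is computed over a single global window, whereas standard SSIM averages the same statistic over local windows.

```python
import numpy as np


def mean_absolute_error(pred, target):
    """Mean absolute pixel difference between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred, target, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of
    the usual locally windowed SSIM)."""
    x = np.asarray(pred, dtype=np.float64)
    y = np.asarray(target, dtype=np.float64)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Synthetic stand-ins for a ground-truth BSI and a model prediction (512 x 512).
rng = np.random.default_rng(0)
target = rng.random((512, 512))
pred = np.clip(target + rng.normal(0.0, 0.05, target.shape), 0.0, 1.0)

print("MAE :", mean_absolute_error(pred, target))
print("SSIM:", global_ssim(pred, target))
```

A prediction identical to the ground truth gives MAE = 0 and SSIM = 1; lower MAE and higher SSIM indicate closer agreement.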

Segmentation methods

Following radiological grading and diagnostic standards for NRDS, the region of interest (ROI) in this study was delineated as the area encompassing the thoracic cavity, including lung parenchyma, cardiac structures, and diaphragmatic borders. Manual delineation was employed to segment the ROIs.

Images underwent resampling and normalization to mitigate variations stemming from equipment and scanning disparities. In the ROI delineation process, two junior radiologists utilized the RadCloud platform (Version 7.2; Huiying Medical Technology Co., Ltd., Beijing) to outline preterm neonatal CXR images23. Adhering to radiological diagnostic criteria for NRDS, they defined lung fields with smooth closed curves, encompassing cardiac and diaphragmatic boundaries. Subsequent to initial delineation, a senior radiologist reviewed the completed ROI, requesting revisions for non-compliant tracings and implementing necessary adjustments. Thirty days later, 30 images were randomly selected for ROI redefinition by the same two radiologists. Senior radiologists validated all ROI segmentations and resolved any discrepancies through consensus.

Radiomics features extraction and selection

This study employed the RadCloud platform for feature extraction and screening. PyRadiomics (version 3.1.0, https://pyradiomics.readthedocs.io/), based on Python (version 3.7.0, https://www.python.org), was utilized to extract radiomics features from medical images24. PyRadiomics standardized CXR images, involving resampling and gray-level discretization with a bin width of 25, while conforming to the Image Biomarker Standardization Initiative (IBSI, https://theibsi.github.io/)25. In our study, we extracted a total of 1316 image features from the ROI of each patient. The intraclass correlation coefficient (ICC) was used to evaluate intra- and inter-observer reproducibility. Radiomics features with good intra- and inter-observer reproducibility (ICC > 0.75) were selected for comparison between the two groups.
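The reproducibility filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, following Shrout and Fleiss); the function names and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) array of measurements."""
    y = np.asarray(ratings, dtype=np.float64)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def reproducible_features(features_a, features_b, threshold=0.75):
    """Indices of features whose ICC between two delineations exceeds
    the threshold. Inputs are (n_images, n_features) arrays."""
    a = np.asarray(features_a, dtype=np.float64)
    b = np.asarray(features_b, dtype=np.float64)
    keep = []
    for j in range(a.shape[1]):
        if icc_2_1(np.column_stack([a[:, j], b[:, j]])) > threshold:
            keep.append(j)
    return keep


# Synthetic example: 30 re-delineated images, 3 features.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
first = base + rng.normal(0.0, 0.05, base.shape)
second = base + rng.normal(0.0, 0.05, base.shape)
second[:, 2] = rng.normal(size=30)  # feature 2 is not reproducible

kept = reproducible_features(first, second)
print(kept)
```

In this synthetic example the first two features agree closely between delineations and are retained, while the third, which is unrelated between delineations, is discarded.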

To enhance model reliability and reduce feature dimensionality, we first applied the variance threshold method, excluding features with a variance below 0.8. Next, SelectKBest univariate analysis was used to retain radiomics features with P < 0.05. LASSO regression was then applied to the filtered features; its L1 penalty shrinks the regression coefficients and sets those of uninformative features to zero. The regularization strength was chosen according to the 5-fold cross-validation error, with a maximum of 1000 iterations. Finally, the most effective radiomics features were chosen for modeling26,27,28.
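The three-stage selection described above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the RadCloud implementation: `select_features` is a hypothetical helper, the univariate test is assumed to be the ANOVA F-test (`f_classif`), and the P < 0.05 cut-off is applied to the p-values that `SelectKBest` exposes after fitting.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LassoCV


def select_features(X, y, var_threshold=0.8, p_level=0.05):
    """Variance filter -> univariate p-value filter -> LASSO.
    Returns the column indices of X that survive all three stages."""
    idx = np.arange(X.shape[1])

    # Stage 1: drop features whose variance is below the threshold.
    vt = VarianceThreshold(threshold=var_threshold).fit(X)
    idx = idx[vt.get_support()]

    # Stage 2: keep features with a univariate p-value below p_level
    # (assumption: ANOVA F-test as the scoring function).
    skb = SelectKBest(score_func=f_classif, k="all").fit(X[:, idx], y)
    idx = idx[skb.pvalues_ < p_level]

    # Stage 3: LASSO with the penalty strength chosen by 5-fold CV;
    # features with non-zero coefficients are retained.
    lasso = LassoCV(cv=5, max_iter=1000, random_state=0).fit(X[:, idx], y)
    return idx[lasso.coef_ != 0]


# Synthetic data: 96 samples, 40 features; features 0-9 have low variance,
# and the label depends only on features 10 and 11.
rng = np.random.default_rng(0)
scales = np.where(np.arange(40) < 10, 0.5, 2.0)
X = rng.normal(size=(96, 40)) * scales
y = (X[:, 10] + X[:, 11] > 0).astype(int)

selected = select_features(X, y)
print(selected)
```

In this synthetic example the low-variance columns are removed in the first stage, and the two informative columns survive all three stages.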

Development and validation of models

In this study, we utilized six machine learning models to develop and validate predictive models: Gradient Boosting Machine (GBM), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Multilayer Perceptron (MLP) Classifier, Random Forest (RF), and Extreme Gradient Boosting (XGBoost). To achieve discrimination of NRDS in neonatal CXR, these six models were trained on radiomics features extracted from images both before and after rib suppression. A range of validation techniques were employed to evaluate the performance of these models and identify the most effective one. Specifically, we compared the performance of each machine learning model across three groups: A (After rib suppression), B (Before rib suppression), and C (Combined A and B). ML analysis was performed using the Python software Scikit-learn (version 1.0.2, https://scikit-learn.org). The discriminative performance of the developed radiomics models was assessed using a comprehensive set of evaluation metrics, including receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, precision, and F1-Score29. These metrics collectively provided a thorough evaluation of the models’ ability to accurately distinguish NRDS cases from controls.
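For reference, the evaluation metrics named above can be computed from labels and predicted scores as in the following dependency-free sketch. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for illustration; the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case).

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, precision and F1 at a given threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = NRDS, 0 = non-NRDS (illustrative values only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Sensitivity, specificity, precision, and F1-Score depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.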

Results

Bone suppression

The results of rib suppression on adult CXR images by the model are depicted in Fig. 3. Throughout the training and validation process, we evaluated the network’s performance using metrics such as cross-entropy loss, MAE, and SSIM. Detailed results are provided in Table S1 in the Supplementary Material.

Fig. 3

The CXR image is shown and the corresponding BSI is obtained by our rib suppression model. (a) adult CXR image, (b) prediction of the model, (c) ground truth.

We aimed to apply a rib suppression model, developed using adult CXR images, to neonatal CXR images. Due to the absence of matching BSIs in neonatal CXRs, the rib suppression effect was assessed by two senior radiologists and two junior radiologists. They unanimously agreed that the BSIs generated by the model performed well and were beneficial for image interpretation and diagnosis. Figure 4 shows a comparison of neonatal CXR images before and after rib suppression.

Fig. 4

(a) CXR image of a patient with NRDS; (b) BSI of a patient with NRDS; (c) CXR image of a normal neonate; (d) BSI of the normal neonate.

Radiomics features extraction and selection

Using PyRadiomics, 1,316 radiomics features were extracted from the region of interest (ROI) of each CXR image. Of these, 1,036 remained following ICC analysis (ICC > 0.75). From these 1,036 features, 265 were selected using the variance threshold approach, with the threshold set at 0.8. The SelectKBest method then reduced these to 57 features. Finally, LASSO identified the optimal set of 21 features for the model. The results of the LASSO analysis are illustrated in Fig. 5. Table S2 in the supplementary material details the number of features and their groupings at each stage of the feature extraction and selection process.

Fig. 5

LASSO algorithm for feature selection. (a) LASSO path; (b) MSE path; (c) coefficients in LASSO model. LASSO: least absolute shrinkage and selection operator.

Results of machine learning models

After rib suppression, the validation-set AUCs of all six machine learning models improved. The AUCs for GBM, LDA, LR, MLP, RF, and XGBoost were 0.781, 0.725, 0.738, 0.744, 0.759, and 0.769, respectively. Before rib suppression, the corresponding AUCs were lower, at 0.556, 0.625, 0.625, 0.638, 0.647, and 0.544.

When features from both before and after rib suppression were combined, the AUCs for GBM, LDA, LR, MLP, RF, and XGBoost were 0.731, 0.762, 0.756, 0.762, 0.544, and 0.750, respectively. Compared with the rib-suppressed features alone, combining the features improved performance for some models (LDA, LR, and MLP) but not for the others.

Additionally, Table 2 presents a comprehensive overview of the machine learning models’ AUCs across the different settings. On the training set, all models achieved high AUCs after rib suppression, with particularly outstanding results for RF and XGBoost, which achieved near-perfect scores of 0.999 and 1.000, respectively, in group A. On the validation set, the models also showed substantial improvements after rib suppression, with GBM reaching an AUC of 0.781 compared to 0.556 before suppression. Notably, LDA and LR performed especially well when combining features from both scenarios, with AUCs of 0.762 and 0.756, respectively.

The ROC curves for validation sets of each model are illustrated in Fig. 6, providing visual confirmation of the models’ discriminatory power. Figure 7 presents radar maps that we created based on various performance metrics, including sensitivity, specificity, precision, and F1-Score, for each of the models. The specific values for these metrics are detailed in Table S3 of the supplementary materials.

Table 2 AUCs of machine learning models.
Fig. 6

ROC curves of the different models on the validation set. a-f: GBM, LDA, LR, MLP Classifier, XGBoost, RF. Model A: after bone suppression; Model B: before bone suppression; Model C: combination of A and B.

Fig. 7

Radar maps of the different models on the validation set. a-f: GBM, LDA, LR, MLP Classifier, XGBoost, RF. Model A: after bone suppression; Model B: before bone suppression; Model C: combination of A and B.

Discussion

Neonatal respiratory distress syndrome, a prevalent pulmonary condition in premature infants, is a leading cause of respiratory insufficiency and neonatal mortality in NICUs30,31,32. Its incidence rises with decreasing gestational age, reaching nearly 93% in extremely preterm births (< 28 weeks). Although the rates are lower in late preterm and term births, they remain elevated, ranging from 0.3 to 11%. NRDS gives rise to various complications, including persistent pulmonary arterial hypertension, dysphoria, pneumonia, and, in severe cases, neonatal demise. Precise early identification of NRDS and the development of robust artificial intelligence (AI) diagnostic models based on bedside X-ray radiography hold significant clinical value for managing NRDS complications and predicting adverse outcomes.

X-ray imaging uses ionizing radiation, to which neonates, particularly premature infants, are especially sensitive. Despite this, its distinct advantages in lung structure assessment, coupled with significantly lower radiation doses compared to CT scans, maintain bedside CXR as a crucial and versatile imaging modality for neonatal lung evaluation10. However, interpreting neonatal CXR images demands specialized training, and the diagnostic yield is limited and prone to subjective interpretation. Radiomics remains a rapidly evolving field in clinical research and is increasingly integral to imaging-based diagnosis and prognosis. It aids radiologists in extracting quantitative features from images, offering novel avenues for imaging diagnostics and prognostics5,6. This methodology has also been applied in numerous studies of neonatal and fetal imaging. Prayer et al. used radiomics techniques to extract microstructural and morphometric fetal lung characteristics from MRI, facilitating the analysis of expected lung capacity to predict postnatal abnormalities and optimize treatment for future lung development issues33. Standard ultrasound images effectively capture fetal lung texture, and their fusion with AI offers a novel approach to predicting prenatal respiratory diseases by analyzing image flow characteristics34.

Currently, limited radiomics research has been conducted on neonatal CXR images. This limitation arises from the scarcity of CXR data for neonates, especially premature infants. Additionally, the challenges of neonatal radiography, including the infants’ smaller body size and tendency to move or cry during imaging, result in suboptimal image quality. Incorrect positioning and frequent artifacts further complicate the standardization of image quality, hindering effective statistical analysis. Image heterogeneity can influence the diagnosis made by radiologists, leading to potential subjective bias. Extracting radiomic features from pertinent regions of interest for diagnostic classification can mitigate observer bias to some degree. Hence, establishing a reliable radiomics model is crucial for enhancing the accuracy of AI-assisted diagnosis of NRDS.

The overlap of ribs and lungs in neonatal chest imaging is the most prominent structural overlap in this region and occurs relatively uniformly. We posit that suppressing rib images notably enhances image stability and data reliability. Although dual-energy subtraction radiography can provide bone and soft tissue images separately, reducing diagnostic inaccuracies, the dual-exposure method increases radiation exposure. Additionally, this technique necessitates specialized, costly equipment, limiting its availability to only a few hospitals35.

In recent years, AI has seen widespread application in medical imaging research, demonstrating notable efficacy36,37,38,39. Building upon the Attention U-Net model, we trained a rib suppression model using an adult CXR dataset and successfully applied it to perform rib suppression on neonatal chest X-rays. The specific results are shown in Figs. 3 and 4. Moreover, the ROC curves demonstrated superior performance of the radiomics models trained on rib-suppressed data compared to those trained on the original images. Notably, the radiomic features extracted from CXR images after rib suppression resulted in improved diagnostic model performance (Figs. 6 and 7), with heightened sensitivity and heterogeneity.

This study has several limitations. Our data originate from a single center, and multi-center data are needed to validate the generalizability of the machine learning model; additionally, the sample size of this study is small, and a larger sample size is necessary to help improve the stability of the model. These limitations highlight the necessity for further research. In the future, we will continue to address these issues to enhance the stability and generalizability of our research findings.

Conclusion

The neonatal BSI-based model has the potential to clinically aid radiologists in diagnosing NRDS in premature infants, thereby expediting therapeutic development and prognostic assessments to mitigate NRDS-related complications. Currently, clinical diagnosis and management of NRDS heavily rely on clinical expertise. Our research indicates that a combined model integrating radiological features and clinical indicators offers promising adjunctive diagnostic utility, aiding primary neonatal practitioners in NRDS assessment. Further applications may involve assessing NRDS treatment efficacy in premature births and predicting adverse outcomes.