Introduction

With the widespread application of low-dose computed tomography (LDCT) in lung disease screening, the detection rate of pulmonary nodules has significantly increased. However, approximately 95% of screen-detected nodules are ultimately confirmed as benign1. Based on imaging characteristics, pulmonary nodules can be classified into solid nodules and subsolid nodules (SSNs)2. Studies have shown that solid nodules can typically be deemed benign if they remain stable during a 2-year follow-up period2, whereas SSNs, including pure ground-glass nodules (PGGNs) and partial solid nodules (PSNs), carry a higher malignant risk and exhibit more indolent biological behavior, often necessitating extended surveillance2,3,4. This “high detection, low malignancy” paradox creates two critical challenges: on the one hand, excessive follow-up subjects patients to cumulative radiation exposure and persistent psychological anxiety5; on the other hand, the malignant potential and indolent growth characteristics of SSNs demand individualized monitoring strategies. Notably, SSNs located in the subpleural region carry particular clinical significance: once visceral pleural invasion occurs, patient prognosis deteriorates significantly6. Therefore, accurately identifying the malignant transformation propensity of subpleural SSNs during early follow-up has become a key clinical priority for optimizing early diagnosis and treatment of lung cancer.

In current clinical practice, the differentiation of benign and malignant SSNs primarily relies on CT morphological assessment and growth kinetic analysis. Previous studies have identified multiple risk factors associated with malignant transformation of pulmonary nodules, including morphological features (nodule size, density heterogeneity, lobulation or vacuole signs), peri-nodular structural changes (pleural retraction, vascular convergence), and clinical high-risk factors (advanced age, smoking history, family history of lung cancer)7,8. However, smaller SSNs often lack specific conventional CT signs, such as lobulation, spiculation, or pleural retraction8. Dynamic follow-up typically monitors growth rates to aid differentiation, as malignant nodules tend to exhibit faster volumetric growth or increased solid component proportions9. The interpretation of these indicators, however, is susceptible to observer experience and equipment parameters, and relies heavily on longitudinal follow-up data. Additionally, a prospective study by Sawada et al.10 demonstrated that up to 95% of malignant SSNs develop solid components within a 3-year follow-up period, underscoring the prognostic value of evaluating nodule growth or stability. There is an urgent need for refined dynamic risk assessment and rational follow-up strategies for subpleural SSNs.

Radiomics technology, which extracts high-throughput texture, shape, and high-order heterogeneity features from CT images, can overcome the limitations of traditional visual evaluation11. Studies have confirmed that CT-based radiomics models demonstrate significant advantages in distinguishing benign from malignant pulmonary nodules12, grading the invasiveness of lung adenocarcinoma13, and predicting the growth of ground-glass nodules14. However, existing research predominantly focuses on general SSNs and has not yet fully considered the unique risks associated with the subpleural region. There is a pressing need to establish a comprehensive predictive framework specifically tailored to subpleural SSNs.

This retrospective study aims to integrate radiological features with radiomics features using 3-year dynamic follow-up data of subpleural SSNs, to develop a predictive model for their growth trends and guide individualized follow-up strategies.

Materials and methods

Study design and population

This retrospective study collected chest CT imaging data from the Picture Archiving and Communication System (PACS). Cases were screened by searching radiology reports for keywords related to “subsolid nodules (SSNs)” (e.g., “subpleural,” “ground-glass opacity,” “subsolid nodule,” “partial solid nodule”). Two radiologists (with 7 and 6 years of experience in thoracic imaging, respectively) independently evaluated the cases.

Inclusion criteria

1. Subpleural nodules, defined as lesions within 10 mm of the pleura but not in direct contact15; 2. Persistent SSNs confirmed by at least 3 months of follow-up CT; 3. Follow-up duration ≥ 3 years, or < 3 years with evidence of SSN growth; 4. Initial CT report indicating an average SSN diameter of 6–15 mm; 5. Solid component ratio (CTR) < 0.25 for partial solid nodules (PSNs); 6. Thin-slice CT images (slice thickness ≤ 1.25 mm).

Exclusion criteria

(1). Non-subpleural or non-subsolid nodules; (2). Absence of follow-up CT or stable nodules with follow-up duration < 3 years; (3). Nodules without growth but treated with ablation or surgery during follow-up; (4). Poor-quality CT images; (5). Ill-defined SSN boundaries, making accurate delineation difficult.

A total of 494 SSNs from 454 patients were included and randomly divided into a training set (n = 345) and a test set (n = 149) in a 7:3 ratio.
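A random 7:3 split of this size can be sketched as follows. This is only an illustration: the seed and the function name `split_indices` are assumptions, and the text does not state whether the split was made per nodule or per patient, so the sketch simply splits per nodule.

```python
import random


def split_indices(n_items, train_fraction=0.7, seed=42):
    """Shuffle item indices and split them into training and test lists.

    The seed is an arbitrary, assumed value used only for reproducibility.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_fraction)  # floor of 494 * 0.7
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(494)
print(len(train_idx), len(test_idx))  # 345 149
```

Taking the floor of 494 × 0.7 reproduces the reported set sizes (345 and 149).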

CT image acquisition

The following CT scanners were utilized: uCT528, uCT550, uCT960 (United Imaging Healthcare), GE Revolution CT (GE Healthcare), Philips Brilliance iCT (Philips Healthcare) and SIEMENS SOMATOM go.Fit CT (Siemens Healthineers). Detailed scan parameter information is provided in Supplementary Table 1.

SSN growth assessment criteria

(a). An increase in the average diameter of the entire nodule by ≥ 2 mm; (b). An increase in the solid component of PSNs by ≥ 2 mm; (c). The appearance of a new solid component within pure ground-glass nodules (PGGNs). All other cases were considered stable16.
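The three growth criteria above can be written as a small rule-based function. This is a minimal sketch; the function and argument names are hypothetical, and measurements are assumed to be given in millimetres as differences between the last and baseline CT.

```python
def is_growth(nodule_type, diameter_change_mm, solid_change_mm=0.0,
              new_solid_component=False):
    """Return True if a nodule meets any of the growth criteria (a)-(c).

    nodule_type is assumed to be "PSN" or "PGGN" at baseline.
    """
    # (a) average diameter of the entire nodule increased by >= 2 mm
    if diameter_change_mm >= 2.0:
        return True
    # (b) solid component of a PSN increased by >= 2 mm
    if nodule_type == "PSN" and solid_change_mm >= 2.0:
        return True
    # (c) a new solid component appeared within a PGGN
    if nodule_type == "PGGN" and new_solid_component:
        return True
    # all other cases are considered stable
    return False


print(is_growth("PGGN", 0.5, new_solid_component=True))  # True
print(is_growth("PSN", 1.0, solid_change_mm=1.5))        # False
```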

Radiological feature evaluation

Two radiologists (J.L.Z., with 7 years of experience in thoracic imaging, and H.L., with 6 years of experience) analyzed and recorded the following radiological features for each CT case, blinded to the growth status of the SSNs:

  (a) Nodule type (partial solid nodule, PSN / pure ground-glass nodule, PGGN);
  (b) Morphology (round/oval or irregular);
  (c) Pleural retraction (yes or no);
  (d) Diameter (average of the maximum length and width on the plane showing the largest nodular area)17.

Diameter change was calculated as the difference between the last and baseline CT measurements. Both diameter change and baseline CT-measured diameter were assessed via intraclass correlation coefficient (ICC). The remaining radiological features and the appearance of solid components were evaluated using Cohen’s Kappa. Any discrepancies were arbitrated by a third expert (C.X.Y., with 9 years of experience in thoracic imaging). Arbitration results were used for final analysis but were excluded from agreement calculations. Detailed inter-observer agreement results are in Supplementary Table 2.
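For the categorical features, Cohen's Kappa compares the observed agreement between two readers with the agreement expected by chance. The sketch below uses made-up ratings for ten nodules purely for illustration; it is not the study data, and the function names are hypothetical.

```python
from collections import Counter


def mean_diameter(max_length_mm, max_width_mm):
    """Average of the maximum length and width on the largest-area plane."""
    return (max_length_mm + max_width_mm) / 2


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # chance agreement: product of the two raters' marginal frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Illustrative pleural-retraction ratings (not study data)
reader_1 = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "No"]
reader_2 = ["Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No"]
print(round(cohens_kappa(reader_1, reader_2), 3))  # 0.583
```

Here the readers agree on 8 of 10 cases (observed agreement 0.80) and chance agreement is 0.52, giving a Kappa of about 0.58.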

Radiomics feature extraction and selection

Nodule segmentation

A flowchart of the radiomics process is shown in Fig. 1. The volumes of interest (VOIs) for subsolid nodules (SSNs) were manually delineated using ITK-SNAP software (version 3.8.0) by a radiologist with 3 years of experience in thoracic CT diagnosis, and another radiologist reviewed and adjusted the results as necessary. The delineation was performed on lung window settings (window width: 1500 HU; window level: –500 HU), avoiding blood vessels and bronchi.

Fig. 1 Technical workflow of the study. A schematic diagram illustrating the stepwise methodology, including data acquisition, feature extraction, feature selection, model construction, and clinical application.

Feature extraction

Radiomics features were extracted using the PyRadiomics open-source Python package (version 3.0.1, https://pyradiomics.readthedocs.io). Features were extracted from CT images, including the original images and those processed with wavelet transform and Laplacian of Gaussian (LoG) filtering. Image preprocessing included resampling to a uniform voxel size (1 mm × 1 mm × 1 mm) and z-score normalization.

Feature selection

Only radiomics features with high inter-observer agreement (ICC > 0.75) were retained for further analysis. The Mann–Whitney U test (p < 0.1) was used to identify features with significant differences between the growth and stable groups. Spearman’s rank correlation coefficient was calculated, and if the absolute correlation coefficient between two features was ≥ 0.9, the feature with better diagnostic performance was retained. Hierarchical clustering was applied to feature clusters with correlation coefficients > 0.95, and representative features were selected based on the maximum dynamic range principle. LASSO regression modeling was performed with tenfold cross-validation to determine the optimal regularization parameter λ (minimizing binomial deviance), and non-zero coefficient features were selected to construct a weighted linear combination for the radiomics score (Radscore). The Synthetic Minority Over-sampling Technique (SMOTE) was applied only to the training set to address the significant class imbalance.
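The correlation-based redundancy step can be illustrated with a short sketch. It assumes a greedy procedure in which features are visited in order of a per-feature performance score (for example a univariate AUC) and a feature is dropped when its absolute Spearman correlation with an already retained feature is ≥ 0.9. The greedy ordering, the function names, and the toy values are assumptions, not the authors' implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, scores, threshold=0.9):
    """Keep the better-scoring feature of each highly correlated pair."""
    kept = []
    for name in sorted(features, key=lambda f: scores[f], reverse=True):
        if all(abs(spearman(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy feature values and hypothetical per-feature scores
features = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 11], "f3": [5, 1, 4, 2, 3]}
scores = {"f1": 0.70, "f2": 0.80, "f3": 0.65}
print(drop_redundant(features, scores))  # ['f2', 'f3']
```

In the toy data, f1 and f2 are perfectly rank-correlated, so only the better-scoring f2 survives, while f3 is retained because its correlation with f2 is weak.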

Model construction

Radiological model: Independent predictors were screened using univariate and multivariate logistic regression to construct a radiological prediction model.

Radiomics model: A radiomics score (Radscore) was generated using logistic regression based on the selected radiomics features.

Combined model: A nomogram prediction model was constructed by integrating radiomics features with radiological features.

Performance evaluation

The area under the receiver operating characteristic (ROC) curve (AUC) was calculated, and the DeLong test was used to compare differences between models. Calibration curves and the Hosmer–Lemeshow test were used to assess model calibration. Decision curve analysis (DCA) was performed to evaluate clinical net benefit.
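The AUC has a simple rank interpretation: it is the probability that a randomly chosen growing nodule receives a higher predicted score than a randomly chosen stable one, with ties counted as one half. The sketch below computes it directly from that definition on made-up values; it does not implement the DeLong test, and the names are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the growth group, 0 for the stable group.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Illustrative labels and predicted probabilities (not study data)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.4]
print(round(roc_auc(labels, scores), 3))  # 0.792
```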

Propensity score matching validation

To control for baseline confounding, reduce the risk of circular reasoning, and verify that the models' predictive value is independent of baseline radiological differences, 1:1 propensity score matching was carried out on the test set (n = 40). The matching caliper was set at 0.03, and the covariates comprised all radiological features: diameter, nodule type, morphology, and pleural retraction.
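One common way to perform 1:1 matching with a caliper is greedy nearest-neighbour matching without replacement. The sketch below assumes that propensity scores (the estimated probability of belonging to the growth group given the radiological covariates) have already been computed, and that the 0.03 caliper is applied on the propensity-score scale. The matching algorithm, the identifiers, and the scores are assumptions, since the text does not specify them.

```python
def match_with_caliper(growth, stable, caliper=0.03):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    growth, stable: dicts mapping a nodule id to its propensity score.
    Returns a list of (growth_id, stable_id) pairs within the caliper.
    """
    available = dict(stable)
    pairs = []
    for g_id, g_score in sorted(growth.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # closest remaining stable nodule on the propensity-score scale
        s_id = min(available, key=lambda s: abs(available[s] - g_score))
        if abs(available[s_id] - g_score) <= caliper:
            pairs.append((g_id, s_id))
            del available[s_id]  # each stable nodule is used at most once
    return pairs


# Hypothetical propensity scores (not study data)
growth = {"g1": 0.30, "g2": 0.55, "g3": 0.90}
stable = {"s1": 0.31, "s2": 0.50, "s3": 0.57, "s4": 0.10}
print(match_with_caliper(growth, stable))  # [('g1', 's1'), ('g2', 's3')]
```

Nodule g3 remains unmatched because no stable nodule lies within the caliper, which is how matching shrinks the analysed sample.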

Statistical analysis

All data analyses were conducted on the Python 3.7.12 platform and SPSS 26.0 (IBM Corporation, Armonk, NY, USA). Statistical tests were performed using Statsmodels 0.13.2, radiomics feature extraction was based on PyRadiomics 3.0.1, and machine learning models (e.g., SVM) were implemented using Scikit-learn 1.0.2. Categorical variables were analyzed using the chi-square test, and continuous variables were analyzed using the t-test or Kruskal–Wallis test. The diagnostic performance of the models was evaluated using ROC curves, and the DeLong test was used to assess statistical differences in AUC values between models. Calibration curves and the Hosmer–Lemeshow goodness-of-fit test were used to evaluate model calibration, and decision curve analysis (DCA) was used to quantify clinical net benefit. The significance threshold was set at p < 0.05.

Results

Patient baseline characteristics and radiological feature analysis

This study included 494 subpleural subsolid nodules (SSNs), including a training set of 345 cases (age: 57.62 ± 11.47 years in the growth group vs. 56.62 ± 11.19 years in the stable group) and a test set of 149 cases (age: 56.02 ± 13.77 years in the growth group vs. 55.49 ± 10.87 years in the stable group). In the training set, significant differences were observed between the growth and stable groups in nodule diameter (9.00 ± 1.65 mm vs. 8.08 ± 1.84 mm, P < 0.001), nodule type (58.33% PSNs vs. 32.32% PGGNs, P < 0.001), morphology (54.17% irregular shape vs. 28.28%, P < 0.001), and pleural retraction (58.33% vs. 39.39%, P = 0.021). Age and sex showed no significant differences between groups (P > 0.05). Similar trends were observed in the test set, but sex and pleural retraction differences were not significant (P > 0.05). Table 1 summarizes the demographic and radiological characteristics of the patients.

Table 1 Demographic and radiological features of patients with SSN.

Univariate and multivariate logistic regression analysis

Univariate analysis identified age (OR 0.969, P < 0.001), nodule diameter (OR 0.814, P < 0.001), nodule type (PSN vs. PGGN: OR 0.292, P < 0.001), irregular morphology (OR 0.31, P < 0.001), and pleural retraction (OR 0.239, P < 0.001) as significant predictors of growth risk. In multivariate analysis, age (OR 0.961, P < 0.001), nodule type (PSN: OR 2.359, P = 0.009), irregular morphology (OR 2.917, P = 0.001), and pleural retraction (OR 2.227, P = 0.014) remained independent predictors, while nodule diameter was no longer significant (P > 0.05). Details are provided in Table 2.

Table 2 Univariate and multivariate logistic regression analysis for clinical radiological features in the training set.

Key radiomics features

Feature selection identified 12 key features contributing to the RadScore. They were derived from three image types: 2 from original images, 6 from wavelet-transformed images, and 4 from LoG-processed images. The features and their coefficients are:

RadScore = 0.13913 + 0.017287 × log_sigma_3_0_mm_3D_glszm_SizeZoneNonUniformityNormalized + 0.035857 × log_sigma_4_0_mm_3D_firstorder_Kurtosis + 0.000366 × log_sigma_4_0_mm_3D_glrlm_ShortRunLowGrayLevelEmphasis + 0.000757 × log_sigma_5_0_mm_3D_glszm_SmallAreaHighGrayLevelEmphasis + 0.015153 × original_shape_Maximum2DDiameterRow − 0.073767 × original_shape_Sphericity + 0.001821 × wavelet_HHH_glszm_GrayLevelNonUniformity − 0.022003 × wavelet_HLL_firstorder_Skewness + 0.008806 × wavelet_HLL_gldm_LargeDependenceHighGrayLevelEmphasis + 0.007341 × wavelet_LHH_firstorder_Skewness − 0.019264 × wavelet_LHL_firstorder_Median + 0.002791 × wavelet_LLL_glcm_JointEnergy.

Each feature’s coefficient in the formula shows its impact on RadScore. A positive coefficient means a higher feature value increases RadScore, while a negative one implies the opposite. Figure 2 visualizes the features and their weights.
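The RadScore is a weighted linear combination, so it can be evaluated with a simple lookup of the coefficients listed above. The sketch below copies the intercept and coefficients from the formula; the function name and the assumption that feature values are supplied already preprocessed in the same way as during model fitting are illustrative.

```python
INTERCEPT = 0.13913
COEFFICIENTS = {
    "log_sigma_3_0_mm_3D_glszm_SizeZoneNonUniformityNormalized": 0.017287,
    "log_sigma_4_0_mm_3D_firstorder_Kurtosis": 0.035857,
    "log_sigma_4_0_mm_3D_glrlm_ShortRunLowGrayLevelEmphasis": 0.000366,
    "log_sigma_5_0_mm_3D_glszm_SmallAreaHighGrayLevelEmphasis": 0.000757,
    "original_shape_Maximum2DDiameterRow": 0.015153,
    "original_shape_Sphericity": -0.073767,
    "wavelet_HHH_glszm_GrayLevelNonUniformity": 0.001821,
    "wavelet_HLL_firstorder_Skewness": -0.022003,
    "wavelet_HLL_gldm_LargeDependenceHighGrayLevelEmphasis": 0.008806,
    "wavelet_LHH_firstorder_Skewness": 0.007341,
    "wavelet_LHL_firstorder_Median": -0.019264,
    "wavelet_LLL_glcm_JointEnergy": 0.002791,
}


def rad_score(feature_values):
    """Intercept plus the coefficient-weighted sum of the 12 features.

    feature_values: dict mapping every feature name to its value;
    a missing feature raises KeyError rather than being silently ignored.
    """
    return INTERCEPT + sum(
        coef * feature_values[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical input: all features zero except sphericity
example = {name: 0.0 for name in COEFFICIENTS}
example["original_shape_Sphericity"] = 1.0
print(round(rad_score(example), 6))  # 0.065363
```

Because sphericity has a negative coefficient, raising it from 0 to 1 lowers the score from the intercept (0.13913) to about 0.0654.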

Fig. 2 Radiomics features and their weights derived from CT images.

Model performance comparison

In the training set, the combined model, radiomics model, and radiological model achieved AUCs of 0.896 (95% CI 0.8505–0.9425), 0.857 (95% CI 0.7976–0.9169), and 0.716 (95% CI 0.6354–0.7974), respectively. The DeLong test demonstrated the superiority of the combined model over the radiomics model (0.896 vs. 0.857, P = 0.047) and radiological model (0.896 vs. 0.716, P < 0.001). The combined model exhibited the highest accuracy (87.2%) and specificity (89.6%), while the radiomics model showed the highest sensitivity (83.3%). Both outperformed the radiological model across all metrics.

In the test set, the combined model, radiomics model, and radiological model achieved AUCs of 0.842 (95% CI 0.7185–0.9600), 0.840 (95% CI 0.7444–0.9346), and 0.741 (95% CI 0.5981–0.8829), respectively. The combined model outperformed the radiological model (0.842 vs. 0.741, P = 0.035) but showed no significant difference compared to the radiomics model (0.842 vs. 0.840, P = 0.936). The combined model demonstrated the highest accuracy (83.3%), sensitivity (75%), and specificity (84.5%). Detailed performance metrics for all models in both training and test sets are presented in Table 3 and Fig. 3.

Table 3 The performance of the radiological model, radiomics model, and combined model in the training set and test set.

Fig. 3 ROC curve comparisons and DeLong test results for the radiological, radiomics, and combined models in the training and test sets. The combined model demonstrated comparable or better discriminative performance (training AUC 0.896; test AUC 0.842) compared to radiological (AUC 0.716/0.741) and radiomics (AUC 0.857/0.840) models.

Model calibration and clinical utility

Calibration curve analysis

The Hosmer–Lemeshow test revealed good calibration for all models in the training set (radiological model: P = 0.051; radiomics model: P = 0.062; combined model: P = 0.080), indicating consistency between predicted probabilities and observed outcomes. In the test set, the combined model maintained good calibration (P = 0.167), while the radiological model (P = 0.021) and radiomics model (P = 0.044) exhibited mild calibration deviations (Fig. 4).

Fig. 4 Calibration curves of the different models in the training and test sets. The predicted probability of the combined model in training and test sets was highly consistent with the actual probability, indicating good model calibration.

Decision curve analysis (DCA)

Decision curve analysis demonstrated that the combined model provided significantly higher net clinical benefit than the “treat all,” “treat none,” radiological model, and radiomics model across the threshold probability range of 10–90% in the test set (Fig. 5). A nomogram integrating radiomics scores and radiological features is illustrated in Supplementary Fig. 1.

Fig. 5 Decision curve analysis (DCA) of radiological, radiomics, and combined models in the training and test sets. The combined model achieved the highest net clinical benefit across the 10–90% threshold probability range, supporting its utility in guiding clinical decisions.
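The quantity plotted in a decision curve is the net benefit at a threshold probability p_t, conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t), with the "treat all" strategy obtained by classifying every case as positive and "treat none" fixed at zero. The sketch below evaluates this standard formula on made-up values; the names and data are illustrative and not taken from the study.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on predictions at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probabilities)
             if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probabilities)
             if p >= threshold and l == 0)
    # false positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every case is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Illustrative outcomes and predicted growth probabilities (not study data)
labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
probabilities = [0.8, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4, 0.2, 0.1, 0.05]
print(round(net_benefit(labels, probabilities, 0.5), 3))  # 0.1
print(round(net_benefit_treat_all(labels, 0.5), 3))       # -0.4
```

At a threshold of 0.5 the toy model yields 2 true positives and 1 false positive among 10 cases, so its net benefit (0.1) exceeds both "treat all" (−0.4) and "treat none" (0).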

Propensity score matching validation results

Covariates including diameter, nodule type, morphology, and pleural retraction were incorporated for propensity score matching. After matching, all intergroup differences were no longer statistically significant (all P > 0.05). Model performance changed accordingly: the AUC of the radiological model decreased from 0.741 to 0.591, while that of the radiomics model increased from 0.840 to 0.865. The combined model maintained good performance despite its AUC decreasing slightly from 0.842 to 0.785. Details can be found in Supplementary Tables 3–4 and Supplementary Figs. 2–5.

Discussion

We developed and validated a radiological-radiomics combined model by integrating radiological features and radiomics parameters to predict the growth and long-term stability of subpleural subsolid nodules (SSNs). The combined model demonstrated favorable performance in both the training and test sets (Fig. 3).

Multivariate logistic regression identified nodule type (partial solid nodules, PSNs), irregular morphology, and pleural retraction as independent predictors of nodule growth. Specifically, PSNs exhibited a significantly higher malignant risk compared to pure ground-glass nodules (PGGNs), consistent with the established correlation between the solid component ratio (CTR) and invasive pathological features18. Irregular morphology may reflect heterogeneous growth patterns caused by tumor cell proliferation7,19, while pleural retraction suggests fibrotic adhesion between the nodule and pleura, potentially indicating early pleural invasion20. Notably, although nodule diameter was associated with growth risk in univariate analysis, its significance disappeared in the multivariate model, possibly due to multicollinearity. Moreover, larger nodules (diameter > 15 mm) are typically managed surgically rather than with surveillance, potentially explaining the non-significance of diameter in the multivariate model. The validity of size as a predictor of nodule growth remains inconclusive and warrants further investigation14,15,16,17,18,19.

Although some traditional CT features were identified, the performance of the radiological model was suboptimal in both the training and test sets, highlighting the necessity of radiomics for this task, especially for small, feature-ambiguous nodules. Figure 6 illustrates how the radiomics model and combined model can assist clinical judgment. The radiomics features demonstrated both consistency and complementarity with traditional radiological predictors. Key features aligned with established risk indicators: original_shape_Sphericity (negative coefficient) suggested reduced growth tendency in spherical nodules, consistent with radiological observations linking irregular morphology to progression. Complementarily, wavelet-transformed features captured subtle heterogeneity (e.g., wavelet_HHH_glszm_GrayLevelNonUniformity with positive coefficient), potentially reflecting microstructural alterations beyond visual assessment21. LoG-filtered features such as log_sigma_5_0_mm_3D_glszm_SmallAreaHighGrayLevelEmphasis may indicate focal tissue changes warranting further pathological correlation. Furthermore, after eliminating baseline differences through propensity score matching, both the radiomics model and combined model maintained robust performance (AUC > 0.75), demonstrating sustained predictive capability independent of the baseline differences in these specific radiological features.

Fig. 6 Two patients with ground-glass nodules (GGNs). (A–C) A man (age, 66 years) with a pure GGN in the right lower lobe. First found on May 28, 2020, the 9 mm diameter nodule had an irregular shape and no pleural retraction. Growth probabilities predicted by the radiological, radiomics, and combined models were 0.42, 0.14, and 0.21, respectively. No obvious change occurred during 3 years of follow-up. (D–F) A woman (age, 28 years) with a pure GGN in the right upper lobe. Detected on February 11, 2023, the 6 mm diameter nodule had no malignant signs. Predicted growth probabilities were 0.29 (radiological), 0.69 (radiomics), and 0.71 (combined). Follow-up CT showed GGN enlargement with mild lobulation and solid components. It was surgically removed in January 2025.

In recent years, predicting the growth rate of pulmonary nodules has become a research focus. Xue et al. used CT radiomics to predict the two-year growth of 4–12 mm small pulmonary nodules16, while Sun et al. demonstrated the efficacy of a nomogram model combining clinical-radiological features and radiomics parameters for predicting the long-term stability of 5–20 mm pure ground-glass nodules14. Liao et al. developed a deep learning model to predict the growth of 5–30 mm subsolid nodules22. These studies highlight the potential of CT imaging for predicting nodule growth, with radiomics and deep learning methods showing significant value. However, no prior study has specifically applied CT radiomics to predict growth stability in subpleural SSNs. By focusing on this unique region and integrating radiomics with radiological features, we developed a clinically practical radiological-radiomics combined model. This model enables precise risk stratification to: (1) reduce unnecessary follow-up for low-risk nodules, (2) facilitate timely intervention for high-risk cases, and (3) optimize individualized surveillance protocols.

The study has several limitations. First, its retrospective design introduces inherent selection bias. Second, the small test set (n = 149) may restrict the model’s generalizability, necessitating external cohort validation. Third, manual nodule segmentation may cause inter-observer variability. Though features with ICC exceeding 0.75 were chosen to ensure robustness, incorporating automated segmentation algorithms is a future direction. Fourth, equating nodule growth with malignancy lacks pathological validation, which may lead to confounding results as benign processes can also show progression.

Conclusion

This study developed a radiological-radiomics combined model to predict growth trends of subpleural subsolid nodules (SSNs). Partial solid nodules (PSNs), irregular morphology, and pleural retraction were identified as independent predictors. The combined model demonstrated favorable diagnostic performance by synergizing macroscopic CT assessments with radiomics-driven quantification of microscopic heterogeneity. This dual-perspective approach provides an objective framework for optimizing surveillance intensity, potentially reducing unnecessary follow-up in stable subpleural SSNs.