Introduction

Non-mass breast lesions (NML) account for 9.21% of breast abnormalities1. In conventional ultrasound examinations, these lesions lack clear boundaries and do not show significant space-occupying effects in two or more scanning planes. They typically appear as localized hypoechoic areas on ultrasound images, without distinct shape or border2. In the Breast Imaging Reporting and Data System (BI-RADS), diagnostic criteria for breast NML are lacking, which increases the risk of missed or incorrect diagnoses3. Based on their conventional ultrasound features, Wang et al.4 classified breast NML into four types: hypoechoic areas within the breast, hypoechoic areas with microcalcifications, architectural distortion, and solid echoes within the ducts. However, dense breast tissue is more common among Asian women. In BI-RADS, breast tissue is categorized into5: fatty, scattered fibroglandular, heterogeneously dense (where masses may appear indistinct), and extremely dense (where the high density of breast tissue significantly reduces the sensitivity of mammography). The latter two types are considered dense breasts. According to the ultrasound classification proposed by Wu Lin et al.6, dense breasts are characterized by a duct diameter sum that is less than half of the breast’s thickness, with breast parenchyma predominantly composed of fibrous or fatty tissue, exhibiting high and uniform echo intensity with heterogeneous internal echoes. On ultrasound images, dense breasts predominantly show as areas of high or intense echo, with the parenchyma mostly consisting of proliferating fibrous tissue. The mammary glands may undergo atrophy, and ducts may become small or disappear7. The dense glandular tissue may obscure lesions, making the detection and differentiation between benign and malignant breast NML in the context of dense breast tissue particularly challenging.

In recent years, radiomics has made significant progress in the diagnosis of breast tumors, lymph node metastasis assessment, and prognosis prediction8,9. By precisely annotating medical images, delineating lesions, and applying advanced computational methods, this technology can extract quantitative data from a large volume of medical images10,11. However, most published tumor radiomics studies focus on extracting radiomic features from the primary tumor site and neglect the peritumoral radiomic features. In 1989, the “seed and soil” hypothesis was proposed12, suggesting that the tumor microenvironment, which includes tumor cells and surrounding immune cells, plays a critical role in tumor progression and metastasis through complex interactions that allow tumor cells to evade immune responses13. The study by Tan et al.14 demonstrated that a model combining texture analysis with conventional magnetic resonance imaging (MRI) features had superior diagnostic performance in differentiating benign and malignant breast NML compared to using texture analysis or conventional MRI features alone. This suggests that texture analysis can quantify tumor heterogeneity and provide added value in distinguishing between benign and malignant breast NML14.

Advances in radiomics are rapidly transforming breast cancer diagnosis and treatment, particularly by enabling deeper insights into tumor microenvironment heterogeneity. Although mammography remains a primary screening tool, its dependence on morphological characteristics limits its specificity in dense breast tissue, often resulting in a high number of unnecessary biopsies15. Emerging evidence suggests that peritumoral texture analysis may help address this limitation: mammographic studies have shown that wavelet-based features of stromal heterogeneity can predict malignancy risk16, while three-compartment compositional imaging (3CB) reduces false positives by 22% through the analysis of fat-to-fibroglandular texture ratios17. These findings are consistent with preclinical research on tumor-adipocyte interactions17, which demonstrate that gap junction-mediated stromal remodeling generates detectable microenvironmental signatures. However, research on the application of ultrasound-based radiomics for diagnosing non-mass breast cancer remains scarce. Most non-mass lesions (NMLs) in the breast are abnormal, and conventional ultrasound often yields high sensitivity but low specificity, leading to a relatively high rate of missed or incorrect diagnoses for non-mass breast cancer6. Building upon these developments, we hypothesize that ultrasound radiomics—particularly peri_5mm wavelet features—can detect early stromal reprogramming in non-mass lesions, thereby addressing a crucial diagnostic gap in populations with dense breast tissue.

Methods

Pathological selection and general information

A total of 851 cases of breast NML in a dense breast background, detected by conventional ultrasound between May 30, 2018, and January 20, 2024, at Dongguan People’s Hospital, were included in this study. Detailed information on the enrollment procedure is displayed in Fig. 1. Inclusion criteria: (I) Diagnosed as breast NML by ultrasound; (II) Diagnosed with dense breast tissue by ultrasound or mammography; (III) Patients who voluntarily underwent core needle biopsy, vacuum-assisted biopsy, or surgery, with complete clinical data and confirmed pathological diagnosis. Exclusion criteria: (I) Lesions identified as mass-type breast lesions on ultrasound; (II) Patients with non-dense breast tissue as determined by ultrasound or mammography; (III) History of neoadjuvant chemotherapy, radiotherapy, or endocrine therapy for breast diseases prior to surgery; (IV) Unclear lesion nature or pathological diagnosis; (V) Poor image quality. This study was approved by the ethics committee of the Tenth Affiliated Hospital of Southern Medical University (Dongguan People’s Hospital) (G-2023027) and adhered to the principles of the Declaration of Helsinki; informed consent from patients was waived.

Fig. 1 Flowchart of patient enrollment.

Instruments and methods

A GE Logic E9 color Doppler ultrasound system with a high-frequency linear probe (L-16–5) was used, with a mechanical index of 0.16. Ultrasound examinations were performed by radiologists with over 10 years of experience. Patients were positioned supine with their arms raised at a 90° angle above their heads to fully expose the breasts and both axillae; if necessary, patients were asked to turn to their side. For all cases identified as non-mass lesions (NMLs) within a dense breast background via conventional ultrasound, 2D images were employed to assess lesion size, internal echo homogeneity, the presence of microcalcifications, posterior acoustic attenuation, and distortion of surrounding structures. Color Doppler imaging was utilized to evaluate the vascular characteristics of the lesions. The ultrasound image representing the lesion’s longest axis was selected and exported in DICOM format, preserving full bit-depth in 16-bit grayscale, with a raw pixel value range of 0–4095, as configured in the Logic E9 system. To ensure reproducibility, all DICOM files retained their original acquisition parameters, including a dynamic range of 60 dB, time gain compensation (TGC) settings, and spatial compounding configurations. Prior to feature extraction, pixel values were linearly normalized to a [0, 1] range using min–max scaling, based on the device-specific maximum quantization units18,19.
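The min–max scaling step can be sketched as follows. This is a minimal illustration, assuming the device-specific maximum is the upper end of the stated 0–4095 raw range and the minimum is 0. The function name `normalize_pixels` and the parameter `max_units` are hypothetical and are not taken from the study's code.

```python
def normalize_pixels(pixels, max_units=4095):
    """Linearly rescale raw pixel values from [0, max_units] to [0, 1].

    Assumption: the scaling bounds are 0 and the device-specific
    quantization limit (4095 here), not the per-image min and max.
    """
    if max_units <= 0:
        raise ValueError("max_units must be positive")
    return [[value / max_units for value in row] for row in pixels]


# Tiny 2 x 3 "image" of raw values, for illustration only.
raw = [[0, 2047, 4095], [1024, 3071, 4095]]
scaled = normalize_pixels(raw)
```

Using a fixed device maximum rather than each image's own extremes keeps the same raw intensity mapped to the same normalized value across patients, which is presumably why the text refers to device-specific units.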

Image data acquisition and processing

Using a single-blind method, ultrasound images were imported into ITK-SNAP20 (v4.20, http://www.itksnap.org/) by two radiologists with over 10 years of experience. The tumor boundaries were manually delineated along the edges, and the region of interest (ROI) was defined, which included spiculation, lobulation, and surrounding satellite or radial features. Using the ROI margin as the baseline, morphological adaptive segmentation was applied to expand the peritumoral regions outward by 1 mm, 2 mm, 3 mm, 4 mm, and 5 mm. This yielded images of peri_1mm, peri_2mm, peri_3mm, peri_4mm, peri_5mm, and intratumoral regions (see Fig. 2). The intraclass correlation coefficient (ICC) was used to assess the consistency of feature extraction between and within observers. ICC ratings were classified as follows: less than 0.40 was considered “Poor”, 0.40–0.59 as “Fair”, 0.60–0.74 as “Good”, and 0.75–1.00 as “Excellent”.
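The outward expansion of the ROI can be illustrated with plain binary dilation. This is a sketch under stated assumptions and not the authors' “morphological adaptive segmentation”, whose details are not given. It assumes isotropic pixel spacing, a disc-shaped structuring element, and a peritumoral mask defined as the grown region minus the lesion. The names `dilate`, `peritumoral_ring`, and `pixel_spacing_mm` are hypothetical.

```python
import math


def dilate(mask, radius_px):
    """Binary dilation of a 2D 0/1 mask with a disc of radius_px pixels."""
    rows, cols = len(mask), len(mask[0])
    r = int(math.floor(radius_px))
    # Offsets whose Euclidean distance from the origin is within the radius.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= radius_px * radius_px]
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if mask[y][x]:
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        out[yy][xx] = 1
    return out


def peritumoral_ring(mask, margin_mm, pixel_spacing_mm):
    """Pixels within margin_mm of the lesion, excluding the lesion itself."""
    grown = dilate(mask, margin_mm / pixel_spacing_mm)
    return [[1 if g and not m else 0 for g, m in zip(grow_row, mask_row)]
            for grow_row, mask_row in zip(grown, mask)]


# 7 x 7 toy mask with a single lesion pixel in the centre.
lesion = [[0] * 7 for _ in range(7)]
lesion[3][3] = 1
ring_1mm = peritumoral_ring(lesion, margin_mm=1.0, pixel_spacing_mm=0.5)
```

With a spacing of 0.5 mm per pixel, the 1 mm margin corresponds to a 2-pixel disc, so the ring covers the neighbours within two pixels of the lesion and leaves the lesion pixel itself unset.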

Fig. 2 Sketch of NML image. Ultrasound original image (A) with intratumoral mask (B), and peritumoral masks at 1 mm (C), 2 mm (D), 3 mm (E), 4 mm (F), and 5 mm (G).

Construction of the radiomics model

The delineated ROI images were imported into pyradiomics21 (v3.0.1, https://github.com/AIM-Harvard/pyradiomics), an open-source software, where image feature extraction was performed on the ultrasound images using a Wavelet filter while retaining the original images. This included morphological features, first-order features, and texture features, such as Gray Level Size Zone Matrix (GLSZM), Gray Level Co-occurrence Matrix (GLCM), Neighbouring Gray Tone Difference Matrix (NGTDM), Gray Level Dependence Matrix (GLDM), and Gray Level Run Length Matrix (GLRLM).

Extracted radiomic features were standardized using Z-score normalization. To remove uninformative features, a t-test was conducted on features within the training set, retaining only those with a P-value < 0.05. To mitigate potential multicollinearity among the remaining features, Pearson correlation coefficients were computed to evaluate inter-feature relationships, and features with correlation coefficients ≥ 0.9 were considered redundant and excluded. To assess the necessity for multiple testing correction, a sensitivity analysis was additionally performed using the Bonferroni correction (α = 0.05/851). Least absolute shrinkage and selection operator (LASSO) regression with ten-fold cross-validation, using the minimum mean squared error (MSE) criterion, was then applied to select the optimal subset of radiomic features, further eliminating collinear variables by assigning zero-valued coefficients. Each selected feature was multiplied by its corresponding regression coefficient, and the resulting values were summed to generate a radiomics score for each patient.
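The standardization, redundancy filter, and score computation can be sketched in a few lines. This is an illustrative sketch, not the study's pipeline: it assumes the population standard deviation for the Z-score, applies the 0.9 cutoff to the absolute correlation, keeps the first feature of each redundant pair, and omits any intercept in the score because none is mentioned. Feature names, values, and coefficients are made up.

```python
import math


def zscore(values):
    """Standardize one feature column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| < threshold against every kept feature."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


def radiomics_score(patient, coefficients):
    """Sum of the selected feature values weighted by their coefficients."""
    return sum(coefficients[name] * patient[name] for name in coefficients)


# Hypothetical feature columns (one value per patient).
features = {
    "f1": zscore([1.0, 2.0, 3.0, 4.0]),
    "f2": zscore([2.0, 4.0, 6.0, 8.1]),  # nearly collinear with f1
    "f3": zscore([4.0, 1.0, 3.0, 2.0]),
}
kept = drop_redundant(features)

# Hypothetical LASSO coefficients for the surviving features.
score = radiomics_score({"f1": 2.0, "f3": 1.0}, {"f1": 0.5, "f3": -1.0})
```

In this toy data `f2` is dropped because it tracks `f1` almost perfectly, while `f3` survives. A constant feature column would need to be removed before `zscore`, since its standard deviation is zero.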

Construction and evaluation of the predictive models

Based on the selected optimal radiomic features, thirteen algorithms, including Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Support Vector Machine (SVM), Logistic Regression (LR), Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), XGBoost, AdaBoost, Gradient Boosting Decision Tree (GBDT), CatBoost, LightGBM, and Bayes, were used to construct intratumoral and peritumoral radiomics models. Ten-fold cross-validation was applied to identify the optimal parameters for each model, and Receiver Operating Characteristic (ROC) curves were plotted to compare the Area Under the Curve (AUC) for all thirteen radiomics models.
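Comparing candidate models by AUC can be sketched with the rank-based definition of the statistic: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The sketch below is illustrative only, and the model names and scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from two candidate models.
labels = [1, 1, 1, 0, 0, 0]
outputs = {
    "model_a": [0.9, 0.8, 0.4, 0.7, 0.3, 0.2],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.2],
}
aucs = {name: auc(scores, labels) for name, scores in outputs.items()}
best_model = max(aucs, key=aucs.get)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients. A sort-based implementation would be preferable for larger data.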

To ensure that the performance of the multivariate model was not predominantly influenced by a single feature, univariate logistic regression was performed for each radiomic feature to evaluate its individual discriminative ability, using area under the curve (AUC) and P-value as metrics. Features with a univariate AUC greater than 0.70 were regarded as strong independent predictors. Additionally, DeLong’s test was used to compare the AUC of the optimal multivariate model with: (a) each top-performing univariate feature (AUC > 0.70) and (b) a reduced multivariate model excluding the top univariate feature. A statistically significant difference (P < 0.05) would indicate that the complete multivariate model offers superior predictive performance compared to any individual feature.

Univariate and multivariate logistic regression analyses were conducted using clinical and ultrasound features from the training group to identify independent predictors associated with NML, leading to the construction of clinical and ultrasound models. The radiomics model with the best predictive performance, combined with independent clinical and ultrasound predictors, was selected to construct a combined model. ROC curves were plotted, and the AUC, accuracy, sensitivity, and specificity of each model were calculated to evaluate their predictive efficacy for NML. Calibration curves and Decision Curve Analysis (DCA) were employed to assess the goodness-of-fit and clinical value of the models. Based on clinical and ultrasound independent predictors, as well as the optimal radiomics model, regression coefficients (β-values) and odds ratios (ORs) for each variable were calculated. A multivariate logistic regression analysis was then conducted to construct the nomogram.
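The relationship between regression coefficients, odds ratios, and the predicted probability behind a nomogram can be sketched as follows. The intercept, predictor names, and β-values are hypothetical placeholders, not results from this study.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor: OR = exp(beta)."""
    return math.exp(beta)


def predicted_probability(intercept, betas, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(betas[name] * values[name] for name in betas)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for two predictors (illustration only).
betas = {"age": 0.04, "radiomics_score": 1.2}
ors = {name: odds_ratio(b) for name, b in betas.items()}

baseline = predicted_probability(0.0, betas, {"age": 0.0, "radiomics_score": 0.0})
higher = predicted_probability(0.0, betas, {"age": 0.0, "radiomics_score": 1.0})
```

A nomogram is a graphical form of the same linear predictor: each predictor's contribution is rescaled to points, and the total points map back to the probability computed here.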

Statistical methods

Statistical analyses were performed using R (version 4.2.0, https://www.R-project.org) and Python (version 3.10, https://www.python.org). Model construction and hyperparameter tuning were conducted in Python using the “scikit-learn” (version 1.4.1) and “XGBoost” (version 2.0.3) libraries. Hyperparameter optimization and cross-validation were implemented via “GridSearchCV” to minimize overfitting and improve model accuracy. Ten-fold cross-validation was conducted exclusively within the training cohort for feature selection, hyperparameter tuning, and model optimization. Final performance metrics were evaluated exclusively on the independent test cohort, which was not involved in any phase of model development.
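The separation between cross-validation folds and the held-out test cohort can be sketched with a plain ten-fold splitter applied only to training indices. This is a standard-library illustration, not the scikit-learn implementation used in the study. It assumes simple shuffled folds without stratification, and the training size of 596 is the sum of the 327 malignant and 269 benign training cases reported in the Results.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, validation_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# Only training-cohort indices are passed in; test-cohort cases never appear.
splits = kfold_indices(596, k=10)
fold_sizes = sorted(len(validation) for _, validation in splits)
```

Each training case serves as validation data exactly once, and the test cohort stays outside the loop entirely, which is the property the paragraph above relies on.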

XGBoost is a gradient-boosted tree ensemble in which each new weak decision tree learner is fitted to reduce the loss of the current ensemble. XGBoost was first trained as a baseline model, followed by training the final model with optimized hyperparameters. The hyperparameters of the XGBoost model were tuned using the GridSearchCV function in scikit-learn, with ten-fold cross-validation performed. The selected hyperparameters for optimization included learning_rate, gamma, max_depth, subsample, min_child_weight, and n_estimators. The same GridSearchCV procedure with ten-fold cross-validation was used to tune the hyperparameters of RF, DT, ET, SVM, LR, SGD, KNN, AdaBoost, GBDT, CatBoost, LightGBM, and Bayes.

Categorical data are presented as percentages (%) and comparisons were performed using the chi-square (χ2) test. Continuous data were evaluated for normality using the Kolmogorov–Smirnov test. Normally distributed data are presented as mean ± standard deviation (\(\bar{x} \pm s\)) and compared using the Student’s t-test. Pearson correlation analysis was used to select relevant radiomic features. Receiver operating characteristic (ROC) curves were used to assess the predictive performance of each model, and the DeLong’s test was applied to compare their relative performances. A nomogram prediction model was constructed, with the optimal threshold determined through ROC curve analysis. The threshold was defined by maximizing the Youden index. Based on the optimal cutoff values, the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were calculated for each model. Calibration curves were employed to evaluate model consistency, and decision curve analysis (DCA) quantified the net benefits at various threshold probabilities to assess the clinical significance of the models. The significance level was set at α = 0.05 (two-tailed).
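Threshold selection by the Youden index, J = sensitivity + specificity − 1, can be sketched as below. This is a minimal illustration with made-up scores. It assumes a case is called positive when its score is greater than or equal to the threshold and that ties in J are resolved in favour of the lowest threshold. Neither convention is stated in the text.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A case is predicted positive when its score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical model outputs: four benign (0) and three malignant (1) cases.
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1]
threshold, j_value = youden_threshold(scores, labels)
```

In this toy example the chosen cutoff detects all three malignant cases while misclassifying one benign case, so sensitivity is 1.0 and specificity is 0.75.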

Results

Patient demographics

A total of 851 cases of breast NML meeting the inclusion and exclusion criteria were identified from May 30, 2018, to January 20, 2024, with postoperative pathological results showing 453 malignant cases and 398 benign cases. Among these, the training group comprised 327 malignant and 269 benign patients, while the validation group included 126 malignant and 129 benign patients. The proportion of malignant cases did not differ significantly between the two groups (χ2 = 1.920, P = 0.166). There were no significant differences in clinical and ultrasound characteristics between the two groups (P > 0.05; Supplementary Table S1). Detailed clinical and ultrasound characteristics of the patients are presented in Table 1. Results of the multivariate analysis are detailed in Supplementary Tables S2 and S3. Patient age, weight, lesion location, lesion longest diameter, posterior echo attenuation, surrounding structural disruption, and blood flow characteristics were significantly associated with non-mass type breast cancer (P < 0.05). No significant differences were observed for other clinical and ultrasound characteristics (P > 0.05).
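The group comparison can be checked with a 2 × 2 chi-square computation on the reported counts. Whether Yates' continuity correction was applied is not stated in the text; it is an assumption here, adopted because the corrected statistic (about 1.92) is the one that matches the reported value, while the uncorrected statistic is about 2.13. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Rows: training, validation. Columns: malignant, benign (counts from the text).
corrected = chi_square_2x2(327, 269, 126, 129, yates=True)
uncorrected = chi_square_2x2(327, 269, 126, 129, yates=False)
```

The P-value would then be read from the chi-square distribution with one degree of freedom, which this standard-library sketch does not compute.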

Table 1 Comparison of clinical and ultrasound data of benign and malignant patients with NML between training group and validation group.

The above clinical and ultrasound independent predictive factors were incorporated into logistic regression to establish the clinical model and ultrasound model, which were evaluated using ROC curves (see Supplementary Fig. S1 online). The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of these models are detailed in Supplementary Table S4.

Extraction, selection, and construction of radiomics features

A total of 851 radiomics features were extracted from the ultrasound images of patients with breast NML using radiomics technology. Feature selection was performed using Student’s t-test, Pearson correlation test, and LASSO (see Supplementary Fig. S2 online). Correlation heatmaps of the peritumoral (1–5 mm) and intratumoral regions (see Supplementary Fig. S3 online) revealed distinct collinearity patterns. The results showed that after feature selection, the radiomic features exhibited low correlation (|r| < 0.6), confirming the minimization of residual collinearity. Additionally, Bonferroni correction reduced the number of significant features in each region by an average of 8.1%. Importantly, LASSO regression applied to both corrected and uncorrected feature pools yielded the same final predictive factors (Supplementary Table S5). This suggests that multivariate regularization inherently resolves the potential false-positive issues in univariate selection, thus negating the need for additional corrections. The optimal radiomics features were obtained from the peri-lesional (1 mm, 2 mm, 3 mm, 4 mm, 5 mm) and intra-lesional models with corresponding Lambda values of 0.009, 0.004, 0.002, 0.002, 0.001, and 0.003, respectively. The results are detailed in Supplementary Table S6.

Thirteen algorithms, including RF, DT, ET, SVM, LR, SGD, KNN, XGBoost, AdaBoost, GBDT, CatBoost, LightGBM, and Bayes, were used to construct the ultrasound radiomics models. The results are detailed in Table 2 and Supplementary Tables S7–S11. The predictive task was to differentiate between benign and malignant non-mass breast lesions (NML). As shown in the ROC curves (see Supplementary Fig. S4 online), the intratumoral model (RS_in_XGBoost) achieved an AUC of 0.842 (95% CI 0.788–0.896), while peri-lesional models exhibited varying performance: the 1 mm peri-lesional model (RS_peri_1mm_XGBoost) yielded an AUC of 0.819 (95% CI 0.762–0.876), followed by the 2 mm peri-lesional model (RS_peri_2mm_RF) with an AUC of 0.799 (95% CI 0.739–0.858). Notably, higher performance was observed in the 3 mm (RS_peri_3mm_XGBoost: AUC = 0.825, 95% CI 0.769–0.882), 4 mm (RS_peri_4mm_SVM: AUC = 0.904, 95% CI 0.860–0.948), and 5 mm (RS_peri_5mm_LR: AUC = 0.876, 95% CI 0.828–0.926) peri-lesional regions, with the 4 mm SVM-based model demonstrating the highest discriminative capability.

Table 2 Performance of intratumoral radiomics models in the validation group.

Univariate analysis and multivariate model robustness

Univariate logistic regression identified 26, 36, 60, 73, 96, and 45 radiomic features with significant discriminative power in the peritumoral (1–5 mm) and intratumoral regions, respectively (P < 0.05; Supplementary Tables S12–S17). The DeLong’s test indicated that the optimal multivariate model for each region significantly outperformed all the best-performing univariate features (P < 0.05), confirming that no single feature dominated the model.

Feature analysis of radiomics model

Seven “intersection features” were identified from the overlap of the 26, 36, 60, 73, 96, and 45 imaging features extracted from the ultrasound images of the aforementioned ROI regions: “wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis”, “original_glcm_JointAverage”, “original_firstorder_Minimum”, “wavelet-LLH_firstorder_Kurtosis”, “wavelet-HLH_glrlm_RunVariance”, “wavelet-LHH_glrlm_RunVariance”, and “wavelet-LLL_gldm_LargeDependenceHighGrayLevelEmphasis”. Among these features, the effects observed in the intratumoral and peritumoral 4 mm imaging models were opposite, particularly for “wavelet-HLH_glrlm_RunVariance” and “wavelet-LHH_glrlm_RunVariance”. In the intratumoral model, smaller values of these two features were associated with a higher probability of predicting non-mass breast cancer, whereas in the peritumoral 2 mm model, larger values were linked to a higher probability of the same prediction. While “wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis” and “wavelet-LLL_gldm_LargeDependenceHighGrayLevelEmphasis” had minimal impact on the intratumoral model predictions, in the peritumoral 2 mm model, a smaller value of “wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis” and a larger value of “wavelet-LLL_gldm_LargeDependenceHighGrayLevelEmphasis” significantly influenced the predictions.

Figure 3 visualizes the feature selection results of the optimal radiomics models for the six ROI groups, where the six red nodes represent the selected features for each group’s model. Notably, the seven light pink nodes were consistently selected by the best radiomics models across all six ROI groups, indicating their robustness and potential significance in model predictive performance. In contrast, features such as “wavelet-LLH_glszm_SizeZoneNonUniformity” were only selected by the peri-lesional 5 mm radiomics model, suggesting that their importance may be specific to that model. A partial dependence plot was created for the aforementioned seven “intersection features” based on the intra-lesional and peri-lesional 4 mm radiomics models.

Fig. 3 Feature analysis of Radiomics model. (A) Light-colored nodes indicate the best radiomics features for the respective ROIs, while purple nodes represent features shared across multiple models. The edges color-coded by model indicate which features were selected by which models. (B) and (C) illustrate the partial dependence of the model on three features (wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis, original_firstorder_Minimum, wavelet-LHH_glrlm_RunVariance). Each axis represents the range of a feature value, and the color of the points reflects the model’s predictions. The color gradient allows for a visual assessment of the combined effect of these three features on the predictions.

Nonlinear spatial dynamics of radiomics signatures

The coefficient magnitude (β) displayed a triphasic pattern across peritumoral regions: a sharp decline from intratumoral (β = 0.053) to peri_1mm (β = 0.029, Δ = 45.3%), followed by further decay in peri_2mm (β = 0.011, Δ = 62.1%), and stabilization beyond peri_3mm (β ≈ 0.02). However, the gradient decay index (GDI) exhibited an inverse trend: the maximal signal shift occurred between peri_1mm and peri_2mm (GDI = 0.818), while peri_3mm and beyond showed no measurable decay (GDI = 0) (Supplementary Table S18). This paradox suggests distinct biological regimes: early-stage tumor-stroma interactions dominate the peri_1-2 mm region (high GDI), while more distant regions (peri_3-5 mm) reach a steady-state microenvironment, homogenized by systemic immune regulation or mechanical constraints.
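The reported Δ values are consistent with a relative decline between adjacent regions, Δ = (β_previous − β_current) / β_previous. The sketch below reproduces them under that assumption. The definition of the GDI is given only in the supplementary material and is not reconstructed here. The function name is hypothetical.

```python
def relative_decline(previous, current):
    """Fractional drop from one region's coefficient to the next."""
    return (previous - current) / previous


# Coefficient magnitudes as reported in the text.
betas = {"intratumoral": 0.053, "peri_1mm": 0.029, "peri_2mm": 0.011}

delta_1 = relative_decline(betas["intratumoral"], betas["peri_1mm"])
delta_2 = relative_decline(betas["peri_1mm"], betas["peri_2mm"])
percentages = [round(100 * d, 1) for d in (delta_1, delta_2)]
```

Each step is measured against the immediately preceding region and not against the intratumoral value, which is why the second decline is larger in relative terms even though the absolute drop is smaller.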

Effectiveness evaluation of combined model

All model comparisons are detailed in Supplementary Table S19, showing that the combined model outperformed other individual models (P < 0.01), with accuracy, sensitivity, specificity, positive predictive value, and negative predictive value detailed in Table 3.

Table 3 Performance of combined models in the validation group.

The ROC curves for the six combined models show that the joint model constructed with intra-lesional radiomics, clinical, and ultrasound independent predictors had the best predictive performance, with an AUC of 0.858 (95% CI 0.778–0.939). The next best was the combined model using peri-lesional 2 mm radiomics, with an AUC of 0.850 (95% CI 0.767–0.932). The calibration curve indicates that this model has good fit in predicting malignant NML (P > 0.05). The DCA curve shows that the combined model using the peri-lesional 2 mm radiomics and clinical, ultrasound independent predictors had higher clinical net benefit within the risk threshold range of 0.069 to 0.966, with an Area Under the Decision Curve Analysis (AUDC) of 0.294 (95% CI: −0.530–0.549). The second-best was the combined model with intra-lesional radiomics, which had an AUDC of 0.282 (95% CI: −0.306–0.297) (see Fig. 4).
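The net benefit plotted in a decision curve follows the standard formula NB = TP/n − (FP/n) × p_t / (1 − p_t), where p_t is the threshold probability. The sketch below illustrates that formula with hypothetical counts. It does not reproduce the AUDC values above, whose computation is not described in the text.

```python
def net_benefit(true_positives, false_positives, n_patients, threshold):
    """Net benefit of a model at threshold probability p_t (0 < p_t < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)
    return true_positives / n_patients - weight * false_positives / n_patients


def treat_all_net_benefit(prevalence, threshold):
    """Net benefit of classifying every patient as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)
    return prevalence - weight * (1.0 - prevalence)


# Hypothetical confusion counts at a threshold probability of 0.2.
model_nb = net_benefit(true_positives=40, false_positives=10,
                       n_patients=100, threshold=0.2)
# Reference strategy using the validation-group prevalence (126 of 255).
all_nb = treat_all_net_benefit(126 / 255, threshold=0.2)
```

A model adds clinical value at a given threshold only when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.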

Fig. 4 Performance of combined models in the validation group. The optimal radiomic models for the validation group, including the intratumoral and peritumoral areas at 1 mm, 2 mm, 3 mm, 4 mm, and 5 mm, were combined with independent clinical and ultrasound predictors to obtain the ROC curve (A), P–R curve (B), DCA curve (C), and calibration curve (D). ROC: receiver operating characteristic, AUC: area under curve, CI: confidence interval, P–R: precision-recall, DCA: decision curve analysis.

The nomogram constructed using the intra-lesional radiomics model and the clinical and ultrasound independent predictors had a C-index of 0.982. The optimal classification was obtained at a nomogram score threshold of 167.808: a score greater than 167.808 predicts non-mass breast cancer, while a score less than 167.808 predicts a benign non-mass breast lesion. The nomogram constructed using the peri-lesional 2 mm radiomics model and the clinical and ultrasound independent predictors had a C-index of 0.978. The optimal classification was obtained at a nomogram score threshold of 110.745: a score greater than 110.745 predicts non-mass breast cancer, while a score less than 110.745 predicts a benign non-mass breast lesion (see Fig. 5).

Fig. 5 Nomograms were established based on clinical and ultrasound independent predictors combined with the intratumoral (A) and peritumoral 2 mm (B) radiomic models in the validation group. The predictors included clinical and ultrasound independent factors as well as the radiomics score. A vertical line was drawn from each predictor to the “score” to obtain the score for each predictor. The scores for each predictor were then summed. The “total score” corresponds to the probability of the malignancy of NML.

Models display and application

To facilitate the application of our findings for clinicians, researchers, patients, and their families, we have developed this diagnostic prediction system, which can be accessed at the following website: https://jarrily9527nml.streamlit.app/.

Discussion

Compared to breast mass lesions, the detection and differentiation of benign and malignant NML are more challenging. According to recent literature22,23, benign NML accounts for 53.8%, while malignant NML constitutes 46.2%. There is an overlap in the ultrasound features of NML, with structural distortion and ductal changes being common characteristics24,25,26. Especially in the context of dense breast tissue, the denser the fibroglandular tissue, the more the lesions overlap with it, making the differentiation of benign and malignant NML even more difficult.

In this study, age was identified as a risk factor for malignant NML. Previous research has indicated that advancing age is often associated with changes in metabolism and immune function, the accumulation of oncogenes, and insufficient DNA methylation, all of which are closely related to the occurrence of breast cancer27. The length of the lesion, the presence of microcalcifications, structural distortion, and blood flow characteristics were independent predictive factors for malignant lesions. With respect to blood flow, malignant lesions typically exhibit blood flow signals rated at 2 to 3, whereas benign lesions more frequently present signals rated at 0 to 1. Studies have shown that during the invasive growth of breast tumors, tumor cells grow rapidly, and the marginal area experiences hypoxia, which promotes the growth of microvessels within the lesion and the formation of a microvascular network28. This study also found structural distortion in the breast to be an important factor associated with malignancy. However, this differs from previous research29, possibly due to differences in the pathological types of the study subjects. While structural distortion and ductal changes are common features of both benign and malignant NML, structural distortion is more prevalent in the ultrasound images of malignant lesions compared to benign ones30.

High-frequency breast ultrasound has been widely used in the screening and diagnosis of breast cancer, significantly increasing the detection rate of breast cancer in dense breast tissue31. However, the ultrasound model in this study demonstrated low diagnostic performance, confirming the poor ultrasound diagnostic capability for NML in dense breast tissue. The detection rate is low because the ultrasound features of NML differ from those of typical mass-type breast cancer. Previous studies have described NML as regions of diffuse echotexture changes that differ from the surrounding breast tissue25. In dense breast tissue, the denser the fibroglandular tissue, the more the lesions overlap with it, which increases the difficulty of detecting non-mass breast cancer. Ultrasound examination of superficial small organs such as the breast primarily relies on the operator’s technique and experience. As a result, the differentiation of benign and malignant breast NML is influenced by subjective factors such as the reporting physician’s experience. There is a need for an objective tool to improve the detection rate of non-mass breast cancer in dense breast tissue.

In recent years, radiomics methods have been used to explore the relationship between peritumoral imaging features and the surrounding tumor microenvironment. Several imaging biomarkers reflecting the tumor microenvironment have been identified, demonstrating significant value in tumor differential diagnosis, metastasis prediction, treatment evaluation, and prognosis prediction. Sun et al.32 found that, in ultrasound images of breast cancer, a lymph node metastasis prediction model combining intratumoral and peritumoral imaging features significantly outperformed models based solely on intratumoral or peritumoral features. Moreover, the model constructed using a CNN achieved the best performance, clearly surpassing the other models. Guo et al.33 used dual-modal images from conventional breast ultrasound and strain elastography to conduct radiomics research on benign and malignant breast lesions. They extracted imaging features from the intratumoral, peritumoral 5 mm, and “intratumoral + peritumoral” regions to construct predictive models for breast lesions. The model combining grayscale ultrasound and strain elastography dual-modal images with “intratumoral + peritumoral” imaging features had the highest diagnostic performance. Furthermore, Huang et al.34 combined photoacoustic imaging with intratumoral and peritumoral radiomics and identified the 5 mm peritumoral region as the optimal area, showing that it improves the predictive accuracy for malignant breast nodules when combined with intratumoral features. In a multimodal study, Liu et al.35 demonstrated that the peritumoral radiomics model (GPTV_radiomics) and deep learning model (GPTV_DL) based on multiparametric MRI predicted lymphovascular invasion significantly better than intratumoral analysis (AUC: 0.771 vs. 0.720). An MRI study by Yu et al.36 further validated this finding: their peritumoral features predicted lymphovascular invasion with an AUC of 0.83, an 8% improvement over intratumoral features. Collectively, these studies suggest that the peritumoral microenvironment at 4–5 mm may influence tumor invasiveness through mechanisms such as angiogenesis and immune infiltration, and that its texture features offer diagnostic advantages across imaging modalities, including ultrasound, photoacoustic imaging, and MRI. Particularly in lesions with indistinct borders, peritumoral heterogeneity may reflect tumor biological activity more sensitively, which provides a theoretical basis for constructing peritumoral models.
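To make the notion of a peritumoral region of a given width concrete, the following is a minimal sketch of how a ring around a delineated lesion could be derived from a binary mask. It is an illustration only, not the pipeline used in this study or in the cited work: the function name, the disc-shaped dilation, and the assumption of isotropic pixel spacing are all hypothetical choices.

```python
from math import ceil


def peritumoral_ring(mask, width_mm, pixel_mm):
    """Return a binary ring of the requested physical width around a lesion.

    mask     : list of rows of 0/1 (1 = inside the delineated lesion)
    width_mm : ring width in millimetres (for example 1 to 5)
    pixel_mm : pixel spacing in millimetres (assumed isotropic)
    """
    rows, cols = len(mask), len(mask[0])
    radius_px = width_mm / pixel_mm
    reach = ceil(radius_px)
    # Offsets of a disc-shaped structuring element with the requested radius.
    disc = [(dr, dc)
            for dr in range(-reach, reach + 1)
            for dc in range(-reach, reach + 1)
            if dr * dr + dc * dc <= radius_px * radius_px]
    # Morphological dilation: stamp the disc onto every lesion pixel.
    dilated = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in disc:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dilated[rr][cc] = 1
    # Ring = dilated region minus the original lesion.
    return [[dilated[r][c] - mask[r][c] for c in range(cols)]
            for r in range(rows)]
```

Calling the function with widths of 1 to 5 mm on the same lesion mask would yield nested rings, from which features can then be extracted separately for each width.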

This study constructed 13 radiomics models based on intratumoral and peritumoral ultrasound images at depths of 1 mm, 2 mm, 3 mm, 4 mm, and 5 mm, utilizing high-throughput radiomics technology. Among these, the peritumoral 4 mm radiomics model constructed using the SVM algorithm exhibited the best predictive performance, with an AUC of 0.904 (95% CI: 0.860–0.948), while the intratumoral radiomics model constructed using the XGBoost algorithm had an AUC of 0.842 (95% CI: 0.788–0.896). DeLong’s test comparing the two models yielded a P-value of 0.773, indicating no statistically significant difference. Nevertheless, the higher point estimate on the ROC curves suggests that the peritumoral 4 mm region may carry more diagnostically relevant biological information than the intratumoral and other peritumoral regions (AUC_peri_4mm = 0.904 > AUC_intra = 0.842).
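For readers unfamiliar with DeLong’s test, the comparison of two correlated AUCs measured on the same cases can be sketched as follows. This is a minimal, unoptimized illustration using only the Python standard library; the function names are hypothetical, it is not the software used in this study, and it assumes binary labels (1 = malignant, 0 = benign) with at least two cases in each class.

```python
from math import erfc, sqrt


def _psi(x, y):
    # Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 on ties, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    # Structural components of DeLong et al. for one model.
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    # Sample covariance (n - 1 denominator).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two_sided_p) for two models on the same cases."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    auc_a, v10a, v01a = _components([scores_a[i] for i in pos],
                                    [scores_a[i] for i in neg])
    auc_b, v10b, v01b = _components([scores_b[i] for i in pos],
                                    [scores_b[i] for i in neg])
    # Variance of the AUC difference, accounting for the correlation.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg))
    if var <= 0:
        # Degenerate case (e.g. identical scores): no estimable difference.
        same = auc_a == auc_b
        return auc_a, auc_b, 0.0 if same else float("inf"), 1.0 if same else 0.0
    z = (auc_a - auc_b) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

A large P-value from such a test means only that the observed AUC difference is compatible with sampling variability, which is why the difference between 0.904 and 0.842 above is interpreted cautiously.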

In the peritumoral 2 mm model, “wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis” was a more effective feature than “original_glcm_JointAverage” and “original_firstorder_Minimum,” a trend not observed in the intratumoral model. Several factors may contribute to these results. The observed spatial heterogeneity of radiomics features likely arises from intrinsic differences in tumor biology and imaging characteristics. In intratumoral regions, densely packed malignant cells and necrotic zones may result in homogeneous texture patterns, which may explain why lower wavelet-HLH_glrlm_RunVariance values (reflecting reduced pixel variation) are associated with malignancy. In contrast, peritumoral regions (1–2 mm) often show infiltrative tumor growth and stromal reactions, which increase tissue complexity. Higher RunVariance values in these regions may capture disrupted architectural patterns caused by tumor invasion, in line with their association with aggressive phenotypes. Furthermore, features such as wavelet-HLL_gldm_LargeDependenceHighGrayLevelEmphasis, which quantify the clustering of high-intensity pixels, may reflect microvascular proliferation in the peritumoral stroma, a hallmark of tumor angiogenesis. This is supported by previous studies linking peritumoral vascularity to malignancy risk37,38. The reduced role of these features intratumorally may be attributed to necrotic dominance, which suppresses gray-level dependency39. Collectively, these findings support the “seed and soil” hypothesis, in which the peritumoral microenvironment (2–4 mm) plays a crucial role in determining tumor behavior, and suggest that ultrasound radiomics offers a non-invasive method to decode this spatial interplay.
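As an illustration of what a run-length feature such as RunVariance measures, the sketch below computes the variance of run lengths along the horizontal direction of a small quantized image, assuming the PyRadiomics-style definition (the probability-weighted variance of run length over the gray-level run-length matrix). It is a simplified toy example: the function names are hypothetical, and it omits the wavelet filtering, gray-level discretization, and averaging over directions that a full extraction of wavelet-HLH_glrlm_RunVariance would involve.

```python
from collections import Counter
from itertools import groupby


def run_lengths(image):
    """Count horizontal runs as {(gray_level, run_length): number_of_runs}."""
    counts = Counter()
    for row in image:
        for level, run in groupby(row):
            counts[(level, len(list(run)))] += 1
    return counts


def run_variance(image):
    """Probability-weighted variance of run length over all horizontal runs."""
    counts = run_lengths(image)
    total = sum(counts.values())
    mean_len = sum(length * k for (_, length), k in counts.items()) / total
    return sum(k * (length - mean_len) ** 2
               for (_, length), k in counts.items()) / total
```

In this toy setting, an image whose runs all have the same length gives a value of zero, whereas an image mixing long homogeneous runs with short alternating ones gives a larger value.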

The observed β and GDI patterns challenge the classical monotonic decay hypothesis. We propose that tumor-secreted immunosuppressive factors homogenize the proximal zones, while the peritumoral 1–2 mm region functions as a diagnostic “twilight zone,” where increasing stromal heterogeneity induces abrupt signal shifts, as captured by wavelet features such as wavelet-HLH_glrlm_RunVariance. Beyond the peritumoral 3 mm region, systemic homeostasis (e.g., collagen crosslinking, immune surveillance) mitigates variability, rendering extended sampling unnecessary.

Among the combined models constructed using intratumoral and various peritumoral ultrasound imaging features along with clinical and ultrasound independent predictive factors, the combined model with the peritumoral 2 mm region showed the largest AUC gain, improving from 0.799 to 0.849. In contrast, the AUC for the combined model with the peritumoral 4 mm region decreased from 0.904 to 0.782, while the AUC for the combined model including intratumoral features improved from 0.842 to 0.858. This may be attributed to the blurred boundary between NML and normal tissue, which hinders the clinical and ultrasound independent predictive factors from capturing the features of the peritumoral region for the combined model. Additionally, the imaging features of the peritumoral 4 mm region differ markedly from the intratumoral features, and both the clinical and ultrasound models primarily represent intratumoral characteristics. The resulting poor complementarity leads to inferior predictive performance for the combined model with the peritumoral 4 mm region compared to the original imaging model. In contrast, the peritumoral 2 mm region is more closely aligned with the peritumoral microenvironment, which enhances its complementarity with the clinical and ultrasound models. The DCA curves indicate that the best predictive performance is achieved with the combined model that includes the peritumoral 2 mm region (AUDC: 0.294, 95% CI: -0.530 to -0.549), followed by the combined model with intratumoral features (AUDC: 0.282, 95% CI: -0.306 to -0.297). The DCA curves also reveal that when the threshold probability is very low, the model’s net benefit is negative. In this range, the lower end of the DCA curve often overlaps with the net benefit curve of the “Treat-all” strategy, indicating that at very low thresholds the model cannot provide greater net clinical benefit than the empirical treat-all strategy. Furthermore, the model’s false positive predictions may lead to a large number of benign cases being misdiagnosed as malignant and undergoing unnecessary biopsies or surgeries, in which case the clinical harm outweighs the benefit. The results of this study demonstrate that the nomograms combining intratumoral and peritumoral 2 mm imaging scores with clinical and ultrasound independent predictive factors exhibit similar predictive performance in distinguishing benign and malignant NML, with C-index values of 0.982 and 0.978, respectively. This suggests a high diagnostic value that can provide a useful reference for clinical decision-making.
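The net benefit that underlies a decision curve can be sketched as follows. This is a minimal illustration of the standard definition (true positives minus false positives weighted by the odds of the threshold probability, per patient), not the code used in this study; the function names and the convention that a case is flagged when its predicted probability is at or above the threshold are assumptions.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on every case with predicted risk >= threshold.

    labels    : 1 = malignant, 0 = benign
    probs     : predicted probabilities of malignancy
    threshold : threshold probability, strictly between 0 and 1
    """
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # Each false positive is weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the treat-all strategy (every case is flagged)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

When the threshold lies below every predicted probability, all cases are flagged and the two functions return the same value, which is why a model’s decision curve coincides with the treat-all curve at the low end of the threshold range.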

This study has several limitations. First, it is a single-center retrospective study, and non-mass breast cancer is relatively rare in clinical practice, resulting in a small sample size. Second, the study used internal validation with cross-validation to assess model performance, and the lack of an independent external validation cohort may introduce bias. The reliability and stability of these results need to be confirmed by increasing the sample size and conducting multicenter studies. Third, because NML does not meet the BI-RADS criteria for a mass on ultrasound images, annotating and contouring the two-dimensional ultrasound images is challenging. To address this issue, we chose to delineate the most prominent parts of the lesions. However, this approach makes the image segmentation susceptible to subjective differences among observers. Future studies could consider employing automated segmentation methods to reduce inter-observer variability.