Introduction

Advances in computerized analysis of medical images include deriving computational features toward building machine learning models which can accurately predict or prognosticate disease outcomes1,2. However, variations in image acquisition parameters, scanner types, and institutional practices can significantly affect the appearance of medical images3,4,5, such that the same tissue region may be represented differently across clinically acquired scans. Even minor differences in scanner hardware, reconstruction algorithms, acquisition protocols, or patient positioning can substantially alter feature values across sites or sessions, independent of underlying biology6,7. Similar instability can arise from differences in annotations or identification of regions of interest (ROI), further impacting reproducibility of extracted features8,9. The resulting fluctuations in computationally extracted medical image features are largely unrelated to the underlying disease conditions, and yield classifier models that do not generalize accurately between medical institutions simply due to slight variations in scanner calibration and acquisition protocols. It has thus become increasingly crucial to determine the variability10 of computerized features from medical images, both within and between institutions, across different acquisition protocols11,12, as well as in test-retest settings7. The key challenge is to identify medical image features that are simultaneously robust13 to cross-domain variability and discriminable, in order to ensure the generalizability14 and clinical utility of associated classifier models, especially when evaluated on new, unseen medical imaging cohorts.

The most popularly used feature selection methods in medical imaging studies include minimum redundancy maximum relevance (mRMR)15, Wilcoxon rank-sum testing (WLCX)16, and least absolute shrinkage and selection operator (LASSO)17. A survey of over 50 recent studies (summarized in Supplementary Table 2) found that more than 80% utilized one or more of these three techniques, as illustrated in Fig. 1(a). Notably, these methods primarily focus on identifying a minimal set of discriminable features, potentially overlooking the reproducibility or variability of these features. Figure 1(b) highlights that only about 30% of medical imaging studies have explicitly incorporated variability-based screening (e.g., hard thresholds), while over 70% did not assess feature variability at all.

Fig. 1: Overview of popularly utilized feature selection practices in medical imaging studies.
figure 1

Distribution of (a) feature selection methods and (b) feature screening strategies employed. (c) Lack of consensus across medical imaging studies for “screening” features based on variability (using IS, ICC, or CV measures) or discriminability (based on AUC), illustrated via distribution of widely varying thresholds used.

Multiple approaches have been proposed for quantifying feature variability in the context of medical imaging. Statistical measures including the intra-class correlation coefficient (ICC)18, instability score (IS)19, or the coefficient of variation (CV)20 have been utilized as feature reproducibility measures. Correspondingly, measures such as classifier AUC have been used to quantify feature discriminability21,22. These measures are used to “screen” features by determining a cut-off threshold at which features are considered discriminable or reproducible23. However, threshold criteria for both reproducibility and discriminability measures are often empirically determined for a given study and indeed can vary significantly between studies. Figure 1(c) illustrates the distribution of threshold values reported across 40 recent studies which have utilized different variability measures (IS, ICC, and CV) and a discriminability measure (AUC). It can be observed that a wide spectrum of threshold values has been utilized, with no clear consensus on the optimal thresholds to define a discriminable and reproducible feature set.

Utilizing empirically determined thresholds to screen medical image features may be further complicated when attempting to optimize for multiple sources of variability. For instance, different variability measures are often used to quantify batch effects (such as IS19) versus differences due to annotation sources (such as ICC24,25). The interplay between multiple sources of variability is likely not appropriately accounted for if each measure is independently used to filter out medical image features. Similarly, if feature discriminability and reproducibility are evaluated independently of each other, features that have only marginal variability but may still be highly discriminable could be filtered out.

To illustrate this relationship between discriminability and variability, Fig. 2 presents 2D scatter plots of feature discriminability (Y-axis) vs two different measures of feature variability (X-axes), where each point corresponds to a computerized image feature. Identifying features that are both highly discriminatory and highly reproducible would involve determining which features meet pre-defined thresholds (TH1, …, TH8, horizontal and vertical dashed lines) for each measure being considered. This would then yield the optimal feature sets highlighted via the blue boxes on each plot. Notably, this set of optimal features can be seen to comprise different feature families in each plot (primarily F5 in Fig. 2(a) but primarily F4 in Fig. 2(b)), due to differences in feature trends between the two variability measures. There are also differences in the sub-optimal feature sets identified in each plot (low in both discriminability and reproducibility), highlighted via red boxes. This suggests a significant challenge in optimally determining a trade-off between discriminability and reproducibility for computerized features, in terms of not only identifying the appropriate variability measure (which depends on how many imaging modalities, scanners, imaging protocols, or institutions are being considered) but also determining the best threshold value toward identifying the most discriminatory and reproducible feature set for disease characterization.

Fig. 2: 2D scatterplots illustrating the relationship between variability (in terms of instability score (IS) and coefficient of variation (CV)) and discriminability (via Area Under the ROC Curve (AUC)).
figure 2

Each point represents a computerized image feature and colors represent different feature families. Blue boxes highlight desirable feature groups, while the red boxes show undesirable ones, based on different trade-offs between variability and discriminability.

In this work, we present a novel Variability Regularized Feature Selection (VaRFS) approach, which simultaneously attempts to ensure feature discriminability while also optimizing for feature variability across institutions, scanners, or acquisition settings in the context of medical imaging data. An initial limited implementation of VaRFS was discussed in26, beyond which the current work incorporates multiple sources of variability, analytical evaluation of the convergence properties, as well as a comprehensive comparison across larger multi-institutional data cohorts. Our novel optimization framework directly integrates feature reproducibility into the selection process through a variability-based soft penalty term. Unlike traditional methods that apply reproducibility screening as a hard pre-filter (e.g., removing features whose variability “score” does not meet a pre-specified threshold), our approach maintains a unified formulation that jointly accounts for variability and predictive power. This not only allows for finer control over feature variability but also minimizes the chance of prematurely excluding marginally variable but highly informative features. The specific novel contributions of our current work are as follows:

1.

    VaRFS integrates feature variability screening and feature selection into a single optimization function; overcoming the need for empirical selection of an appropriate threshold value per variability measure or cohort. VaRFS is also designed to ensure a better tradeoff between the three essential properties of a computerized medical image feature set: discriminability, sparsity, and reproducibility. By comparison, separating feature variability screening and feature selection could result in sub-optimal features being identified due to the elimination of highly discriminatory features that do not meet empirically determined variability threshold criteria. As VaRFS integrates these two processes into a single optimization function, the feature selection process can be made more efficient, reliable, and flexible.

2.

The objective function of VaRFS aims to maximize the discriminability of the selected features, with additional regularization terms imposing constraints on sparsity and variability to ensure that the selected features are also sparse and reproducible. Towards this, VaRFS leverages the least absolute shrinkage and selection operator (LASSO) framework17, as it can easily assimilate supplementary regularization conditions27,28. However, as this extension may converge slowly for large-valued regularization parameters and requires careful tuning of the step size29,30, popular approaches such as coordinate descent31 and the iterative shrinkage-thresholding algorithm32 may be sub-optimal. To address these limitations, we present a comprehensive analytical framework that leverages a novel class of proximal algorithms33 which are computationally efficient, easy to implement, and able to handle non-smooth objective functions34. We analytically demonstrate how the incorporation of proximal algorithms into our unique extension of the LASSO framework can further be accelerated35 for faster convergence, which represents a significant advancement in the field of optimization.

3.

VaRFS provides a significant contribution to enabling clinical usage of machine learning models, specifically in addressing the challenging issue of optimally accounting for variability and reproducibility of computerized radiology image (or radiomic36) features. Toward this, VaRFS is comprehensively compared against three routinely utilized feature selection approaches across five multi-institutional radiographic imaging cohorts involving challenging clinical problems including differentiating healthy and diseased samples, characterizing response to treatment, as well as risk stratification; in both oncological (prostate, rectal cancer) and non-oncological (Crohn’s disease) settings.

Results

Experimental evaluation of VaRFS and alternative feature selection (FS) strategies was conducted using five different, multi-institutional, retrospectively accrued cohorts, which were segregated into independent discovery and validation sets (see Table 1). The overall experimental workflow is illustrated in Fig. 3.

Table 1 Multi-institutional data cohorts, splits, and classification tasks considered in this study
Fig. 3: Overall experimental workflow evaluating VaRFS against alternative feature selection strategies.
figure 3

Panels depict the data curation, feature extraction, feature selection, and downstream analysis stages used in the study.

Experiment 1: comparing VaRFS against variability-screened feature selection methods

VaRFS was found to result in statistically significantly higher AUC values in all five cohorts for both discovery and hold-out validation, compared to any alternative FS approach. This suggests that integrating feature variability directly into the selection scheme can improve overall model performance, including in multi-institutional validation. Table 2 summarizes classifier performance for the top-ranked radiomic features identified via each FS scheme (VaRFS, and variability-screened mRMR, LASSO, and WLCX) across all five cohorts. Note these results are based on considering a single measure of variability at a time, e.g., results are presented for each of ICCdose, ISbatch, and ICCannot for C5. Supplementary Table 3 presents averaged performance across all cross-validation runs for each FS method, which further confirms the superior performance trends of VaRFS compared to other FS methods. This can also be noted when utilizing an LDA model to evaluate VaRFS against alternative FS methods, as summarized in Supplementary Table 4.

Table 2 Performance of VaRFS feature set vs variability-screened mRMR, LASSO, and WLCX-based feature sets in terms of AUC for distinguishing the two classes in each of C1-C5 using a single variability measure

The results presented in Table 3 similarly summarize classifier performance for different FS strategies for all five cohorts, but when considering multiple measures of variability simultaneously. Radiomic features identified via VaRFS yielded statistically significant improvements in AUC values in all five cohorts (both in the discovery and hold-out validation sets) compared to any alternative FS strategy. Notably, accounting for multiple sources of variability via VaRFS can be seen to yield a further improvement in classifier AUC beyond using individual variability measures (compare Table 2 vs Table 3); corresponding to an overall 8–10% improvement for VaRFS over variability-screened FS approaches.

Table 3 Classifier AUC of VaRFS feature set vs alternatives in distinguishing the 2 classes in each cohort when considering multiple variability measures

Figure 4 illustrates the chord diagram of the five top-ranked radiomic features selected via VaRFS as well as each of variability-screened mRMR, LASSO, and WLCX, together with their respective ranks, feature family (indicated via colors), and feature importance in terms of SHAP values (indicated via size). Chord connections highlight instances where a feature is common to two or more different methods, based on which the VaRFS feature set can be seen to include a majority of reproducible features (some of which had also been identified by other FS methods). Typically, VaRFS can be seen to have the most features in common with LASSO; which aligns with the commonality in their objective functions. The top-ranked VaRFS features identified here can be seen to correspond to radiomic descriptors from Laws, Gradient, and Haralick feature families. This resonates with previous findings from our group37,38,39,40 as well as others41,42 where these patterns have shown associations with specific disease biology or physiological characteristics (additional details in the Supplementary Materials).

Fig. 4: Chord diagram of the five top-ranked features identified by VaRFS and other alternative FS methods when considering multiple variability measures, for each of C1-C5.
figure 4

Numeric labels (1–5) indicate the rank of each feature as determined by the corresponding feature selection method, colors correspond to feature family, while the size of each section represents the feature importance based on its SHAP value. Chord connections highlight instances where a feature is common to two or more different methods, while distinct symbols inside the chord sections indicate whether a feature is variable based on defined thresholds. The indices of these selected features are summarized in the Supplementary Table 1, which can be cross-referenced against Supplementary Data1.

Figure 5 depicts PCPs of discriminability/variability trends in these five top-ranked radiomic features selected via VaRFS within each cohort, each of which corresponds to a polyline that connects vertices between different parallel axes (representing the specific discriminability or variability value of that feature). It can be observed that many of the marginally variable features selected via VaRFS are not only highly discriminatory but also located in close proximity to the threshold (indicated via horizontal dashed lines). This suggests that a slight adjustment to this cutoff value would include or exclude critically useful features from consideration by different FS strategies (due to not meeting ad hoc variability criteria). The overall improved classifier performance achieved by incorporating these marginally variable features suggests they significantly augment the overall discriminability of the VaRFS model while not compromising on its generalizability to unseen data (consistently improved performance across discovery and validation).

Fig. 5: Parallel Coordinate Plot (PCP) based on five top-ranked features selected via VaRFS.
figure 5

Polylines correspond to individual features f1, …, f5 selected per cohort (colors indicate feature family, identical to Fig. 4), which in turn are composed of unbroken line segments that connect vertices between different parallel axes (representing the specific discriminability or variability value of that feature). Horizontal dashed lines indicate the threshold cutoff value for specific variability measures.

Figure 6 depicts an UpSet-style error decomposition for each cohort (C1-C5), partitioning the false positives (FP) and false negatives (FN) of each of the 5 FS strategies evaluated into error sets, thereby revealing both shared and method-specific failure modes. Both VaRFS variants can be seen to consistently produce fewer unique errors, with most misclassifications overlapping with those made by other methods. Across all cohorts, comparator FS methods (mRMR, WLCX) also demonstrate a markedly higher error rate compared to both VaRFS approaches, with the smallest proportion of FPs/FNs associated with VaRFS when considering multiple sources of variability. This pattern demonstrates that VaRFS does not appear to introduce new or unstable error modes compared to other FS schemes; instead, it reduces method-specific errors while preserving discriminability.

Fig. 6: UpSet-style error decomposition for each cohort C1-C5, where stacked bars show the distribution of false positives (FP, brown) and false negatives (FN, red) associated with each FS approach (mRMR, WLCX, LASSO, VaRFS-Single, VaRFS-Multi).
figure 6

Below each bar, the filled circles indicate which FS approach was associated with a specific group of errors. Values in parentheses next to each method represent the total percentage of classification errors attributable to that method within the corresponding cohort.

Experiment 2: evaluating parameter sensitivity of VaRFS

Optimal classifier performance for VaRFS (highlighted in pink) in all five cohorts is observed when equally weighting β (variability) and λ (sparsity), though stable performance can be noted across a broad range of regularization parameters (see Supplementary Materials for a more detailed description). This can be seen in Fig. 7 via a 3D barplot of AUC values for the VaRFS feature set selected for each parameter combination of β and λ, when the corresponding RF model is evaluated in hold-out validation. The best overall AUC value in three cohorts corresponds to β = λ = 0.5 (and is very close to these values for the remaining two cohorts). Intuitively, classifier performance is seen to decline markedly for extreme parameter combinations (β ≪ λ or β ≫ λ), which indicates that both sparsity and variability terms are equally critical in the VaRFS cost function. This allows for more intelligent and reliable identification of feature sets which are simultaneously discriminable, sparse, and reproducible, thus reducing the chance of model overfitting while improving its generalizability.

Fig. 7: 3D bar plot of classifier AUC values (Z-axis) in hold-out validation for each cohort, when considering VaRFS feature sets selected for each combination of regularization parameters for variability (β, X-axis) and sparsity (λ, Y-axis).
figure 7

Color shading of the bar plots is based on AUC values such that yellow indicates higher performance while blue corresponds to lower performance. Highest overall AUC performance in each cohort is highlighted in pink, with corresponding regularization parameters summarized.

Experiment 3: comparing regular vs accelerated versions of VaRFS

Accelerated VaRFS was found to converge at a faster rate and reach a lower value of the objective function compared to the regular implementation, in all five cohorts. Figure 8 presents optimization trends for the objective function J(θ) in (3), as computed by the regular (red lines) and accelerated (blue lines) versions of VaRFS over 100 iterations. When considering multiple measures of variability, the initialization of J(θ) is intuitively higher compared to using a single variability measure, across all five cohorts. The visualization of C5 with three different sources of variability emphasizes this point, as the initialization of J(θ) here has the highest value across all cohorts. VaRFS was also found to be more computationally efficient (average runtime of 124 seconds for regular, 79 seconds for accelerated) in comparison to both mRMR (614 secs runtime) and WLCX (412 secs runtime), while being comparable in runtime to LASSO (132 secs runtime) in all five cohorts. These results suggest that the use of proximal algorithms, rather than primal-dual methods43 or projection onto convex sets44, is an appropriate choice for VaRFS; this family also includes the alternating direction method of multipliers45, which has been shown to be an efficient and computationally cheaper approach.

Fig. 8: Trends in the objective function J(θ) in Equation (3) for regular (red) vs accelerated (blue) versions of VaRFS, computed via Algorithm 1 and Algorithm 2, respectively.
figure 8

In each plot, solid lines correspond to using a single variability measure while dashed lines indicate trends when considering multiple measures of variability simultaneously within VaRFS.

Building on these advantages, the accelerated proximal algorithm for optimizing the VaRFS objective function allows for more efficient solving of a convex but non-smooth optimization problem via the use of a momentum term that helps it converge faster46. These results are also in line with previous studies47,48,49; demonstrated here for the first time in the context of radiomics and medical image analysis.

Discussion

In this study, we presented a novel radiomic feature selection scheme, Variability Regularized Feature Selection (VaRFS), which represents a first effort at integrating feature variability as a generalizable regularization term directly into the optimization function used to select a sparse and discriminable set of features. Radiomic features selected via VaRFS achieved significantly higher classification performance compared to three routinely utilized feature selection approaches across five multi-institutional radiographic imaging cohorts involving challenging clinical problems including differentiating healthy and diseased samples, characterizing response to treatment, as well as risk stratification. We were additionally able to demonstrate the computational efficiency of the VaRFS approach as well as examine how exploiting the trade-offs in feature discriminability and variability can ensure improved model performance.

To enhance the robustness and generalizability of machine learning models in medical image analysis, there has been increasing recognition of the need to consider the reproducibility of radiomic features given their sensitivity to acquisition parameters11 and batch effects9,50. Neglecting feature reproducibility can lead to an increased risk of false positive associations and type I errors7. Recent efforts in this regard have largely adopted an independent feature screening approach prior to feature selection21,23,51. These approaches typically utilize thresholding of variability measures to omit any features which do not meet prespecified criteria. This is because popular feature selection approaches (LASSO, mRMR, and WLCX) have not been explicitly designed to account for variability, but rather only for sparsity and discriminability. Feature screening can also be seen to suffer from issues similar to those of dichotomizing continuous variables52, such as loss of information, reduced statistical power, and increased risk of false positives. It is also worth noting that blindly removing unstable features (without regard to their discriminability) or simply retaining all features (without regard to their reproducibility) based on pre-specified thresholds may not result in an optimal, generalizable feature set.

In order to account for these issues, VaRFS simultaneously optimizes for feature contributions in terms of discriminability, sparsity, and reproducibility. Unlike traditional methods which rely on threshold-based variability screening, VaRFS directly optimizes for feature variability together with sparsity and discriminability, offering a more principled alternative to exhaustive threshold parameter tuning. Furthermore, the features selected by VaRFS can be seen to represent an optimal trade-off between different variability measures, while not compromising on its ability to identify a complementary suite of features and patterns in the data. This is borne out in our experimental results, where radiomic features selected via VaRFS yielded significantly higher classification performance compared to features selected after variability screening; suggesting the significant advantages enabled by developing an approach which can simultaneously optimize for sparsity, discriminability, and reproducibility rather than considering each of these factors independently. VaRFS thus offers a more efficient and effective method for feature selection which could ultimately improve the clinical translation and practical utility of radiomics-based models.

We do acknowledge some limitations to our study. While our five multi-institutional cohorts totaled over 700 patient datasets from 12 different institutions, we primarily considered binary classification problems in specific disease use-cases using MRI or CT scans. These results will require further confirmation in other diseases, when analyzing other radiomic feature families, as well as when considering other imaging modalities (e.g., PET, digital pathology). Our experiments did not incorporate any prior knowledge about feature variability, which was instead empirically determined on the fly within our specific cohorts. This was done primarily to ensure an even playing field when comparing VaRFS with alternative feature selection and screening approaches. Our selection of comparators in the current study was based on their wide usage in the medical imaging and radiomics literature, wherein WLCX, LASSO, and mRMR remain the most widely utilized for feature selection. Additional comparator FS techniques that could have been considered include tree-based importance measures53, ensemble-based strategies54, or ElasticNet55; these will be a subject for future work. Prior studies23,25 have linked robust radiomic features to clinical and biological endpoints. Understanding the relationship between biological interpretability and the variability characteristics of radiomic features will be a key direction for future research, building on the methodological development of VaRFS undertaken in the current study.

In the future, we plan to extend the VaRFS framework in order to incorporate the concept of reproducibility into deep learning approaches. We will also examine how to incorporate priors in terms of which feature families to utilize, as well as extend VaRFS for use in multi-class and continuous regression problems.

Methods

Overview of VaRFS

All data are assumed to be real-valued. Vectors and matrices are denoted by boldface lower-case and boldface upper-case letters, respectively. Additional notation used in this work is summarized in Table 4.

Table 4 Common notation utilized in Section IV

Consider S data sources (e.g. institutions, batches, scanners), each of which is associated with the feature matrix \({{\bf{X}}}_{i}=[{{\bf{x}}}_{i}^{1}\ldots {{\bf{x}}}_{i}^{j}\ldots {{\bf{x}}}_{i}^{m}]\in {{\mathbb{R}}}^{{n}_{i}\times m}\), where \({{\bf{x}}}_{i}^{j}\) is the feature vector for the ith data source and the jth feature. m represents the total number of features (assumed to be the same for all S data sources), while ni corresponds to the number of samples from the ith source. Let \({{\bf{y}}}_{i}\in {{\mathbb{R}}}^{{n}_{i}\times 1}\) be the corresponding label vector for the ni samples. Accumulated feature values over all samples and all sources can be denoted via \({\bf{X}}={[{{\bf{X}}}_{1}\ldots {{\bf{X}}}_{S}]}^{T}\in {{\mathbb{R}}}^{n\times m}\) where \(n=\mathop{\sum }\nolimits_{i = 1}^{S}{n}_{i}\).

Problem statement for feature selection via LASSO

Finding a user-specified number of discriminative features (denoted via the level of sparsity, c) can be cast as a constrained optimization problem as follows56:

$$\mathop{\min }\limits_{{\boldsymbol{\theta }}\in {{\mathbb{R}}}^{m}}\frac{1}{2}\mathop{\sum }\limits_{i=1}^{S}{\left\Vert {{\bf{y}}}_{i}-{{\bf{X}}}_{i}{\boldsymbol{\theta }}\right\Vert }_{2},\quad \,\text{subject}\,\,\text{to}\,\quad {\left\Vert {\boldsymbol{\theta }}\right\Vert }_{0}\le c,$$
(1)

where \({\boldsymbol{\theta }}\in {{\mathbb{R}}}^{m}\) is the coefficient vector reflecting the contribution of each feature. This non-smooth combinatorial optimization problem is NP-hard57. A common alternative to (1) is to consider the convex relaxation based on the \({\ell }_{1}\) norm, which corresponds to the LASSO formulation17, written as:

$$\mathop{\min }\limits_{{\boldsymbol{\theta }}\in {{\mathbb{R}}}^{m}}\left\{\frac{1}{2}\mathop{\sum }\limits_{i=1}^{S}{\left\Vert {{\bf{y}}}_{i}-{{\bf{X}}}_{i}{\boldsymbol{\theta }}\right\Vert }_{2}^{2}+\lambda {\left\Vert {\boldsymbol{\theta }}\right\Vert }_{1}\right\},$$
(2)

where λ is the regularization parameter.
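For reference, the standard LASSO problem in (2) can be solved with off-the-shelf tools. The following is a minimal illustrative sketch using scikit-learn on randomly generated data (not the cohorts used in this study); note that scikit-learn’s alpha corresponds to λ only up to the 1/n scaling applied to the loss in that implementation.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 405))            # n samples x m features, stacked over sources
    y = rng.normal(size=200)                   # continuous surrogate for the label vector
    model = Lasso(alpha=0.05, max_iter=10000)  # alpha plays the role of lambda (up to 1/n scaling)
    model.fit(X, y)
    selected = np.flatnonzero(model.coef_)     # features with non-zero coefficients
    print(f"{selected.size} features retained by LASSO")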

Development of VaRFS

Radiomic features are known to vary between data sources due to intra-site, inter-site, or test/retest differences including changes in the device, modality, sequence, compartment, patient, or laboratory settings36. This variability is typically quantified via different statistical measures (e.g. IS, ICC, CV).

Based on the types of variability being considered, denoted via \(v\in \{1,\ldots ,V\}\) (e.g., batch effects, annotation differences), we define the feature variability vector as \({{\bf{u}}}_{v}={\left[{u}_{v}^{1}\ldots {u}_{v}^{j}\ldots {u}_{v}^{m}\right]}^{T}\), based on computing a measure of variability on a per-feature basis via statistical comparisons of bootstrapped subsets generated from the original feature space. In matrix form, this is represented via the feature variability matrix, \({\bf{P}}=[{{\bf{u}}}_{1}\ldots {{\bf{u}}}_{V}]\in {{\mathbb{R}}}^{m\times V}\).
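As a minimal illustration (not the exact implementation used in this study), the columns of P could be assembled from simple per-feature variability surrogates, e.g., a bootstrap-based CV and a between-batch CV; the batch_ids variable and both helper functions below are hypothetical.

    import numpy as np

    def bootstrap_cv(X, n_boot=100, seed=0):
        # Variability of each feature's mean across bootstrapped subsets of the samples.
        rng = np.random.default_rng(seed)
        n, m = X.shape
        means = np.empty((n_boot, m))
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)
            means[b] = X[idx].mean(axis=0)
        return means.std(axis=0, ddof=1) / (np.abs(means.mean(axis=0)) + 1e-12)

    def batch_cv(X, batch_ids):
        # Variability of per-batch feature means, a simple surrogate for batch effects.
        batch_means = np.stack([X[batch_ids == b].mean(axis=0) for b in np.unique(batch_ids)])
        return batch_means.std(axis=0, ddof=1) / (np.abs(batch_means.mean(axis=0)) + 1e-12)

    # P is m x V, with one column per source of variability considered:
    # P = np.column_stack([bootstrap_cv(X), batch_cv(X, batch_ids)])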

We incorporate feature variability into the LASSO formulation by adding an additional penalty term to the objective function J(θ) in (2). Note that this penalty term is quadratic to ensure that it is convex, which is a key requirement of the optimization approach described below.

This represents the objective function for VaRFS, written as:

$$\mathop{\text{argmin}}\limits_{{{\boldsymbol{\theta}}\in {\mathbb{R}}^m}} \left\{ J({{\boldsymbol{\theta}}}) = \mathop{\overbrace{\mathop{\underbrace{\frac{1}{2}\mathop{\sum}\limits_{i=1}^{S}{\|{{{\bf{y}}}_i - {{\bf{X}}}_i{{\boldsymbol{\theta}}}}\|}_2^2}}\limits_{{\rm{discriminability}}} + \mathop{\underbrace{\beta {{\boldsymbol{\theta}}}^T{{\bf{R}}}{{\boldsymbol{\theta}}}}}\limits_{{\rm{variability}}}}}\limits^{f({{\boldsymbol{\theta}}})} + \mathop{\overbrace{\mathop{\underbrace{\lambda{\|{{{\boldsymbol{\theta}}}}\|}_1}}\limits_{{\rm{sparsity}}}}}\limits^{g({{\boldsymbol{\theta}}})}\right\}$$
(3)

where β is the regularization parameter, used to differentially weight variability measures (and thus, different sources of variability). Note that R is the symmetric form of the feature variability matrix \({\bf{R}}\triangleq {\bf{P}}{{\bf{P}}}^{T}=\mathop{\sum }\nolimits_{v = 1}^{V}{{\bf{u}}}_{v}{{\bf{u}}}_{v}^{T}\).
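Given R, the three terms of (3) can be evaluated directly; a short sketch (variable names are illustrative) is:

    import numpy as np

    def varfs_objective(theta, X, y, R, beta, lam):
        # J(theta) = discriminability (squared loss) + variability penalty + sparsity penalty
        resid = y - X @ theta
        discriminability = 0.5 * resid @ resid
        variability = beta * theta @ R @ theta
        sparsity = lam * np.abs(theta).sum()
        return discriminability + variability + sparsity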

Optimization of VaRFS

While f + g in (3) is a convex objective function, it is non-smooth (due to the sparsity term g(θ)) and thus cannot be solved by standard optimization methods such as gradient descent. Rather than computationally expensive and complex alternatives such as the alternating direction method of multipliers58, we utilize proximal algorithms59 as they work under extremely general conditions, are much faster for challenging optimization problems, and are scalable and amenable to distributed optimization46. Based on Lemma 1 (see Supplementary Information Section A), in order to minimize f + g in our convex optimization problem, we can replace the smooth function f with its upper bound (denoted \(\bar{f}\)), which results in the following iterative solution algorithm for (3),

$${{\boldsymbol{\theta }}}_{k+1}=\mathop{\,\text{argmin}\,}\limits_{{\boldsymbol{\theta }}\in {{\mathbb{R}}}^{m}}\left\{\bar{f}({\boldsymbol{\theta }},{{\boldsymbol{\theta }}}_{k})+g({\boldsymbol{\theta }})\right\},$$
(4)

where, for the kth iteration,

$$\bar{f}({\boldsymbol{\theta }},{{\boldsymbol{\theta }}}_{k})=f({{\boldsymbol{\theta }}}_{k})+{\nabla }^{T}f({{\boldsymbol{\theta }}}_{k})({\boldsymbol{\theta }}-{{\boldsymbol{\theta }}}_{k})+\frac{1}{2\gamma }{\left\Vert {\boldsymbol{\theta }}-{{\boldsymbol{\theta }}}_{k}\right\Vert }_{2}^{2}.$$
(5)

This in turn is equivalent to

$${{\boldsymbol{\theta }}}_{k+1}=\mathop{\,\text{argmin}\,}\limits_{{\boldsymbol{\theta }}\in {{\mathbb{R}}}^{m}}\left\{\frac{1}{2}{\left\Vert {\boldsymbol{\theta }}-{\bar{{\boldsymbol{\theta }}}}_{k}\right\Vert }_{2}^{2}+\gamma g({\boldsymbol{\theta }})\right\}={{\rm{prox}}}_{\gamma g}({\bar{{\boldsymbol{\theta }}}}_{k}),$$
(6)

where, \({\bar{{\boldsymbol{\theta }}}}_{k}={{\boldsymbol{\theta }}}_{k}-\gamma \nabla f({{\boldsymbol{\theta }}}_{k})\) and proxγg is the proximal operator of the convex function γg (See Definition 3 in Section A of the Supplementary Information). This base mapping of the proximal algorithm is a standard tool for solving non-smooth optimization problems60. Proof that f in (3) is a Lipschitz continuous gradient function is presented as Lemma 2 in Section B of the Supplementary Information, based on which our problem can be seen to meet the requirements for using general proximal algorithms59. The final Algorithm 1 summarizes the overall approach to solve (6) within VaRFS.

Algorithm 1

Proximal Algorithm for VaRFS

Input: y, X, P, β, λ, K (number of inner-loop iterations), γ (step-size)

initialization : \({{\boldsymbol{\theta }}}_{0}\in {{\mathbb{R}}}^{m}\), R = PPT

1: for k = 1, 2, …, K do

2: \(f({{\boldsymbol{\theta }}}_{k})=\frac{1}{2}{\left\Vert {\bf{y}}-{\bf{X}}{{\boldsymbol{\theta }}}_{k}\right\Vert }_{2}^{2}+\beta {{\boldsymbol{\theta }}}_{k}^{T}{\bf{R}}{{\boldsymbol{\theta }}}_{k}\)

3: \(g({{\boldsymbol{\theta }}}_{k})=\lambda {\left\Vert {{\boldsymbol{\theta }}}_{k}\right\Vert }_{1}\)

4: \({\bar{{\boldsymbol{\theta }}}}_{k}={{\boldsymbol{\theta }}}_{k}-\gamma \nabla f({{\boldsymbol{\theta }}}_{k})\)

5: \({{\boldsymbol{\theta }}}_{k+1}={{\rm{prox}}}_{\gamma g}({\bar{{\boldsymbol{\theta }}}}_{k})\)

6: end for

Output: θ = θk+1
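A compact Python rendering of Algorithm 1 is sketched below for illustration (the experiments in this study were run in MATLAB); it uses the standard fact that the proximal operator of the ℓ1 penalty is element-wise soft-thresholding.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1 (element-wise soft-thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def varfs_proximal(X, y, P, beta, lam, K=100, gamma=None):
        n, m = X.shape
        R = P @ P.T                                  # symmetric m x m variability matrix
        theta = np.zeros(m)
        if gamma is None:
            # Step size chosen within the admissible interval (see convergence analysis below).
            Q = X.T @ X + 2.0 * beta * R
            gamma = 1.0 / np.linalg.norm(Q, 2)
        for _ in range(K):
            grad = X.T @ (X @ theta - y) + 2.0 * beta * (R @ theta)  # gradient of f
            theta_bar = theta - gamma * grad                         # gradient step
            theta = soft_threshold(theta_bar, gamma * lam)           # prox of gamma * g
        return theta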

Remark 1

Since, in (5), \(\bar{f}({\boldsymbol{\theta }},{{\boldsymbol{\theta }}}_{k})\ge f({\boldsymbol{\theta }})\) and \(\bar{f}({{\boldsymbol{\theta }}}_{k},{{\boldsymbol{\theta }}}_{k})=f({{\boldsymbol{\theta }}}_{k})\), \(\bar{f}({\boldsymbol{\theta }},{{\boldsymbol{\theta }}}_{k})\) is a so-called majorization function of f(θ)61. Therefore, our algorithm is a type of majorization-minimization algorithm62.

Convergence analysis of VaRFS

We examine the restrictions on the learning rate parameter γ that ensure convergence of the iterations outlined in (6), as stated in Theorem 1.

Theorem 1

The sequence {θk} in (6) converges to a stationary point of f + g. To guarantee convergence, the parameter γ must adhere to

$$0 < \gamma \le \frac{1}{{\left\Vert {\bf{Q}}\right\Vert }_{2}},$$
(7)

in which,

$${\bf{Q}}\triangleq {{\bf{D}}}^{T}{\bf{D}},\,\,\,\,\,{\bf{D}}\triangleq {\left[{{\bf{X}}}^{T}\sqrt{2\beta }P\right]}^{T}.$$
(8)

The proof may be found in Supplementary Information Section B.
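As a practical note, the admissible step-size range in (7) and (8) can be computed directly from X, P, and β; a minimal illustrative sketch:

    import numpy as np

    def max_step_size(X, P, beta):
        # D stacks the data matrix and the scaled variability matrix, so that
        # Q = D^T D = X^T X + 2*beta*P P^T is the Hessian of the smooth term f.
        D = np.vstack([X, np.sqrt(2.0 * beta) * P.T])
        Q = D.T @ D
        return 1.0 / np.linalg.norm(Q, 2)   # gamma must lie in (0, 1 / ||Q||_2]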

Theorem 2

Let Q in (8) be a positive-definite (PD) matrix with singular values sorted as \({\sigma }_{\min }\le \ldots \le {\sigma }_{\max }\). Given that Algorithm 1 reaches the optimal solution θ* with a generic learning rate γ, the iterations of this algorithm demonstrate a linear convergence rate. Moreover, we have

$${\left\Vert {{\boldsymbol{\theta }}}_{k+1}-{{\boldsymbol{\theta }}}^{* }\right\Vert }_{2}\le z(\gamma ){\left\Vert {{\boldsymbol{\theta }}}_{k}-{{\boldsymbol{\theta }}}^{* }\right\Vert }_{2},$$
(9)

where \(z(\gamma )=\max \left\{\left\vert 1-\gamma {\sigma }_{\min }\right\vert ,\left\vert 1-\gamma {\sigma }_{\max }\right\vert \right\}\) is the convergence rate.

The proof is provided in Section B of the Supplementary Information.

Figure 9 depicts the convergence rate of the regular VaRFS proximal algorithm for different learning rates γ. The convergence rate is illustrated in terms of the condition number (\(\kappa \triangleq \frac{{\sigma }_{\max }}{{\sigma }_{\min }}\)) of the matrix Q. As expected, well-conditioned matrices with κ(Q) ≈ 1 converge faster than ill-conditioned ones with κ(Q) ≫ 1. It can also be seen that larger step sizes within the convergence interval (\(0 < \gamma \le \frac{1}{{\left\Vert {\bf{Q}}\right\Vert }_{2}}=\frac{1}{{\sigma }_{\max }}\)) correspond to faster convergence.

Fig. 9: Convergence ratio of the VaRFS proximal algorithm for different step sizes of γ.
figure 9

For better interpretation, the convergence rates are shown via the condition number κ.
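For completeness, the convergence factor z(γ) in (9) can be evaluated from the extreme singular values of Q; a short illustrative sketch:

    import numpy as np

    def convergence_factor(Q, gamma):
        # z(gamma) = max(|1 - gamma*sigma_min|, |1 - gamma*sigma_max|); smaller is faster.
        sigma = np.linalg.svd(Q, compute_uv=False)
        s_min, s_max = sigma.min(), sigma.max()
        return max(abs(1.0 - gamma * s_min), abs(1.0 - gamma * s_max))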

Acceleration of VaRFS

Following30,63, the basic proximal gradient algorithm can be further accelerated through the use of weighted combinations of current and previous gradient directions via an extrapolation step; thus ensuring each iteration does not require more than one gradient evaluation. This is implemented by incorporating a new sequence, \({\{{{\boldsymbol{\eta }}}_{k}\}}_{k = 0}^{\infty }\), which is initialized as η0 = θ0. Recursively updating {ηk} and thus {θk} at each iteration \(k\in \{0,1,\ldots ,K\}\) allows for significantly faster convergence. Algorithm 2 summarizes this accelerated approach to solving (6) within VaRFS.

Algorithm 2

Accelerated Proximal Algorithm for VaRFS

Input: y, X, P, β, λ, K (number of inner-loop iterations), γ (step-size)

initialization : \({{\boldsymbol{\theta }}}_{0}\in {{\mathbb{R}}}^{m}\), η0 = θ0, R = PPT

1: for k = 1, 2, …, K do

2: \(f({{\boldsymbol{\eta }}}_{k})=\frac{1}{2}{\left\Vert {\bf{y}}-{\bf{X}}{{\boldsymbol{\eta }}}_{k}\right\Vert }_{2}^{2}+\beta {{\boldsymbol{\eta }}}_{k}^{T}{\bf{R}}{{\boldsymbol{\eta }}}_{k}\)

3: \(g({{\boldsymbol{\eta }}}_{k})=\lambda {\left\Vert {{\boldsymbol{\eta }}}_{k}\right\Vert }_{1}\)

4: ηk = ηk − γ∇f(ηk)

5: θk+1 = proxγg(ηk)

6: \({{\boldsymbol{\eta }}}_{k+1}={{\boldsymbol{\theta }}}_{k+1}+w\left({{\boldsymbol{\theta }}}_{k+1}-{{\boldsymbol{\theta }}}_{k}\right)\)

7: end for

Output: θ = θk+1

Remark 2

Parameter w must be chosen in specific ways to achieve convergence acceleration. One simple choice takes \(w=\frac{k}{k+3}\)46.
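Relative to the earlier sketch of Algorithm 1, the acceleration amounts to iterating on the extrapolated sequence {ηk} and adding the momentum update of step 6 (with w = k/(k+3) as in Remark 2); an illustrative Python version is:

    import numpy as np

    def varfs_proximal_accelerated(X, y, P, beta, lam, K=100):
        n, m = X.shape
        R = P @ P.T
        Q = X.T @ X + 2.0 * beta * R
        gamma = 1.0 / np.linalg.norm(Q, 2)

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        theta = np.zeros(m)
        eta = theta.copy()                         # extrapolated sequence, eta_0 = theta_0
        for k in range(K):
            grad = X.T @ (X @ eta - y) + 2.0 * beta * (R @ eta)
            theta_next = soft_threshold(eta - gamma * grad, gamma * lam)  # proximal step at eta_k
            w = k / (k + 3.0)                                             # momentum weight (Remark 2)
            eta = theta_next + w * (theta_next - theta)                   # extrapolation step
            theta = theta_next
        return theta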

Section C of the Supplementary Information provides a detailed computational complexity analysis of both Algorithm 1 and Algorithm 2, highlighting convergence rates of \({\mathcal{O}}(1/k)\) and \({\mathcal{O}}(1/{k}^{2})\) for the regular and accelerated versions of VaRFS, respectively.

Data description

C1 (Prostate Cancer MRI) comprised 147 diagnostic T2-weighted (T2w) prostate MRIs from 4 institutions, with the goal of distinguishing benign from malignant lesions in the peripheral zone (discovery: 3 sites, validation: 1 site). More details of this dataset are available in64,65.

C2 (Rectal Cancer MRI, pre-CRT) comprised 197 pre-treatment T2w rectal MRIs from 3 institutions, from patients who later underwent standard-of-care chemoradiation (nCRT) and surgery. Histopathologic tumor regression grade (TRG) assessment of the excised surgical specimen was used to define pathologic complete response (pCR) to nCRT. The goal was to distinguish patients who will achieve pCR (i.e., ypTRG0 or 0% viable tumor cells remaining) from those who will not, based on annotated tumor regions on pre-nCRT MRI. For more dataset details refer to38.

C3 (Rectal Cancer MRI, post-CRT) comprised 119 T2w post-treatment rectal MRI scans from 3 institutions, from patients after they had undergone standard-of-care nCRT but prior to undergoing surgery. Histopathologic tumor stage (ypT) assessment of the excised surgical specimen was used to define the pathologic response to nCRT. The goal was to distinguish patients who achieved tumor regression (i.e., ypT0-2 or tumor that has regressed to within the rectal wall) from those who did not, based on annotated rectal wall regions on post-nCRT MRI. Additional details are in37.

C4 (Crohn’s Disease MRE) comprised 73 T2w bowel MR enterography (MRE) scans from patients who had been endoscopically confirmed with Crohn’s disease. The goal was to distinguish high-risk patients who needed surgery within one year of MRI and initiation of aggressive immunosuppressive therapy, from low-risk patients (stable for up to 5 years in follow-up); using annotated terminal ileum regions on baseline MRIs. This single-institutional cohort harbored large batch effects as a result of adjustments to acquisition parameters, including scanner type and magnetic field strength. More details of this dataset are available in66.

C5 (Crohn’s Disease CTE) comprised 165 CT enterography (CTE) scans from patients being screened for Crohn’s disease with endoscopic confirmation of disease presence. The goal was to distinguish between healthy and diseased terminal ileum regions within this single institutional cohort harboring significant batch effects67, as well as dose/reconstruction changes.

Radiomic feature extraction

As summarized in Fig. 3, after data acquisition, pre-processing included linear resampling of all scans to an isotropic resolution of 1 × 1 × 1 mm to ensure consistent resolution within each cohort. Additionally, N4ITK bias field correction68 in 3D Slicer was used to correct inhomogeneity artifacts in MRI scans in C1-4. 405 3D radiomic features were then extracted on a voxel-wise basis from all scans. A complete list of all extracted features is provided in Supplementary Data1 in the Supplementary Materials. These features included 20 Histogram, 152 Laws69, 13 Gradient70, 160 Gabor71, and 60 Haralick72 responses. The mean value of each feature was then computed within specified regions-of-interest (ROIs), and feature normalization was applied on a cohort basis to ensure each feature had a mean of 0 and a standard deviation of 1. Based on the sources of variability present (summarized in Table 5), corresponding variability measures were computed on a per-feature basis for each of the five cohorts C1-5.
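As a minimal illustrative sketch of the final normalization step, the ROI-mean feature values can be z-scored on a per-cohort basis as follows (feature_matrix and cohort_ids are hypothetical variable names):

    import numpy as np

    def zscore_per_cohort(feature_matrix, cohort_ids):
        # feature_matrix: (n_patients x n_features) ROI-mean radiomic feature values
        # cohort_ids: length-n vector identifying the cohort of each patient
        X = feature_matrix.astype(float).copy()
        for c in np.unique(cohort_ids):
            rows = cohort_ids == c
            mu = X[rows].mean(axis=0)
            sd = X[rows].std(axis=0, ddof=1)
            X[rows] = (X[rows] - mu) / np.where(sd > 0, sd, 1.0)  # guard against constant features
        return X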

Table 5 Sources of variability and corresponding variability measures for each cohort considered in this work

VaRFS implementation and sensitivity analysis

VaRFS was implemented as Algorithm 1 (regular) and Algorithm 2 (accelerated), with K = 100 (number of iterations) and \(\gamma =\frac{1}{2{\sigma }_{\max }({\bf{Q}})}\) (mid-point of the convergence interval, see Fig. 9). Analysis of convergence differences for J(θ) in (3) between both algorithms was conducted for all five cohorts. To evaluate the effect of the regularization parameters in VaRFS, these were varied as β, λ ∈ {0, 0.1, …, 1}, corresponding to variability and sparsity, respectively. These 100 possible β–λ parameter combinations were evaluated for each of C1-5, resulting in a total of 500 possible parameter variations of VaRFS being evaluated. Since each cohort had at least two sources of variability considered, VaRFS was evaluated when considering each individual measure as well as the combination of multiple variability measures (e.g. P = [u1 u2 u3] for C5).
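The parameter sensitivity analysis can be viewed as a simple grid search over the β–λ pairs; an illustrative sketch is shown below, where fit_fn and score_fn are placeholders for a VaRFS solver (e.g., the Algorithm 1 sketch above) and an AUC-based evaluation routine, respectively.

    import numpy as np

    def parameter_grid_search(X_tr, y_tr, X_va, y_va, P, fit_fn, score_fn,
                              betas=np.arange(0.0, 1.01, 0.1),
                              lams=np.arange(0.0, 1.01, 0.1)):
        # fit_fn(X, y, P, beta, lam) -> coefficient vector theta
        # score_fn(selected_indices, X_tr, y_tr, X_va, y_va) -> validation AUC
        results = {}
        for beta in betas:
            for lam in lams:
                theta = fit_fn(X_tr, y_tr, P, beta, lam)
                selected = np.flatnonzero(theta)
                results[(round(beta, 1), round(lam, 1))] = score_fn(selected, X_tr, y_tr, X_va, y_va)
        best = max(results, key=results.get)
        return best, results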

Comparative evaluation of common FS approaches

As an alternative strategy, conventional feature selection (FS) approaches including minimum redundancy maximum relevance (mRMR)15, Wilcoxon rank-sum testing (WLCX)16, and least absolute shrinkage and selection operator (LASSO)17 were implemented. All three FS methods were utilized in conjunction with feature variability screening, where radiomic features that did not meet a pre-defined threshold for their feature variability measure were not utilized in downstream analysis. Threshold values for different variability measures were selected based on the literature; specifically, radiomic features with IS > 0.2525, CV > 0.565, or ICC < 0.8573 were excluded prior to FS. When considering multiple sources of variability, a sequential elimination process was employed where only those radiomic features were retained that met all relevant thresholds for the corresponding variability measures.
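A minimal sketch of this sequential elimination is given below, reading the thresholds above as IS > 0.25, CV > 0.5, and ICC < 0.85 once the citation superscripts are separated out (all variable names are illustrative):

    import numpy as np

    def variability_screen(feature_ids, IS=None, CV=None, ICC=None,
                           is_max=0.25, cv_max=0.5, icc_min=0.85):
        # Sequentially drop features failing any applicable variability criterion.
        keep = np.ones(len(feature_ids), dtype=bool)
        if IS is not None:
            keep &= IS <= is_max       # exclude features with IS above the threshold
        if CV is not None:
            keep &= CV <= cv_max       # exclude features with CV above the threshold
        if ICC is not None:
            keep &= ICC >= icc_min     # exclude features with ICC below the threshold
        return np.asarray(feature_ids)[keep]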

Experimental evaluation

All five cohorts were partitioned into discovery and validation sets, as summarized in Table 1. The evaluation of feature sets, selected via each of VaRFS, mRMR, LASSO, and WLCX, was carried out by building a Random Forests classifier (RF) for the binary classification tasks in each cohort. The RF classifier was chosen due to its well-documented proficiency in handling high-dimensional, potentially correlated features, and its robustness against overfitting74. Moreover, RF can estimate the importance of features, which provides additional insight into the data75. In this study, the RF classifier was configured with 50 trees, a maximum depth of 50, and 100 leaf samples.
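For illustration, an approximately equivalent classifier configuration in scikit-learn is shown below; the original models were built in MATLAB, so the mapping of “100 leaf samples” to min_samples_leaf is an assumption.

    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(
        n_estimators=50,        # 50 trees
        max_depth=50,           # maximum tree depth of 50
        min_samples_leaf=100,   # assumed interpretation of "100 leaf samples"
        random_state=0,
    )
    # rf.fit(X_train[:, selected_features], y_train)  # selected_features from the chosen FS scheme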

In all experiments, the RF classifier was first trained and optimized on the discovery cohort using 100 runs of nested 10-fold cross-validation. All model selection and thresholding steps were confined to the training set, within which the average classifier performance was estimated. Based on their formulation, distinct methodologies were employed to determine the top-ranked features and construct a final optimized RF model for hold-out validation when considering statistical FS (mRMR and WLCX) vs optimization-based FS (LASSO and VaRFS). For mRMR and WLCX, the most frequently selected features were identified based on their average rank value across all cross-validation runs. This top-ranked feature set was then utilized to construct a single RF classifier that was evaluated in a holdout fashion on the validation cohort. For LASSO and VaRFS, the best-performing RF model (and a corresponding set of selected features) was identified across all cross-validation runs. This model was then directly evaluated in a holdout fashion on the validation cohort. While these strategies aligned with the operational design of each FS method, an averaging-based approach across all cross-validation runs for each method was additionally implemented to confirm performance trends. Additionally, experimental evaluation was repeated using a Linear Discriminant Analysis (LDA) classifier76 for evaluating performance differences between VaRFS and comparator methods. All experiments were conducted in MATLAB 9.9 on a 64-bit Windows 10 PC with an Intel(R) Core(TM) i7 CPU 930 (3.60 GHz) and 32 GB RAM.

In all cases, the area under the receiver operating characteristic curve (AUC) was used as a measure of classifier performance. Statistical comparisons were conducted to assess differences in AUC values between VaRFS and baseline methods. For the training set, a two-tailed Wilcoxon signed-rank test (significance level p < 0.005) was employed using repeated cross-validation results, consistent with prior studies. For the validation set, the DeLong test77 was applied to evaluate statistical differences between the ROC curves of VaRFS and baseline methods (since no cross-validation was involved).

A color-coded chord diagram was generated to visualize relationships and connections between top-selected features identified via different FS schemes. Additionally, feature importance was computed via their Shapley (SHAP) values78 rather than feature rank (in mRMR or WLCX) or feature coefficient (in LASSO or VaRFS). The Shapley value is the average marginal contribution of a feature over all possible coalitions79, providing a natural way to compute how much each feature contributes to predictive performance. A parallel coordinate plot (PCP) was constructed80 to analyze trends of the top-ranked VaRFS features in terms of multiple variability measures as well as their discriminability. Finally, a model-level error analysis was conducted to quantify trends in false-positive and false-negative instances across mRMR, WLCX, LASSO, and the two VaRFS variants (single and multiple variability measures), to identify which errors were unique and which were in common between different approaches. This analysis was used to generate an UpSet-style visualization81, enabling direct comparison of error rates as well as distinctiveness in erroneous samples between approaches.

Ethics approval and informed consent

All datasets used in this study comprised de-identified imaging data with appropriate institutional approvals. For the C1 cohort, data and expert annotations of tumor extent were provided under the Institutional Review Board (IRB) protocol #02-13-42C, approved by the University Hospitals of Cleveland IRB. For the C2 and C3 cohorts, this HIPAA-compliant, retrospective study was approved by IRBs at three institutions: University Hospitals Cleveland Medical Center (UHCMC, #07-16-40), Cleveland Clinic Foundation (CCF, #18-427), and Case Western Reserve University (STUDY20240128). A waiver for the requirement of informed consent was granted, as only de-identified patient data were utilized. For the C4 and C5 cohorts, approval was obtained from the University Hospitals Cleveland Medical Center IRB under protocol #11-15-24.