Fig. 4

Overview of the machine learning pipeline for reducing the imaging features. (o) CT-based lung segmentation and subsequent feature extraction, as described previously, yielded a large number of imaging features. (i) The experimental setup is a 5 × 5 cross-validation. Feature reduction is performed on the development folds (dev 1–4) of each split and begins with (ii) feature standardization, comprising z-normalization and the Yeo–Johnson transformation, followed by (iii) feature clustering and exclusion, using hierarchical agglomerative clustering and best-predictor selection; irrelevant features are excluded by filtering with the Mann–Whitney U test. (iv) Features are then ranked with the MRMR algorithm to identify the most informative variables. (v) Models are developed with logistic regression and internally validated on the validation fold (val) of the training cohort. (vi) The final signature is defined by aggregating the feature rankings over all five cross-validation folds by Borda score. MRMR: minimum redundancy maximum relevance; dev: development fold of the training cohort; val: validation fold of the training cohort.
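The steps in the caption can be illustrated with a minimal Python sketch built on NumPy, SciPy and scikit-learn. It is not the authors' implementation, and several choices are assumptions: clustering uses average linkage on the distance 1 − |Pearson r| with a hypothetical threshold of 0.3; the "best predictor" of a cluster is taken to be the member with the lowest Mann–Whitney U p-value, and clusters whose best member has p ≥ 0.05 are dropped; MRMR is implemented as the greedy F-statistic / mean-|correlation| quotient variant; a feature at position p in a ranking of length m earns m − p Borda points; and a single stratified 5-fold split stands in for the 5 × 5 scheme. scikit-learn's `PowerTransformer` applies Yeo–Johnson first and then z-normalizes. All helper names (`cluster_and_filter`, `mrmr_rank`, `borda`, `run_pipeline`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import mannwhitneyu
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import PowerTransformer


def cluster_and_filter(X, y, dist_threshold=0.3, alpha=0.05):
    """Keep the best member (lowest Mann-Whitney p) of each correlation cluster,
    then drop clusters whose best member is not significant (assumed rule)."""
    dist = np.clip(1.0 - np.abs(np.corrcoef(X, rowvar=False)), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=dist_threshold, criterion="distance")
    pvals = np.array([
        mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    keep = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        best = members[np.argmin(pvals[members])]
        if pvals[best] < alpha:
            keep.append(best)
    if not keep:  # fallback so later steps always receive at least one feature
        keep = [int(np.argmin(pvals))]
    return np.array(keep, dtype=int)


def mrmr_rank(X, y, k):
    """Greedy MRMR: relevance = ANOVA F, redundancy = mean |r| to selected features."""
    relevance, _ = f_classif(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    remaining = set(range(X.shape[1])) - set(selected)
    while remaining and len(selected) < k:
        cand = np.array(sorted(remaining))
        redundancy = corr[np.ix_(cand, selected)].mean(axis=1)
        scores = relevance[cand] / np.maximum(redundancy, 1e-12)
        best = int(cand[np.argmax(scores)])
        selected.append(best)
        remaining.remove(best)
    return selected


def borda(rankings, n_features):
    """Borda count: position p in a ranking of length m earns m - p points."""
    scores = np.zeros(n_features)
    for ranking in rankings:
        m = len(ranking)
        for pos, feat in enumerate(ranking):
            scores[feat] += m - pos
    return np.argsort(-scores, kind="stable"), scores


def run_pipeline(X, y, k=5, n_splits=5, seed=0):
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rankings, aucs = [], []
    for dev, val in cv.split(X, y):
        # Fit the transform on the development folds only, then apply to val.
        pt = PowerTransformer(method="yeo-johnson", standardize=True).fit(X[dev])
        Xd, Xv = pt.transform(X[dev]), pt.transform(X[val])
        kept = cluster_and_filter(Xd, y[dev])
        ranked = kept[mrmr_rank(Xd[:, kept], y[dev], k)]  # map back to original indices
        model = LogisticRegression(max_iter=1000).fit(Xd[:, ranked], y[dev])
        aucs.append(roc_auc_score(y[val], model.predict_proba(Xv[:, ranked])[:, 1]))
        rankings.append([int(f) for f in ranked])
    order, scores = borda(rankings, X.shape[1])
    signature = [int(f) for f in order[:k] if scores[f] > 0]
    return signature, aucs


if __name__ == "__main__":
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                               n_redundant=4, random_state=0)
    signature, aucs = run_pipeline(X, y)
    print("signature:", signature, "mean val AUC:", round(float(np.mean(aucs)), 3))
```

Fitting the standardization, clustering and ranking inside each development split, rather than on the full cohort, keeps the validation fold unseen by every reduction step, which is the point of performing the reduction per fold.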