Fig. 2: Feature selection and final model identification in the discovery cohort.

A Area under the receiver operating characteristic curve (AUC) values for different feature subsets selected using the SHapley Additive exPlanations (SHAP) method, along with feature ranking scores. Source data are provided as a Source data file. B SHAP summary bar plot showing the relative importance of features in the random forest (RF) model. C Distributions of, and pairwise correlations among, the six selected features. D Receiver operating characteristic (ROC) curves showing the diagnostic performance of the RF model using the six selected features for primary vitreoretinal lymphoma (PVRL) detection. E Confusion matrix heatmap showing the classification performance of the six-feature RF model in diagnosing PVRL. DT decision tree, GLM generalized linear model, GBM gradient boosting, PDW platelet distribution width, PLCR platelet large cell ratio, HG hemoglobin, PLT platelet count, MPV mean platelet volume, HCT hematocrit, RDWSD red blood cell distribution width (standard deviation), RDWCV red blood cell distribution width (coefficient of variation), RBC red blood cell count, WBC white blood cell count, MCHC mean corpuscular hemoglobin concentration, PIV pan-immune inflammation value.