Fig. 2

The LASSO regression and RF-RFE algorithms were used to screen the variables. (A) Path plot of the LASSO regression coefficients for the 58 risk variables. The vertical axis showed the coefficient values, the lower horizontal axis showed log(λ), the logarithm of the regularization parameter λ, and the upper horizontal axis indicated the number of nonzero coefficients retained in the model at each value of λ. (B) Cross-validation curve. The lower horizontal axis represented the logarithm of the penalty coefficient, log(λ), the vertical axis represented the likelihood deviance, and the upper horizontal axis indicated the number of variables selected. Smaller values on the vertical axis indicated a better model fit. (C) Variable ranking change curve. The horizontal axis represented the number of variables, and the vertical axis represented the accuracy after fivefold cross-validation. The accuracy with 20 variables was 0.772; the closer this value was to 1, the higher the accuracy. (D) Importance ranking of the 20 variables retained after RF-RFE screening; only the top 13 variables were shown in this figure. (E) Venn diagram showing the commonalities and differences in variable selection between LASSO and RF-RFE.
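
The relationship read off panel (A), in which fewer coefficients remain nonzero as λ grows, can be illustrated with a small sketch. This is not the analysis behind the figure: it assumes a plain linear LASSO without an intercept, fitted by coordinate descent on synthetic data, and the names `soft_threshold`, `lasso_cd`, and `n_nonzero` are hypothetical helpers introduced only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, the curvature of each coordinate update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def n_nonzero(beta):
    """Number of coefficients retained (the upper axis of a path plot)."""
    return sum(1 for b in beta if b != 0.0)


# Synthetic data (an assumption for illustration): 5 candidate variables,
# of which only the first two actually influence the outcome.
random.seed(0)
n, p = 50, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 3.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

# Any lambda above this value shrinks every coefficient to exactly zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))

for frac in (1.001, 0.5, 0.1, 0.01):
    beta = lasso_cd(X, y, frac * lam_max)
    print(f"lambda = {frac * lam_max:.4f}  nonzero coefficients = {n_nonzero(beta)}")
```

Running the loop prints, for each λ on a coarse grid, how many coefficients survive the penalty, which is the quantity shown on the upper horizontal axis of a coefficient path plot. A cross-validation curve such as the one in panel (B) would additionally evaluate the out-of-fold error at each λ, which this sketch omits.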