Fig. 4

Machine learning screening of key genes. (A) Key gene screening with the LASSO model. As λ increases, the number of non-zero model coefficients gradually decreases from 9 to 2, indicating that more coefficients are shrunk to 0 as the regularization strength increases. The binomial deviance gradually increases from 1.05 to 1.35, indicating that the goodness of fit of the model may decrease as the regularization strength increases. The results show that when the λ value is 2, a balance is achieved between the goodness of fit and the complexity of the model. (B) Key genes in the random forest (RF) model. The horizontal axis represents the number of trees in the RF, and the vertical axis represents the error. As the number of trees increases, the error of the RF model gradually decreases. The screened genes are ranked by importance. (C) Key genes in the SVM-RFE model. The horizontal axis represents the number of features. The accuracy and error of the model vary with the number of features; the highest accuracy and the lowest error are obtained with 7 features. (D) Venn diagram showing the 2 key genes identified by all three algorithms. LASSO: Least Absolute Shrinkage and Selection Operator. SVM-RFE: Support Vector Machine-Recursive Feature Elimination. RF: Random Forest.
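
The three-way screening summarized in panels A to D can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn and a synthetic data set in place of the expression matrix, and the gene names, the cutoff `top_k`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 120 samples x 9 candidate genes.
X, y = make_classification(n_samples=120, n_features=9, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical names

# (A) LASSO-style screening: L1-penalized logistic regression, with the
# penalty strength chosen by 5-fold cross-validation on the log loss
# (equivalent to binomial deviance up to scaling).
lasso = LogisticRegressionCV(Cs=20, penalty="l1", solver="liblinear", cv=5,
                             scoring="neg_log_loss", random_state=0).fit(X, y)
lasso_genes = set(genes[lasso.coef_.ravel() != 0])

# (B) Random forest: rank genes by importance and keep the top_k (assumed cutoff).
top_k = 5
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranked = np.argsort(rf.feature_importances_)[::-1]
rf_genes = set(genes[ranked[:top_k]])

# (C) SVM-RFE: recursive feature elimination with a linear SVM; the number
# of features kept is the one with the best cross-validated accuracy.
svm_rfe = RFECV(SVC(kernel="linear"), step=1, cv=5, scoring="accuracy").fit(X, y)
svm_genes = set(genes[svm_rfe.support_])

# (D) Venn-style intersection: genes selected by all three methods.
key_genes = lasso_genes & rf_genes & svm_genes
print(sorted(key_genes))
```

On real data, the sets produced in steps A to C would be the inputs to the Venn diagram in panel D; the intersection can be empty if the three methods disagree.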