Fig. 3: Polymer descriptor down-selection and ML model training.

a Optimized descriptors obtained by down-selection using four correlation coefficients (Pearson, Spearman, distance, and MIC) followed by an RF model. b Accuracy of the RF model based on the optimized descriptors, with a training R² of 0.875 and a test R² of 0.844. c Mean squared error (MSE) of ML models at different stages of down-selection: initial (Init.), mathematical correlation coefficient screening (Cor.), and RF model optimization (Opt.). A PCA approach is included for comparison. d MSE of ML models with different polymer representation approaches. The violin plots show the distribution of values; individual subsamples are shown in gray, and the mean and standard deviation of the MSE in black. e Pearson correlation matrices showing correlations among the optimized descriptors and TC. The inset shows the distribution of the Pearson coefficients.
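
The correlation-screening stage (Cor.) named in the caption can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Pearson and Spearman coefficients (the distance and MIC coefficients and the RF optimization stage are omitted), and the thresholds `min_relevance` and `max_redundancy`, the function names, and the toy descriptors `desc_a`, `desc_b`, `desc_c` are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def screen(descriptors, target, min_relevance=0.2, max_redundancy=0.9):
    """Hypothetical screening rule (thresholds are assumptions):
    1. keep descriptors whose best |correlation| with the target is high enough;
    2. among those, greedily drop any that is nearly collinear with a kept one.
    """
    scores = {}
    for name, column in descriptors.items():
        score = max(abs(pearson(column, target)), abs(spearman(column, target)))
        if score >= min_relevance:
            scores[name] = score
    kept = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(descriptors[name], descriptors[k])) < max_redundancy
               for k in kept):
            kept.append(name)
    return kept


# Toy data, invented purely for illustration.
n = 20
descriptors = {
    "desc_a": [float(i) for i in range(n)],                  # drives the target
    "desc_b": [i + 0.5 * (-1) ** i for i in range(n)],       # near-copy of desc_a
    "desc_c": [float((-1) ** i) for i in range(n)],          # unrelated to target
}
target = [2.0 * i + 1.0 for i in range(n)]
kept = screen(descriptors, target)
print(kept)
```

On this toy data, `desc_c` is removed for low relevance and `desc_b` for redundancy with `desc_a`. In a fuller workflow the surviving descriptors would then go to a model-based stage such as the RF optimization mentioned in the caption.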