Fig. 4: Machine-learning SVR models predict L2 proficiency.

The predictability of L2 (English) HKDSE grades was estimated as the correlation coefficient (cc) between predicted and observed language proficiency scores, based on tenfold cross-validation with 10,000 iterations. a Importance ranking of all predictors of L2 proficiency; the x-axis shows the importance value and the y-axis lists the predictors. Importance values were calculated from tenfold cross-validation with 1,000 iterations. b With all predictors included in the SVR model of L2 (English) HKDSE grades, the distribution of prediction values differed significantly from the null distribution (p < 0.001, Cohen’s d = 7.17). c With L1 (Chinese) HKDSE grades as the only predictor of English HKDSE grades, the distribution of prediction values also differed significantly from the null distribution (p < 0.001, Cohen’s d = 6.45).
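
As a reading aid for the statistics in panels b and c, the sketch below shows one common way to compare a distribution of cross-validated correlation coefficients with a null distribution. The caption does not state how Cohen’s d or the p value were computed, so the pooled-standard-deviation formula, the one-sided permutation p value, the function names `cohens_d` and `empirical_p`, and the toy numbers are all assumptions made for illustration, not the study's procedure or data.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d using a pooled standard deviation (one common definition)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def empirical_p(observed_value, null_values):
    """One-sided permutation p value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed_value)
    return (exceed + 1) / (len(null_values) + 1)


# Toy correlation coefficients, invented for illustration only.
model_cc = [0.52, 0.55, 0.50, 0.58, 0.54]   # e.g. one cc per CV iteration
null_cc = [-0.03, 0.04, 0.01, -0.06, 0.02]  # e.g. cc after shuffling grades

d = cohens_d(model_cc, null_cc)
p = empirical_p(statistics.mean(model_cc), null_cc)
print(f"Cohen's d = {d:.2f}, empirical p = {p:.3f}")
```

Under this +1 correction the smallest attainable p value is 1/(N + 1) for N null samples, so a threshold such as p < 0.001 would require at least 1,000 null samples.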