Fig. 1: Comparison of actual and predicted Cantonese proficiency scores using the GBRT model.

The blue line shows the actual Cantonese proficiency scores of respondents in the test dataset, and the orange line shows the scores predicted by the GBRT model. The two lines overlap closely, which indicates strong model performance: the model reaches an R² of 0.9021 and a prediction accuracy of 83.50%, suggesting that GBRT effectively captures the underlying patterns in the data.
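The R² reported above is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual scores. The sketch below shows how this value can be computed from paired actual and predicted scores such as those plotted in Fig. 1. It is a minimal illustration, not the authors' evaluation code; the function name `r_squared` and the sample scores are hypothetical. The source does not define how "prediction accuracy" is calculated, so that metric is not reproduced here.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared gaps between actual and predicted scores.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total sum of squares: spread of the actual scores around their mean.
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical proficiency scores, for illustration only.
    actual_scores = [3.0, 4.0, 5.0, 2.0]
    predicted_scores = [2.8, 4.1, 4.7, 2.3]
    print(round(r_squared(actual_scores, predicted_scores), 3))
```

With these made-up values the residual sum of squares is 0.23 and the total sum of squares is 5.0, so R² is approximately 0.954. A value close to 1 corresponds to the tight overlap between the actual and predicted lines in the figure, while predicting the mean score for every respondent would give an R² of 0.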