Abstract
Exploring students’ cognitive abilities has long been an important topic in education. This study employs data-driven artificial intelligence (AI) models, supported by explainability algorithms and propensity score matching (PSM) causal inference, to investigate the factors influencing students’ cognitive abilities, and it examines the differences that arise when various explainability algorithms are used to analyze educational data mining models. In this paper, five AI models were used to model educational data. Subsequently, four interpretable algorithms, including feature importance, Morris Sensitivity, SHAP, and LIME, were used to globally interpret the results, and PSM causal tests were performed on the factors that affect students’ cognitive abilities. The results reveal that, across all algorithms, self-perception and parental expectations have a measurable impact on students’ cognitive abilities. Our work also uncovers that different explainability algorithms exhibit varying preferences and inclinations when interpreting the model, as evidenced by discrepancies in the top ten features highlighted by each algorithm. Morris Sensitivity presents a more balanced perspective, SHAP and feature importance reflect the diversity of interpretable algorithms, and LIME offers a unique perspective. These detailed observations highlight the practical contribution of interpretable AI algorithms in educational data mining, paving the way for more refined applications and deeper insights in future research.
Introduction
Cognition, as the fundamental ability through which humans understand, process, and apply information1, permeates every aspect of human life and has a particularly positive impact on education2. Higher-order cognitive abilities, including critical thinking and creativity, play a crucial role in helping students develop the capacity for autonomous and lifelong learning3,4,5, especially in subject areas such as art6 or music7: a correlation has been demonstrated between students’ cognitive abilities and artistic creativity8. Improving school students’ cognitive abilities has become an important topic in educational research9,10, with strategies ranging from classroom-based teaching11 to emotional education12. Traditionally, research on cognitive abilities has relied on statistical methods to reveal the laws of individual cognitive development and explore influencing factors, including self-efficacy13,14 and family background15. Although these statistical methods have provided rich theoretical and practical evidence, some studies have noted their limited ability16 to identify complex and implicit patterns17: Buchberger18 pointed out that the flexibility of SEM, a common approach in educational assessment, can accommodate many complex assumptions about the underlying structure of psychological constructs, but it makes a priori estimates of statistical power and required sample size challenging.
With the rapid advancement of AI technologies in many practical fields such as health19,20,21, finance22,23, and e-commerce24,25, a data-driven perspective has emerged in educational technology26, offering a novel approach to examining the relationship between students’ performance and their characteristics27. Through AI’s powerful capabilities in large-scale data processing and predictive analysis, researchers can dig deeper into potential factors and more effectively identify pathways to improve students’ cognitive abilities28. However, some studies on educational technology have not discussed the accuracy of their models in depth28,29,30, the internal decision-making processes of the ‘black box’ models used in such studies are difficult to interpret31, and some research on educational data mining focuses overly on modeling and analysis, lacking detailed comparisons and discussions of the local explainability of features. This opacity is particularly concerning in education, as educational decisions involve not only macro-level administrative regulation but also understanding and improving individual learning processes.
To address the issue of explainability in AI algorithms for exploring cognitive abilities in the field of education, this study integrates multiple AI algorithmic approaches to conduct a large-scale survey of students, aiming to examine the relationship between students’ cognitive abilities and their specific characteristics. Machine learning algorithm models, including Lasso, random forest (RF), XGBoost, neural network (NN), and support vector machine (SVM), were used to establish a data-driven model. Mainstream interpretable algorithms such as SHAP, LIME, and Morris Sensitivity were used to identify and analyze important features of the final model. Finally, we used propensity score matching (PSM) as a causal testing tool on the characteristic variables selected by the explainable AI algorithms to explore the factors that affect students’ cognitive abilities. Consequently, this study proposes the following two research questions:
RQ1
Which factors are more important in influencing art students’ cognitive abilities? To what extent?
RQ2
What are the potential benefits of different explainability algorithms in the process of identifying factors that influence cognitive abilities?
Related work
Machine learning and explainability algorithms
Statistical methods generally begin with explicit hypotheses and design experiments or observations to test those hypotheses, using statistical inference tools to assess whether the data supports the hypotheses31. In natural sciences, the primary goals of utilizing AI algorithms, including machine learning (ML), are scientific understanding, inferring causal relationships from observational data, and even gaining new scientific insights32. By leveraging machine learning methods and following the foundations of scientific philosophy, it is now possible to process and analyze large amounts of scientific data from experiments, observations, or other sources automatically to a certain extent. Some advanced algorithms, such as LightGBM or hybrid methods, have been shown to achieve good results in prediction33.
In ML, models are categorized into white-box models and black-box models. White-box models, such as simple decision trees, linear regression models, and Bayesian networks, are designed to have easily interpretable and transparent internal workings and steps. In contrast, black-box models, such as SVMs and Neural Networks (NN), have much more complex internal mechanisms. The interpretability of ML model predictions can be categorized into two types: global interpretability (GI) and local interpretability (LI). GI focuses on understanding the overall logic of the model and following the complete reasoning behind possible outcomes, whereas LI targets explaining specific decisions or individual predictions34,35. While LI algorithms are more broadly applicable across models than GI methods, they may fail to capture certain complexities that GI methods do.
Characteristics and cognitive abilities
Piaget’s cognitive theory emphasizes that cognitive ability is the ability of an individual to learn and understand, store, and interpret information36. The characteristics affecting cognitive ability are mainly attributed to the following aspects.
Personal characteristics
In research related to cognitive ability, personal characteristics have consistently been regarded as one of the most critical factors influencing the development of students’ cognitive skills. They can generally be categorized into two major groups: demographic characteristics and students’ spontaneous state. Demographic characteristics related to cognitive ability include gender, nationality, race, and age37,38,39, while students’ spontaneous state relates to their self-efficacy40, learning habits41, and emotional needs42.
Family background
Family background has been shown to play an important role in students’ cognitive abilities40,43. It is generally considered to include socioeconomic status44, parental education45, and parental involvement46, among other factors. These factors are often considered to influence students’ cognitive abilities through very complex pathways: high-socioeconomic-status families can foster students’ cognitive development by providing high-quality learning resources, such as books and tutoring services, alongside robust educational support. Moreover, the culturally enriched and inclusive ethos typically observed in high-socioeconomic-status families profoundly shapes other critical elements, including learning attitudes, thereby bolstering cognitive abilities through complex mediating pathways47, which highlights the interactive role of family background in fostering cognitive abilities48. Visualisation tools can also increase transparency for parents to understand the performance of students49.
Growth experience
Growth experiences are generally divided into two key areas: educational exposure and social-environmental interactions, encompassing a range of life experiences and environmental influences that shape an individual’s psychological and intellectual growth over time. Educational exposure includes formal and informal learning experiences, such as school curricula, extracurricular activities, and diverse educational opportunities. For instance, students who participate in innovative music-based courses supported by diverse teaching strategies often develop stronger critical thinking and reasoning skills50. Healthy social-environmental interactions, including positive peer interactions and mentorship opportunities, have been shown to foster adaptability, communication skills, and cognitive flexibility51,52. Conversely, limited access to enriching environments may hinder cognitive development53.
Teacher-student relationship
The teacher-student relationship is widely considered a factor that cannot be ignored in influencing students’ cognitive abilities. It is characterized by mutual and supportive communication, which collectively enhances students’ cognitive and academic outcomes54. Constructive feedback and personalized guidance can directly shape cognitive skills by helping students refine their self-regulation ability55 and develop meta-cognitive strategies56. The role of the teacher-student relationship also extends beyond direct instruction: teachers act as role models, inspiring students’ attitudes towards learning and intellectual exploration. Positive relationships have been shown to mitigate the negative effects of external stressors, further underscoring their crucial role in cognitive development57.
Application of explainable algorithms in educational data mining
There have been some applications of explainable AI and ML in educational data mining for students58. Guleria et al. proposed a framework for using ML to provide career counseling for students, using white-box and black-box models to explain the academic and employability attributes that are critical to students’ employment and skill development59. Islam et al. proposed an enhanced EDM system with explainability, aiming to improve the classification accuracy of students’ programming performance and identify the key factors affecting the classification60. However, the above-mentioned studies did not further analyze the differences between various interpretability algorithms, though a comparison of interpretability approaches in educational models may be crucial. The review by Gunasekara provides an overview of the current state of research on interpretability in educational models and an in-depth discussion of the effectiveness of interpretability implementation in educational contexts61. She then further analyzed the key evaluation metrics of two machine learning models, focusing on an intrinsically interpretable model and post-hoc interpretability techniques62. Current research still lacks large-scale evaluation and investigation across multiple models and interpretability methods, which is one of the gaps this paper hopes to address.
Methodology
This study follows an educational data mining process combined with an explainability-algorithm interpretation process; the research method and workflow are shown in Fig. 1. The study is divided into two parts to address the research questions. After the data is collected and processed, the first part focuses on using ML algorithms, including Lasso regression, RF, XGBoost, NN, and SVM, to perform hyperparameter tuning and select the best-fitting model for the data, thereby answering RQ1. In the second part, explainable AI methods, including SHAP, LIME, and Morris Sensitivity, are applied to locally explain the factors that affect students’ cognitive abilities and to examine the differences between these methods. The causal testing tool PSM is then used to analyze the screened factors that affect students’ cognitive abilities, providing answers to RQ2.
Dataset and data processing
This paper targets freshmen who have just graduated from high school, drawn from two higher education institutions: the Academy of Arts and Design, Fuzhou University, and Beijing Normal University-Hong Kong Baptist University United International College. We collected information on students’ conditions, including potential factors related to cognitive abilities63, based on the China Education Panel Survey (CEPS)64. All data were collected after informed consent was obtained and were anonymized. We distributed questionnaires online and collected information covering multiple aspects, including students’ basic personal information, family background, life experiences, and teacher-student relationships. The original questionnaire was designed around students’ high school experience; therefore, items that were not relevant to the context of university students were removed to better align with their circumstances.
This study continued to adopt the cognitive ability questionnaire from the CEPS framework for cognitive ability assessment, covering three dimensions: language and reasoning, graphics and space, and calculation and logic65. The standardized total scores derived from this assessment were used as the benchmark for measuring cognitive ability levels, and mean squared error (MSE) was used as the fitting criterion.
The corresponding encoding and variable name relationships for these features are listed in the supplementary materials. Some potentially repetitive or irrelevant questionnaire items were reviewed and consolidated66, and certain demographic-related features, such as gender, profession, and age, were excluded due to privacy concerns and the potential risk of introducing bias67. During the coding process, label encoding and standardization techniques were applied. Several baseline variables, such as future career aspirations and study hours, were merged into broader categories to simplify processing. Additionally, samples with missing variables were deleted.
Ultimately, valid data from 1086 students were used, covering a total of 78 independent questions that yielded 212 features. 74.8% of the data came from Fuzhou University, and 61.7% of the students were female. A total of 25.64% of the questions (N = 20), which involved multiple high-cardinality and complex categorical variables, were further split during processing.
For example, in this study’s questionnaire, w2b1200-w2b1207 is a further subdivision of the question “What interests and hobbies do you have?”, which includes chess and sports; the split values represent the degree of the behavior, and this question contains 7 features. Figure 2 shows the proportion of unique values in the variables, excluding cognitive ability scores: 36.8% of the features are two-level categorical features, and 26.8% are five-level categorical features. Figure 3 provides detailed distributions of some potentially important variables; these are not processed further in this paper. After all processing was completed, we split the data twice with train_test_split, dividing the data in a ratio of 7:2:1 and specifying the same random seed (seed = 41), as sketched below.
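A minimal sketch of this two-stage split, assuming `X` and `y` denote the processed feature matrix and the standardized cognitive ability scores (these variable names are ours, not from the original pipeline):

```python
from sklearn.model_selection import train_test_split

# First split: hold out 10% of the data as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.1, random_state=41)

# Second split: 2/9 of the remaining 90% is ~20% of the full data,
# giving the 7:2:1 train/validation/test ratio described above.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=2/9, random_state=41)
```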
ML algorithm
Lasso regression
Lasso (also known as Least Absolute Shrinkage and Selection Operator) regression68 is a linear regression model with L1 regularization. Its primary function is to prevent overfitting by applying a penalty to model parameters. The optimization formula of Lasso is as follows:
$$\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$

In the formula, \(y\) represents the response variable vector, \(X\) is the feature matrix, and \(\beta\) is the vector of regression coefficients. The term \(\|y - X\beta\|_2^2\) is the residual sum of squares, which measures the model’s fit to the data, while \(\|\beta\|_1\) represents the L1 norm69 of the regression coefficients, imposing a sparsity constraint that encourages some coefficients to become exactly zero. The parameter \(\lambda\) is the regularization parameter, balancing the trade-off between the goodness of fit and the sparsity of the coefficients.
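As an illustration of this sparsity effect, a brief sketch using scikit-learn’s Lasso and the split above (the alpha value here is illustrative, not the tuned value from this study):

```python
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01)  # alpha plays the role of the λ regularization parameter
lasso.fit(X_train, y_train)

# L1 regularization drives many coefficients to exactly zero.
n_kept = (lasso.coef_ != 0).sum()
print(f"{n_kept} of {lasso.coef_.size} features have nonzero coefficients")
```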
Random forest
RF70 is an ensemble learning method, which is shown in Fig. 4, that makes predictions by constructing multiple decision trees. Each tree is trained on a random subset of the data, and random subsets of features are selected during training. For regression problems, RF takes the mean of all decision tree predictions as the final prediction, as follows:

$$\hat{y}(x) = \frac{1}{T} \sum_{t=1}^{T} f_t(x)$$

In the formula, \(f_t(x)\) is the prediction of the \(t\)-th tree for input \(x\), and \(T\) is the total number of decision trees.
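The averaging in the formula can be verified directly on a fitted forest; a small sketch assuming the train/validation arrays from the split above (the hyperparameters are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=41)
rf.fit(X_train, y_train)

# The ensemble prediction equals the mean of the individual tree predictions.
per_tree = np.stack([tree.predict(np.asarray(X_val)) for tree in rf.estimators_])
assert np.allclose(per_tree.mean(axis=0), rf.predict(X_val))
```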
XGBoost
XGBoost71 (Extreme Gradient Boosting) is an efficient tree-based boosting algorithm widely used in ML competitions. It improves the model step by step by building decision trees, focusing on samples with larger training errors. The objective function of XGBoost can be expressed as follows:
$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

In the XGBoost formula, \(n\) represents the number of samples, and the term \(\Omega(f_t)\) is a regularization term designed to control the complexity of the model and prevent overfitting. It is determined by factors such as the number of leaf nodes in the decision tree, the weight vector of the leaf nodes, and the hyperparameter controlling the regularization strength. The remaining part is the loss function \(l\), which quantifies the difference between the true label \(y_i\) and the prediction from the previous iteration, \(\hat{y}_i^{(t-1)}\), updated by the new tree \(f_t(x_i)\). For optimization, the objective function is approximated using a second-order Taylor expansion.
Neural network
NN is an algorithm, which is shown in Fig. 5, inspired by biological neural systems that process information through multiple layers. Each neuron receives inputs, generates outputs through an activation function, and has adjustable weights connecting the layers.
Support vector machines
SVM72 is a supervised learning model, which is shown in Fig. 6, widely used for classification and regression tasks. Its core idea is to find the optimal hyperplane in the feature space that separates samples of different classes.
For regression, the Support Vector Regression (SVR) objective is:

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)$$

In the SVR formulation, \(w\) represents the weight vector of the regression function, and \(b\) is the bias term. The goal is to minimize the regularization term \(\frac{1}{2}\|w\|^2\); \(\xi_i\) and \(\xi_i^*\) are slack variables used to handle prediction errors outside the \(\epsilon\) tolerance range. \(C\) is a positive penalty parameter that balances model complexity against prediction errors beyond the slack range.
Hyperparameter optimization
Hyperparameter choices ultimately affect how well the models predict students’ cognitive abilities. To ensure that the fitted relationship between students’ cognitive abilities and the features reaches a local optimum, the goal is to find the best parameter combination for each ML model by optimizing its hyperparameters, using the lowest mean squared error loss on the validation set to determine the best parameters.
Therefore, this study introduced a hyperparameter optimization algorithm to automatically compute the optimal parameters for each algorithm: the TPE73 (Tree-structured Parzen Estimator) algorithm in Hyperopt74, illustrated in Fig. 7, is used for hyperparameter optimization, with an additional validation set employed for fine-tuning during model training. A custom objective function instantiates the corresponding model using the provided hyperparameters, evaluates it on the validation set using MSE as the metric, and returns the loss for optimization. Each model has a specifically defined hyperparameter search space, which is shown in Table 1. For instance, the Lasso model optimizes the regularization strength parameter, while tree-based models include parameters such as the number of trees, maximum depth, minimum split ratio, and learning rate. The NN considers the number of hidden-layer neurons, activation functions, regularization strength, and maximum iterations. The SVM optimizes the regularization parameter and the loss tolerance.
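A condensed sketch of this tuning loop, shown for the RF model only and assuming the validation split from earlier (the search bounds here are illustrative; the actual spaces are listed in Table 1):

```python
from hyperopt import fmin, tpe, hp, Trials
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 10),
    "max_depth": hp.quniform("max_depth", 3, 30, 1),
}

def objective(params):
    # Instantiate the model with the sampled hyperparameters ...
    model = RandomForestRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=41,
    )
    model.fit(X_train, y_train)
    # ... and return the validation-set MSE as the loss TPE minimizes.
    return mean_squared_error(y_val, model.predict(X_val))

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=Trials())
```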
Explainable AI algorithms
After selecting the most suitable model, this study used SHAP, LIME, and Morris Sensitivity to explore the model and explain the factors behind the prediction decisions for specific samples. Especially in the field of education, local explanations increase the transparency and credibility of black-box models.
SHAP
SHAP75 (SHapley Additive exPlanations) is an explainability algorithm for black-box models. It is derived from the Shapley value in game theory and is used to calculate the contribution of each feature to the model prediction76. The SHAP value is computed as follows:
$$\varphi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left[ f\left(S \cup \{i\}\right) - f(S) \right]$$

In the SHAP formula, \(\varphi_i\) is the SHAP value for feature \(i\), indicating its average marginal contribution to the prediction output. \(N\) is the set of all features, and \(S\) is a subset of features that excludes feature \(i\). \(f(S \cup \{i\}) - f(S)\) is the change in the model prediction when feature \(i\) is added, and \(\frac{|S|!(|N|-|S|-1)!}{|N|!}\) is the weight applied when traversing all possible feature combinations. The SHAP formula weights and averages the marginal contributions to ensure that the contribution of each feature is independent of the order in which it appears.
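In practice, SHAP values for a tree ensemble such as the RF model used here can be computed with the shap package; a minimal sketch, assuming the fitted `rf` model and the test split from the preceding steps:

```python
import shap

explainer = shap.TreeExplainer(rf)           # efficient SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)  # one row of φ values per sample
shap.summary_plot(shap_values, X_test)       # beeswarm summary, as in Fig. 12
```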
LIME
The core idea of LIME77 (known as Local Interpretable Model-agnostic Explanations) is to fit a local linear model to approximate the behavior of a complex model around a specific sample. Its objective is to minimize the following formula:
$$\xi(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g)$$

LIME uses a simple linear model \(g\) to fit the behavior of the black-box model \(f\) near a specific sample to obtain a local explanation of the result. The loss function \(L\) measures the prediction consistency between the linear model and the complex model near the selected sample, while the weight function \(\pi_x\) defines the importance of different samples to the fitting process. The regularization term \(\Omega(g)\) controls the complexity of the linear model.
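A minimal sketch of a local LIME explanation for a single student record, assuming `feature_names` holds the encoded questionnaire item names and `rf` is the fitted model from earlier:

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=feature_names,
    mode="regression",
)
exp = explainer.explain_instance(
    np.asarray(X_test)[0],  # the specific sample to explain
    rf.predict,             # the black-box prediction function f
    num_features=10,
)
print(exp.as_list())        # local linear weights, as visualized in Fig. 11
```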
Morris sensitivity
Morris sensitivity78 analysis is a global sensitivity analysis method that makes small random perturbations to each input to observe the changes in model output, thereby calculating the impact of input variables and testing their robustness.
$$\mu_i^* = \frac{1}{r} \sum_{j=1}^{r} \left| EE_i^{(j)} \right|, \qquad \sigma_i = \sqrt{\frac{1}{r-1} \sum_{j=1}^{r} \left( EE_i^{(j)} - \mu_i \right)^2}$$

Here, \(\mu^*\) represents the mean absolute (elementary) effect of the input variable, and \(\sigma\) represents the standard deviation of the elementary effects. During the analysis, multiple perturbations are applied, and the incremental effect \(EE\) produced by each perturbation is computed. \(r\) is the number of perturbation tests; higher \(\mu^*\) and \(\sigma\) indicate that the variable has a greater impact on the output and a more significant interaction effect.
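The paper does not name a specific implementation; as one possibility, the SALib package computes \(\mu^*\) and \(\sigma\) from sampled one-at-a-time trajectories. A hedged sketch with illustrative bounds for the encoded features:

```python
from SALib.sample.morris import sample
from SALib.analyze.morris import analyze

problem = {
    "num_vars": len(feature_names),
    "names": list(feature_names),
    "bounds": [[0, 1]] * len(feature_names),  # illustrative; real bounds follow the encoding
}
param_values = sample(problem, N=100)  # trajectories of one-at-a-time perturbations
Y = rf.predict(param_values)           # model output at each perturbed point
res = analyze(problem, param_values, Y)
print(res["mu_star"], res["sigma"])    # μ* and σ per input variable
```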
Causal testing model
Our primary focus is on comparing how different interpretability algorithms identify features; however, determining the causal effects of variables is essential for understanding students’ cognitive abilities. To achieve this goal, this paper adopts a causal evaluation framework, as shown in Fig. 8, to identify the causal effects of highly consistent variables identified by different interpretability algorithms by combining PSM79 and robustness analysis called the Rosenbaum bounds test80, and explores their impact on cognitive ability.
PSM first estimates the propensity score of each sample through a logistic regression model, then uses nearest neighbor matching to find, for each treatment-group sample, the control-group sample with the closest propensity score, constructing a balanced dataset; finally, it compares the differences between the two groups’ outcomes through a t-test. The reason this study chose PSM instead of other causal methods, such as difference-in-differences, is that different categorical groups in observational studies may have an unbalanced distribution of covariates, which may lead to biased results. PSM can ensure that the two matched groups are similar in observed characteristics, thereby more reliably estimating causal effects. To assess whether the propensity score model is adequately specified, it is important to perform balance diagnostics on the covariates. We used a common standardized mean difference threshold of −0.2 to +0.2 to determine the balance of the matched sample. To constrain the distance between the treatment and control groups, we used a caliper of 0.2 standard deviations, as recommended by Austin81.
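A simplified sketch of this matching procedure, assuming a DataFrame `df` with the covariate columns in `covariates`, a binary `treated` flag, and an `outcome` column (these names are ours; the full pipeline also checks SMD balance):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# 1. Estimate propensity scores with logistic regression.
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
df["ps"] = ps_model.predict_proba(df[covariates])[:, 1]

treated = df[df["treated"] == 1]
control = df[df["treated"] == 0]

# 2. Nearest-neighbor matching on the propensity score,
#    with a caliper of 0.2 standard deviations (Austin's recommendation).
caliper = 0.2 * df["ps"].std()
nn = NearestNeighbors(n_neighbors=1).fit(control[["ps"]])
dist, idx = nn.kneighbors(treated[["ps"]])
keep = dist.ravel() <= caliper               # discard matches outside the caliper
matched_treated = treated[keep]
matched_control = control.iloc[idx.ravel()[keep]]

# 3. Compare outcomes of the matched groups with a t-test.
att = matched_treated["outcome"].mean() - matched_control["outcome"].mean()
t, p = stats.ttest_ind(matched_treated["outcome"], matched_control["outcome"])
```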
The PSM method is mainly used to evaluate the average treatment effect on the treated (ATT) of binary treatment variables on outcome variables. Therefore, after identifying highly similar variables, this study binarizes the treatment variables and uses PSM to estimate causal effects.
To simplify the process and explore relevant features most efficiently, we apply a systematic multivariate circular causal test, which sets the potentially highly correlated variables as treatment variables in turn, and rotates the remaining variables as covariates to explore which of the multiple variables may be the causal driver. If the causality of a model is not significant, the number of covariates will be reduced in turn for repeated causal tests.
Finally, we use the Rosenbaum bounds sensitivity analysis to assess the robustness of the causal effect, ensuring that unobserved confounding variables do not affect the ATT results. Gamma measures how large a bias from unmeasured confounding would be needed to overturn the current causal conclusion: when confounding variables exist, the larger the Gamma value, the stronger the confounding effect. “Gamma = 2” means that the selection bias from the unobserved variable is twice that of the observed variables. If the confidence interval of the ATT does not contain 0 under a given Gamma, the causal effect is considered significant.
Results
Feature recognition
It is important to realize that different treatments of the dataset and different model selections will affect the final model performance. To alleviate the impact of overfitting, this study used cross-validation with different numbers of folds (K = 2-7) to select parameters and models. We also used dropout in the NN and Lasso regularization to reduce the risk of overfitting, and we expanded the parameter search ranges as much as possible so that the study covers a sufficiently large space; the specific parameters are shown in Table 2. From the parameters selected by the final model, it can be seen that cross-validation with different folds does have some impact on data fluctuations. When selecting a model, to balance performance and interpretability and avoid the risk of overfitting, we consider two aspects: the average performance and the stability of performance across folds, measured by the standard deviation (SD) and coefficient of variation (CV).
Among them, RF has the most outstanding performance, with the lowest average MSE across folds, while the NN has the highest average MSE. In terms of stability, Lasso is better, while the NN fluctuates the most. Therefore, after a comprehensive analysis, we selected RF, with the best performance and moderate stability fluctuation, as our baseline model, balancing performance against interpretability.
It is important to note that this study utilized the TPE algorithm for hyperparameter optimization, which may not fully capture the potential performance of all optimization methodologies, as alternative algorithms, such as Simulated Annealing, could yield different outcomes and provide further insights into model performance under varying parameter configurations82. Given that RF is inherently a white-box model with certain explainability features, this study reintroduces RF-specific feature importance as an interpretable result for comparison within explainability algorithms. The specific mapping between feature numbers and questionnaire items is shown in Appendix A-1 and A-2.
Finally, we merged all the validation data and trained the model for the final evaluation on the test set. The results of the final model on the test set are MSE: 0.4953, RMSE: 0.7038, MAE: 0.554, R2: 0.2642, and PCC: 0.5156. An R2 of 0.264 means the model explains only 26.4% of the variance, which is admittedly not ideal. One reason is that the random forest model is not linear, and it therefore does not score well on R2, an index that measures linear fitting ability. The PCC of only about 0.5 likewise suggests that the model’s linear correlation is not very strong.
Figure 9 illustrates the feature importance derived from RF; in this context, RF feature importance reflects the gain in MSE reduction achieved by splitting on the item at each split node. The feature importance values of w2b18, w2b02, w2a27, and w2a12 are 0.3233, 0.2545, 0.1313, and 0.1074, respectively, while the values of the remaining variables are all below 0.1.
Figure 10 and Table 3 present the results of the Morris Sensitivity analysis. The ranking in Morris Sensitivity is based on the absolute magnitude of the computed metric \(\mu^*\), which represents the average effect size of a student cognitive-ability characteristic on the model output; the larger the value, the more important the variable. \(\theta^*\) represents the degree of variation in the output effect of variables that affect students’ cognitive abilities; the larger the \(\theta^*\), the greater the instability of the variable’s effect. The table shows that among the top five most important variables, w2b18 and w2a27 have the largest average contributions to the results, but their stability is worse than that of the subsequent variables. Although w2a28 has good stability, its contribution is low, and the confidence level \(\alpha^*\) of all variables is less than 0.05.
(Fig. 9 caption) Feature importance ranking of characteristics affecting students’ cognition under the RF algorithm; the full questions can be found on the right side of Appendix A-2.
Figures 11 and 12 illustrate the rankings derived from the LIME and SHAP algorithms, respectively. In Fig. 11, the length of a feature’s bar in the LIME graph indicates the contribution of that feature to the final prediction: the longer the bar, the greater the influence, and the influence can be positive or negative. Local features such as 0.00 < w2a12 <= 2.00, 2.00 < w2b18 <= 3.00, and w2b02 > 2.00 have significant positive effects, while features such as w2d0207 <= 0.00 and w2c0704a <= 0.00 make large negative contributions, the largest being about −0.05.
In Fig. 12, features are sorted by their average absolute SHAP value, and each point represents the SHAP value of one sample. The color of a point usually indicates the size of the feature value: generally, red indicates a larger feature value and blue a smaller one, although the actual meaning depends on the variable (for example, categorical variables encode options rather than magnitudes). In addition to color, the thickness of a line represents the feature’s contribution to the prediction: the thicker the line, the higher the contribution to the final result. The SHAP value distribution of w2b18 ranges from 0 to 0.3, with a significant positive impact on the model; the red high feature values push the model output strongly upward. The same holds for w2a12, whose SHAP values range from 0.05 to 0.25. w2b02 contributes both positive and negative SHAP values, from −0.2 to 0.2, while w2a27 and w2a28 have a relatively neutral impact on the model, with SHAP values concentrated between −0.1 and 0.1.
It is noteworthy that LIME and SHAP inherently assess the contribution of individual features to the prediction of specific samples, rather than directly reflecting their influence on the overall model. To achieve a standardized evaluation in this study, the local contributions of all features calculated by LIME and SHAP were aggregated and weighted, thereby approximating their global impact on the model, as sketched below. The specific values of the top ten features from all interpretable algorithms are shown in Table 4; note, however, that the top features of the algorithms are not the same.
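For SHAP, for example, this aggregation amounts to ranking features by their mean absolute local contribution; a short sketch using the `shap_values` and `feature_names` from earlier:

```python
import numpy as np

# Global importance ≈ mean absolute local SHAP value per feature.
global_importance = np.abs(shap_values).mean(axis=0)
top10 = np.argsort(global_importance)[::-1][:10]
for i in top10:
    print(feature_names[i], round(float(global_importance[i]), 4))
```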
Causal testing
We selected the five features with the highest frequency and ranking in four different interpretability algorithms as the features identified by the interpretability algorithms for causal testing. Finally, the variables, including w2a12, w2b02, w2a27, w2a28, and w2b18, were selected for causal testing on all data.
Table 5 shows the results of the PSM analysis, which was used to assess the ATT of the five treatment variables on the outcome variable of cognitive ability. The analysis results showed that the five treatment variables all showed significant treatment effects after PSM (p < 0.001), indicating that the relationship between these treatment variables and the outcome variables was highly significant. In terms of ATT, the ATT values of w2a12, w2b02, w2a28, and w2b18 are all greater than 0.2, indicating strong causality in the field of social sciences, while the ATT of w2a27 is 1.81, also showing acceptable causality.
In terms of covariate balance, the matching results of most models meet the SMD < 0.2 standard, indicating a good match. Only when w2a28 is the treatment variable are some individual variables poorly matched (w2a12: 0.285); matching failure may lead to an overestimation of the causal effect. From the perspective of overall causality, however, the analysis results are acceptable and have a certain degree of explanatory power and practicality, though further stability analysis is needed.
Table 6 shows the results of robustness in the causal model. From the results, we can see that even when Gamma = 2, there is no case where ATT is 0 in all significant intervals, which shows that all causal assumptions are very robust.
Discussion
The self-perception of students
All interpretability algorithms and causal tests consistently showed significant correlations and causal relationships between students’ self-perception, including self-assessed math ability (w2b02) and expectations about their academic performance (w2b18), and their cognitive abilities. w2b18 ranks first in relevance in all three interpretability algorithms, and w2b02 ranks second in feature importance and third in the Morris Sensitivity analysis. The LIME chart shows that w2b02 plays a significant role in pushing the prediction toward positive values. This is even more apparent in the SHAP plot: a large number of dense, thick lines gather at the right end, while the blue points are distributed as thin lines in the negative range, indicating that this feature distinguishes cognitive abilities well. In the causal test, both w2b02 and w2b18 showed a high degree of causality with cognitive ability outcomes. This suggests that the more difficult students perceive mathematics to be, and the lower their academic expectations, the lower their cognitive abilities.
Albert Bandura’s self-efficacy theory posits that students’ domain-specific academic performance correlates with their perceived capabilities83. Self-efficacy has been related to students’ cognitive abilities in a large number of studies84,85, so we do not discuss it in detail; instead, this study emphasizes the practical importance of mathematics self-perception, which is less often addressed. In the context of academic learning, students who perceive a subject as “difficult” may experience emotional fluctuations such as heightened anxiety, which can negatively impact their learning motivation and engagement over time86. This is especially pronounced in mathematics, a discipline intricately tied to meta-cognition87.
Emerging evidence suggests a potential interplay between mathematical reasoning and domain-specific cognitive development88. In artistic disciplines, self-efficacy in mathematical learning may influence the metacognitive frameworks underlying artistic creation89. For example, because both art and geometry involve the visualization and manipulation of images, developing abilities in geometric reasoning tasks can greatly enhance art students’ understanding of spatial perception and visual art, increasing their professional abilities90.
We suggest that schools should explore constructive ways to integrate art classes into related mathematics majors when opening first-year courses at the university. In practice, this highlights the need to prioritize interventions aimed at addressing math anxiety and enhancing self-efficacy among university students. Furthermore, even at the university level, institutions should actively engage parents to create additional opportunities for the growth of first-year students. This may include active monitoring and listening to students’ math anxiety and factors related to self-perception.
The role of parental expectations in students’ cognition
Algorithms indicated a measurable association between parental expectations and the cognitive abilities of art students (w2a27, w2a28). Although the two features w2a27 and w2a28 are possibly highly correlated, they may have had different paths of influence on the results; in multiple nonlinear models, neither variable was automatically eliminated, and the robustness tests were also good, so this study retained both similar features for causal modeling. Within the four explainability algorithms, three consistently identify parental expectations among the top five most influential factors shaping students’ academic performance (except LIME). Among the top ten features, w2a27 ranks second in Morris Sensitivity (\(\mu^{*}=0.2893\), \(\theta^{*}=0.200\)). The w2a28 feature ranks seventh in LIME, with a LIME value of 0.0401. In the SHAP plot, w2a27 ranks fourth and can distinguish positive and negative outcomes to a certain extent. In the causal test, both characteristics showed strong positive causality, indicating that the degree of parental expectation can affect children’s cognitive abilities.
This finding provides computational evidence supporting that parental expectations may significantly influence students’ cognitive abilities and self-concepts. Consistent with expectancy-value theory, parents’ expectations emerge as a potential extrinsic factor that may correlate with students’ cognition and evaluation of educational goals91.
It should be noted that family expectations here are meant in a narrow sense; family expectations in a broad sense also include family support for students. Family support includes parents’ investment in students and the family’s own growth environment, such as the family network issues captured by w2a12, which also has a strong causal relationship with cognitive ability. Some studies have shown that parents’ emotional expectations can have a strong mediating effect on children’s cognitive ability and academic performance92, while other studies regard it as an interactive effect45. At present, our research is not sufficient to support a qualitative analysis of this impact. However, it is undeniable that across all explainability algorithms, the impact of parental expectations may far exceed that of some explicitly observed variables. This influence not only directly affects students’ cognitive ability itself but also affects the cultivation of cultural literacy and of art students’ spatial perception ability, thereby further regulating cognitive performance93.
Therefore, attaching importance to family education and cultivating a good family atmosphere is a topic that needs to be continuously paid attention to in the education sector.
The role of health in students’ cognition
As shown in Table 7, signals related to physical illness emerged in multiple algorithms, although their importance rankings are very low; the highest-ranked disease factor appears in the LIME algorithm at −0.0459. These factors were therefore not selected for causal modeling, but the hidden meanings they represent warrant further attention and processing.
In the medical field, some studies have already examined the cognitive abilities of children with chronic brain disorders and provided data explaining impaired attention and an increased risk of cognitive impairment94. This physiological signal is therefore picked up by an algorithm like LIME, which is very sensitive to outliers. All three algorithms reveal the impact of nearsightedness, a chronic condition, on cognitive ability, and the trend is more obvious in SHAP: all points are concentrated on both sides of the 0 value, forming a thick line, indicating a relatively large contribution to the final result.
Physical and psychological stress have been shown to reduce students’ cognitive abilities by disrupting their learning, memory, and cognitive functions95. For art students, this includes affecting their artistic creation ability, reducing their cognitive focus, and hindering them from achieving a “creative climax”. To improve the artistic and creative performance and cognitive ability of art students, schools should pay close attention to students’ physiological and psychological emotional changes, especially somatic manifestations caused by anxiety and emotional stress. Educational philosophies should focus on the overall well-being of students. By integrating health education and cognitive development strategies, educators can develop more effective curricula and supportive measures, ensuring a solid foundation for students’ long-term development96.
Comparison of explainability algorithms in identifying cognitive factors
Different explainability algorithms showed different preferences when measuring features that affect the cognitive abilities of art students, as shown in Fig. 13. This paper counts the top ten most important features of each algorithm; two doctoral students in education classified them into categories and calculated their proportions. From a statistical perspective, each explainability algorithm exhibits distinctive characteristics in its classification proportions and differences.
Overall, ‘Family environment and supports’ and ‘Moral and social aspects’ were identified by multiple interpretable algorithms and showed a high degree of similarity. Morris Sensitivity prioritizes family-related factors and social aspects. SHAP and feature importance maintain this trend and provide additional information, such as health and self-efficacy. The LIME algorithm was different: although LIME highlighted family support as a dominant factor, it was more sensitive to the diversity of feature types. Compared with the other interpretability algorithms, it was the only one that did not include mathematical self-perception as an important feature, and it was also the only one that put the most weight on health-related features.
Various algorithms showed different tendencies in feature selection, which may be due to differences in their underlying feature calculation mechanisms. For instance, the Morris Sensitivity algorithm demonstrates a more balanced distribution across all classifications, potentially due to its global approach to feature analysis, which takes into account complex interactive effects, and the classification categories are more stable. In contrast, algorithms such as SHAP adopt a localized interpretative perspective, relying on additive averaging. As a result, they may exhibit biases towards specific features with higher local weights. This bias may result in excessive attention to a single variable, resulting in diversity in the classification of the final results.
Another example worth discussing is LIME, which identified health-related characteristics that were not observed in other explainability algorithms. This distinction arises from LIME’s strong emphasis on the influence of localized features. The two health-related factors exhibit an extraordinarily high impact on students’ cognitive abilities, yet their sample size is notably small (as health-related issues are uncommon among typical students). Consequently, other explainability algorithms may have excluded these factors as noise due to their mechanisms, whereas LIME successfully identified their significance.
Harmonizing different interpretability algorithms
It is worth emphasizing again that, beyond the built-in interpretability (feature importance) of algorithms such as random forests, the behavior and comparability of SHAP, LIME, and Morris Sensitivity are closely tied to their own calculation formulas; they explain the data-driven results from different perspectives. SHAP is based on cooperative game theory, LIME on local linear approximation (ridge regression), and Morris Sensitivity on global parameter perturbation analysis rooted in stability. A critical distinction lies in how these explainability algorithms handle interaction effects, given their distinct mathematical foundations. SHAP explicitly computes all feature subset combinations to capture their collective influence on outcomes through cooperative game theory. In contrast, LIME, constrained by its reliance on ridge regression for local approximations, may require manual incorporation of interaction terms or automated interaction detection. Morris Sensitivity analysis ignores nonlinear interactions, as it solely evaluates global average perturbations through elementary effect measurements. These differences are also reflected in specific indicators: for example, SHAP’s detection of the two-way contribution of w2b02 has a wider range (−0.6 to 0.1), with data points distributed in a cone shape, which to some extent reflects the contextual dependence of cognitive characteristics, while Morris’s \(\mu^*\) = 0.1333 only captures its average main effect, suppressing to some degree the expression of irregularly distributed data. The positive effect (0-2) of w2a12 in LIME and its SHAP value (0.2545) and distribution better reflect the characteristics of the data, suggesting that the feature may have a threshold effect, while Morris Sensitivity and feature importance lack this level of detail. However, Morris Sensitivity can avoid the risk that extreme data points distort the results, something that local-feature algorithms such as LIME are less able to do.
Therefore, to reconcile these different interpretable algorithms, we recommend harmonizing their indicators. On the one hand, the use of these algorithms should be driven by the application context: if they are used only for feature analysis, mediation-effect analysis can test whether a feature behaves differently across explanation algorithms. On the other hand, a technical approach is to reconstruct the indicators themselves. For example, Morris Sensitivity and SHAP interaction values can be combined, with coefficients controlling the relative strength of the different indicators, or the loss function of the machine learning method can itself be modified: in Formula 8, for instance, additional constraints from the interpretable algorithm are added to the MSE loss function. This approach ensures the consistency of the two explanation methods in both contribution magnitude and direction of action, as sketched below.
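A purely illustrative sketch of this idea (not a reproduction of Formula 8): the MSE loss is augmented with a penalty on disagreement between two normalized attribution vectors, with `lam` controlling the strength of the constraint. The names `expl_a` and `expl_b` are our assumptions for, e.g., mean absolute SHAP values and Morris \(\mu^*\):

```python
import numpy as np

def combined_loss(y_true, y_pred, expl_a, expl_b, lam=0.1):
    """MSE plus a consistency penalty between two explanation methods.

    expl_a, expl_b: normalized importance vectors of equal length.
    """
    mse = np.mean((y_true - y_pred) ** 2)
    consistency = np.mean((expl_a - expl_b) ** 2)  # penalize divergent attributions
    return mse + lam * consistency
```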
This is more in line with the need for “interpretable depth” in educational cognitive research than simple consistency. This design allows the model to maintain predictive performance while its decision-making mechanism can be cross-validated by multiple methods, which is crucial for educational policy makers to understand the complex cognitive influencing factors.
However, these methods are not a panacea. In particular, the conflicting theoretical assumptions of different explanatory frameworks pose substantial challenges. Forcing numerical consistency among different algorithms may lead to the risk of overfitting by models suppressing real nonlinear interactions or overfitting to the computational biases of the explanatory methods. Therefore, keep in mind that in practice, decision traceability often takes precedence over marginal improvements in forecast accuracy.
Limitations
The main limitations of this research are the following. Firstly, the use of ML algorithms combined with explainable AI methods for analysis and modeling inherently introduces a degree of variability, leading to feature rankings that are not entirely definitive. Although the Morris Sensitivity index can support the validity of the algorithm to a certain extent, expanding the dataset may further reduce the MSE and enhance the robustness of the conclusions. In addition, although this study provides a complete causal inference method, the causal model is established in a data-driven way and lacks theoretical support. This study aims to provide an exploratory perspective for studying students’ cognitive abilities; in the future, learning theories can therefore be incorporated to guide the data-driven approach and provide better theoretical grounding. Due to concerns over student privacy and ethical considerations, this study refrained from conducting deeper GI analyses of certain key factors influencing students, although such analyses are undoubtedly a crucial channel for interpreting educational data with explainable AI methods.
Future work
A potential direction for future work lies in multimodal approaches. This study used structured survey data and did not take into account unstructured data, such as audio, text, and other information that directly expresses cognitive abilities. After processing by improved AI algorithms, such modal information may provide a richer source of evidence97,98. While this study relies on questionnaire-based data, the interpretability of structured data is inherently constrained. Incorporating multimodal information offers greater potential for practical advancement99, especially with the rise of generative AI, including conversational chatbots100, reasoning-capable large models101, and automatic diagnostic systems102.
We will consider adding text as a modality to inject new vitality into the interpretability methods of educational data mining. At the same time, we noticed that some previous studies have used multimodal data to predict and analyze educational data103,104. However, these interpretable multimodal data methods have not yet been uniformly coordinated and balanced, which will lead to challenges such as excessive computational overhead when performing data-driven causal inference; therefore, extending the coordination and trade-offs of interpretable models to multimodal data will be one of the focuses of future research.
In summary, this study will aim to carefully evaluate student samples and identify a balanced approach that protects student privacy while enabling GI for educational assessment. It should also be noted that the problem classification of the explainability algorithm is affected by subjective factors. Statistical inference or automatic classification methods based on AI may be considered to be introduced to enhance the accuracy of classification judgment in future research.
Conclusion
This study offers a new perspective on educational data-driven modeling and data mining by utilizing explainability AI algorithms as a tool. From the theoretical standpoint of data-driven approaches, this paper investigates the factors influencing students’ cognitive abilities. A range of traditional AI algorithms, including RF, XGBoost, and SVM, were employed for hyperparameter tuning, modeling, and optimization. Through multiple interpretable models and causal analysis, the results revealed that self-perception and parental expectations play pivotal roles in shaping students’ cognitive abilities. When applying different explainability algorithms to interpret the data-driven model, it was observed that these algorithms exhibit distinct preferences and disparities in feature importance. Global interpretation models provide a more comprehensive consideration of factors, while local explainability algorithms like LIME prioritize features with greater variability at the individual sample level, offering a unique avenue for uncovering hidden insights. Therefore, future research should emphasize the significance of students’ self-perception and family support in enhancing cognitive abilities. When using data-driven models, one should consider using a variety of interpretable models more comprehensively for feature selection and causal inference to provide richer and more complete insights.
Data availability
According to Ethics approval requirements, data can be made available to a limited extent with reasonable justification. The related data are available upon reasonable request from the author, Prof. Tianyue Niu (Email: t24095@fzu.edu.cn), under the condition of complying with ethical review requirements.
References
Dimaggio, P. Culture and cognition. Ann. Rev. Sociol. 23 (1), 263–287. https://doi.org/10.1146/annurev.soc.23.1.263 (1997).
Baker, D. P. et al. The cognitive impact of the education revolution: A possible cause of the Flynn Effect on population IQ, Intelligence, 49, 144–158 (2015). https://doi.org/10.1016/j.intell.2015.01.003
Sharp, D. et al. Education and cognitive development: the evidence from experimental research. Monogr. Soc. Res. Child Dev. 44 (1/2), 1–112. https://doi.org/10.2307/3181586 (1979).
Luo, Y., Pang, P. C. I. & Chang, S. Enhancing exploratory learning through exploratory search with the emergence of large language models. In Proceedings of the 58th Hawaii International Conference on System Sciences (HICSS), pp. 44–53, 7–10 Jan 2025 (2025). https://doi.org/10.24251/HICSS.2025.007
Luo, Y. et al. Enhanced Bloom’s Educational Taxonomy for fostering information literacy in the era of large language models. arXiv preprint arXiv:2503.19434 (2025).
Heaton, R. Cognition in Art education. Br. Edu. Res. J. 47 (5), 1323–1339 (2021).
Zhang, Q. Application of music education in brain cognition, In Educational Sciences: Theory & Practice, vol. 18 (2018).
Jing, Y. The role of music in enhancing cognitive and emotional development in higher education students: A comparative study. 159 (1), 9 (2024).
Rozi, Z. F., Ruyani, A. & Badeni Development of a STEAM-Based homeostatic module to improve students’ cognitive abilities. Jurnal Penelitian Pendidikan IPA. 10 (6), 3099–3107. https://doi.org/10.29303/jppipa.v10i6.7253 (2024).
Lu, G. & Wen, S. A student engagement typology and its relationship to ability development in Chinese universities. Asia-Pacific Educ. Researcher. 33 (1), 93–102. https://doi.org/10.1007/s40299-023-00711-7 (2024).
Zeng, Z., Yao, J., Gu, H. & Przybylski, R. A Meta-Analysis on the effects of STEM education on students’ abilities. Sci. Insights Educ. Front. 1 (1), 3–16. https://doi.org/10.15354/sief.18.re005 (2018).
Sternfeld, G. Z., Israeli, R. & Lapidot-Lefer, N. Creative education or educational creativity: integrating arts, social emotional aspects and creative learning environments. Int. J. Educ. & Arts 25 (2024).
Karr, J. E. & White, A. E. Academic self-efficacy and cognitive strategy use in college students with and without depression or anxiety. J. Am. Coll. Health. 72 (5), 1387–1393. https://doi.org/10.1080/07448481.2022.2076561 (2024).
Alam, A. & Mohanty, A. Framework of Self-Regulated cognitive engagement (FSRCE) for sustainable pedagogy: a model that integrates SRL and cognitive engagement for holistic development of students. Cogent Educ. 11 (1), 2363157. https://doi.org/10.1080/2331186x.2024.2363157 (2024).
Marks, G. N. & O’Connell, M. Student achievement is much more about cognitive ability and genetics than SES: A response to debouwere. Rev. Educ. 12 (2), e3483. https://doi.org/10.1002/rev3.3483 (2024).
Hales, A. H. & Wood, N. R. Statistical Controversies in Psychological Science, pp. 191–211 (Springer International Publishing, 2022).
Khanna, M. M. & Cortese, M. J. Topics, Methods, and Research-Based Strategies for Teaching Cognition, pp. 177–200 (Springer International Publishing, 2023).
Buchberger, E. S., Ngo, C. T., Peikert, A., Brandmaier, A. M. & Werkle-Bergner, M. Estimating statistical power for structural equation models in developmental cognitive science: A tutorial in R. Behav. Res. Methods. 56 (7), 1–18. https://doi.org/10.3758/s13428-024-02396-2 (2024).
Liu, T., Luo, Y., Pang, P. C. I. & Xia, Y. Leveraging digital technologies for public health education in High-Density community spaces: A Geospatial analysis. Systems 12 (11), 504. https://doi.org/10.3390/systems12110504 (2024).
Xu, J. & Wang, Y. Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture, arXiv preprint arXiv:2412.11557, (2024).
Xu, J., Jiang, Y., Yuan, B., Li, S. & Song, T. Automated scoring of clinical patient notes using advanced NLP and pseudo labeling. In 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 384–388 (IEEE, 2023).
Cao, Q., Luo, Y. & Luo, Y. A gold price time series prediction model based on CEENAM and machine learning. ACM. https://doi.org/10.1145/3675417.3675449 (2024).
Li, S. Harnessing multimodal data and mult-recall strategies for enhanced product recommendation in e-commerce. In 2024 4th International Conference on Computer Systems (ICCS), pp. 181–185 (IEEE, 2024).
Lu, J. Optimizing e-commerce with multi-objective recommendations using ensemble learning. In 2024 4th International Conference on Computer Systems (ICCS), pp. 167–171 (IEEE, 2024).
Lu, J., Long, Y., Li, X., Shen, Y. & Wang, X. Hybrid model integration of LightGBM, DeepFM, and DIN for enhanced purchase prediction on the Elo dataset, In 2024 IEEE 7th International Conference on Information Systems and Computer Aided Education (ICISCAE), pp. 16–20 (IEEE, 2024).
Guan, C., Mou, J. & Jiang, Z. Artificial intelligence innovation in education: A twenty-year data-driven historical analysis. Int. J. Innov. Stud. 4 (4), 134–147. https://doi.org/10.1016/j.ijis.2020.09.001 (2020).
Chen, J., Zhang, Y., Wei, Y. & Hu, J. Discrimination of the contextual features of top performers in scientific literacy using a machine learning approach. Res. Sci. Educ. 51 (S1), 129–158. https://doi.org/10.1007/s11165-019-9835-y (2021).
Yousafzai, B. K., Hayat, M. & Afzal, S. Application of machine learning and data mining in predicting the performance of intermediate and secondary education level student. Educ. Inform. Technol. 25 (6), 4677–4697. https://doi.org/10.1007/s10639-020-10189-1 (2020).
Alam, A. & Mohanty, A. Predicting Students’ Performance Employing Educational Data Mining Techniques, Machine Learning, and Learning Analytics, pp. 166–177 (Springer Nature Switzerland, 2023).
Nahar, K., Shova, B. I., Ria, T., Rashid, H. B. & Islam, A. H. M. S. Mining educational data to predict students performance. Educ. Inform. Technol. 26 (5), 6051–6067. https://doi.org/10.1007/s10639-021-10575-3 (2021).
Pereira, F. D. et al. Explaining individual and collective programming students’ behavior by interpreting a Black-Box predictive model. IEEE Access. 9, 117097–117119. https://doi.org/10.1109/access.2021.3105956 (2021).
Belle, V. & Papantonis, I. Principles and practice of explainable machine learning. Front. Big Data. https://doi.org/10.3389/fdata.2021.688969 (2021).
Lu, J. Optimizing e-commerce with multi-objective recommendations using ensemble learning (2024).
Zafar, M. R. & Khan, N. Deterministic local interpretable Model-Agnostic explanations for stable explainability. Mach. Learn. Knowl. Extr. 3 (3), 525–541. https://doi.org/10.3390/make3030027 (2021).
Setzu, M. et al. GLocalX - From local to global explanations of black box AI models. Artif. Intell. 294, 103457. https://doi.org/10.1016/j.artint.2021.103457 (2021).
Kuhn, D. The application of piaget’s theory of cognitive development to education. Harv. Educational Rev. 49 (3), 340–360. https://doi.org/10.17763/haer.49.3.h70173113k7r618r (1979).
Lee, S. D., Kuncel, N. R. & Gau, J. Personality, attitude, and demographic correlates of academic dishonesty: A meta-analysis. Psychol. Bull. 146 (11), 1042–1058. https://doi.org/10.1037/bul0000300 (2020).
Pigaiani, Y. et al. Adolescent lifestyle behaviors, coping strategies and subjective wellbeing during the COVID-19 pandemic: an online student survey. Healthcare 8 (4), 472. https://doi.org/10.3390/healthcare8040472 (2020).
Brandt, N. D., Lechner, C. M., Tetzner, J. & Rammstedt, B. Personality, cognitive ability, and academic performance: differential associations across school subjects and school tracks. J. Pers. 88 (2), 249–265. https://doi.org/10.1111/jopy.12482 (2020).
Li, Z. & Qiu, Z. How does family background affect children’s educational achievement? Evidence from contemporary China. J. Chin. Sociol. 5 (1), 1–21. https://doi.org/10.1186/s40711-018-0083-8 (2018).
Iqbal, J., Asghar, M. Z., Ashraf, M. A. & Yi, X. The impacts of emotional intelligence on students’ study habits in blended learning environments: the mediating role of cognitive engagement during COVID-19. Behav. Sci. 12 (1), 14. https://doi.org/10.3390/bs12010014 (2022).
Jansen, K. & Kiefer, S. M. Understanding brain development: investing in young adolescents’ cognitive and social-emotional development. Middle School J. 51 (4), 18–25. https://doi.org/10.1080/00940771.2020.1787749 (2020).
Liu, T. et al. The impact of social media on children’s mental health: A systematic scoping review. Healthcare 12 (23), 2391 (2024).
Nja, C. O. et al. The socio-economic rank of parents and students’ academic and cognitive outcomes: examining the physical, psychological and social mediators. Front. Educ. https://doi.org/10.3389/feduc.2022.938078 (2022).
Ganzach, Y. Parents’ education, cognitive ability, educational expectations and educational attainment: interactive effects. Br. J. Educ. Psychol. 70 (3), 419–441 (2000).
Avvisati, F., Besbas, B. & Guyon, N. Parental involvement in school: A literature review. Revue D’économie Politique. 120 (5), 759–778. https://doi.org/10.3917/redp.205.0759 (2011).
Duncan, G. J. & Magnuson, K. Socioeconomic status and cognitive functioning: moving from correlation to causation. Wires Cogn. Sci. 3 (3), 377–386. https://doi.org/10.1002/wcs.1176 (2012).
Korous, K. M., Causadias, J. M., Bradley, R. H., Luthar, S. S. & Levy, R. A systematic overview of Meta-Analyses on socioeconomic status, cognitive ability, and achievement: the need to focus on specific pathways. Psychol. Rep. 125 (1), 55–97. https://doi.org/10.1177/0033294120984127 (2022).
Chan, K. I., Pang, P. C. I. & Wei, W. Exploring the factors and moderators influencing the use of radar visualisation of student performance from parents’ perspective. In IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE), pp. 1–8 (IEEE, 2023).
Alam, A. & Mohanty, A. Does Musically Responsive School Curriculum Enhance Reasoning Abilities and Helps in Cognitive Development of School Students?, pp. 337–341 (CRC, 2023).
Tenenbaum, H. R., Winstone, N. E., Leman, P. J. & Avery, R. E. How effective is peer interaction in facilitating learning? A meta-analysis. J. Educ. Psychol. 112 (7), 1303–1319. https://doi.org/10.1037/edu0000436 (2020).
Braun, T. & Oertzen, T. Empathy, cognitive functioning, and prosocial behavior in mentored children. New Dir. Child Adolesc. Dev. 2021 (179), 41–57. https://doi.org/10.1002/cad.20409 (2021).
Sharma, S. & Gupta, B. Investigating the role of technostress, cognitive appraisal and coping strategies on students’ learning performance in higher education: a multidimensional transactional theory of stress approach. Inform. Technol. People. 36 (2), 626–660. https://doi.org/10.1108/itp-06-2021-0505 (2023).
Hwang, J. & Ham, Y. Relationships between self-efficacy and achievement moderated by teacher interaction: an international comparison study. Math. Educ. Res. J. 33 (1), 135–162. https://doi.org/10.1007/s13394-019-00280-3 (2021).
Yang, L., Zhang, L. J. & Dixon, H. R. Understanding the impact of teacher feedback on EFL students’ use of self-regulated writing strategies. J. Second Lang. Writ. 60, 101015. https://doi.org/10.1016/j.jslw.2023.101015 (2023).
Ventura, A. C. & Carreño, F. How do teachers’ metacognitive talk about math and writing support first-graders’ metacognition and self-regulation in everyday classrooms? Eur. J. Child. Dev. Educ. Psychopathol. https://doi.org/10.32457/ejpad.v12i1.2617 (2024).
Sabihee, K., Madiha & Shahid. Exploring the role of teachers in motivating underprivileged students for their educational and career aspiration. Al-Mahdi Res. J. (MRJ). 5 (3), 1183–1200 (2024).
Shoaib, M. et al. AI student success predictor: enhancing personalized learning in campus management systems. Comput. Hum. Behav. 158, 108301 (2024).
Guleria, P. & Sood, M. Explainable AI and machine learning: performance evaluation and explainability of classifiers on educational data mining inspired career counseling. Educ. Inform. Technol. 28 (1), 1081–1116 (2023).
Islam, M. R. et al. Enhancing tertiary students’ programming skills with an explainable educational data mining approach. PloS One. 19 (9), e0307536 (2024).
Gunasekara, S. & Saarela, M. Explainability in educational data mining and learning analytics: An umbrella review, In Proceedings of the 17th International Conference on Educational Data Mining, pp. 887–892 (2024).
Gunasekara, S. & Saarela, M. Explainable AI in education: techniques and qualitative assessment. Appl. Sci. (2025).
Luo, Y. Identifying factors influencing China junior high students’ cognitive ability through educational data mining: Utilizing LASSO, random forest, and XGBoost. Presented at the 4th International Conference on Modern Education and Information Management, Wuhan, China (2023).
Zhang, L. Parent-Child expectation discrepancy and adolescent mental health: evidence from China education panel survey. Child Indic. Res. 17 (2), 705–730. https://doi.org/10.1007/s12187-023-10094-2 (2024).
Yang, M., Tam, T. & Huang, M. H. Psychometric Report for the Ability Tests of TEPS 2001 (Center for Survey Research, Academia Sinica, Taipei, 2003).
Luo, Y. & Wang, Z. Feature mining algorithm for student academic prediction based on interpretable deep neural network. IEEE. https://doi.org/10.1109/ICIET60671.2024.10542709 (2024).
Cohausz, L. T., Christian, B. & Heiner, A. S. Investigating the importance of demographic features for EDM-predictions. In The 16th International Conference on Educational Data Mining, Bengaluru, Jul 11–14 (2023).
Ranstam, J. & Cook, J. A. LASSO regression. Br. J. Surg. 105 (10), 1348–1348. https://doi.org/10.1002/bjs.10895 (2018).
Li, Y. & Zhu, J. Norm quantile regression. J. Comput. Graphical Stat. 17 (1), 163–185. https://doi.org/10.1198/106186008x289155 (2008).
Breiman, L. Random forests. Mach. Learn. 45 (1), 5–32. https://doi.org/10.1023/a:1010933404324 (2001).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (ACM, 2016). https://doi.org/10.1145/2939672.2939785
Chandra, M. A. & Bedi, S. S. Survey on SVM and their application in image classification. Int. J. Inform. Technol. 13 (5), 1–11. https://doi.org/10.1007/s41870-017-0080-1 (2021).
Ozaki, Y., Tanigaki, Y., Watanabe, S., Nomura, M. & Onishi, M. Multiobjective Tree-Structured Parzen estimator. J. Artif. Intell. Res. 73, 1209–1250. https://doi.org/10.1613/jair.1.13188 (2022).
Putatunda, S. & Rama, K. A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost (ACM, 2018). https://doi.org/10.1145/3297067.3297080
Van Den Broeck, G., Lykov, A., Schleich, M. & Suciu, D. On the tractability of SHAP explanations. J. Artif. Intell. Res. 74, 851–886. https://doi.org/10.1613/jair.1.13283 (2022).
Luo, Y., Zhang, R., Wang, F. & Wei, T. Customer segment classification prediction in the Australian retail based on machine learning algorithms (ACM, 2023). https://doi.org/10.1145/3650215.3650302
Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (ACM, 2016). https://doi.org/10.1145/2939672.2939778
King, D. M. & Perera, B. J. C. Morris method of sensitivity analysis applied to assess the importance of input variables on urban water supply yield – A case study. J. Hydrol. 477, 17–32. https://doi.org/10.1016/j.jhydrol.2012.10.017 (2013).
Caliendo, M. & Kopeinig, S. Some practical guidance for the implementation of propensity score matching. J. Economic Surv. 22 (1), 31–72 (2008).
Keele, L. An overview of rbounds: an R package for Rosenbaum bounds sensitivity analysis with matched data. White Paper Columb. OH. 1, 15 (2010).
Austin, P. C. A tutorial and case study in propensity score analysis: an application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivar. Behav. Res. 46 (1), 119–151 (2011).
Mahareek, E. A., Desuky, A. S. & El-Zhni, H. A. Simulated annealing for SVM parameters optimization in student’s performance prediction. Bull. Electr. Eng. Inf. 10 (3), 1211–1219. https://doi.org/10.11591/eei.v10i3.2855 (2021).
Jung, K. R., Zhou, A. Q. & Lee, R. M. Self-efficacy, self-discipline and academic performance: testing a context-specific mediation model. Learn. Individual Differences. 60, 33–39. https://doi.org/10.1016/j.lindif.2017.10.004 (2017).
Chemers, M. M., Hu, L. & Garcia, B. F. Academic self-efficacy and first year college student performance and adjustment. J. Educ. Psychol. 93 (1), 55 (2001).
Horcajo, J., Santos, D. & Higuero, G. The effects of self-efficacy on physical and cognitive performance: an analysis of meta-certainty. Psychol. Sport Exerc. 58, 102063 (2022).
Usher, E. L. & Pajares, F. Sources of self-efficacy in mathematics: A validation study. Contemp. Educ. Psychol. 34 (1), 89–101. https://doi.org/10.1016/j.cedpsych.2008.09.002 (2009).
Jaafar, W. M. W. & Ayub, A. F. M. Mathematics Self-efficacy and Meta-Cognition among university students. Procedia - Social Behav. Sci. 8, 519–524. https://doi.org/10.1016/j.sbspro.2010.12.071 (2010).
Jonsson, B., Granberg, C. & Lithner, J. Gaining mathematical understanding: the effects of creative mathematical reasoning and cognitive proficiency. Front. Psychol. 11, 574366 (2020).
Morris, J. E. Arts engagement outside of school: links with year 10 to 12 students’ intrinsic motivation and self-efficacy in responding to Art. Australian Educational Researcher. 45 (4), 455–472 (2018).
Walker, C. M., Winner, E., Hetland, L., Simmons, S. & Goldsmith, L. Visual thinking: Art students have an advantage in geometric reasoning. Creative Educ. 2 (01), 22 (2011).
Eccles, J. Expectancies, values, and academic behaviors. In Achievement and Achievement Motivation (Freeman, 1983).
Phillipson, S. & Phillipson, S. N. Children’s cognitive ability and their academic achievement: the mediation effects of parental expectations. Asia Pac. Educ. Rev. 13, 495–508 (2012).
Kacane, I. Family expectations and visions for their children’s future: The role of developing cultural literacy. In EDULEARN20 Proceedings, pp. 1394–1400 (IATED, 2020).
Compas, B. E., Jaser, S. S., Reeslund, K., Patel, N. & Yarboi, J. Neurocognitive deficits in children with chronic health conditions. Am. Psychol. 72 (4), 326 (2017).
AlShorman, O., Masadeh, M., Alzyoud, A., Heyat, M. B. B. & Akhtar, F. The effects of emotional stress on learning and memory cognitive functions: an EEG review study in education, In 2020 Sixth International Conference on e-Learning (econf), pp. 177–182 (IEEE, 2020).
Read, D. A. Health Education: A cognitive-behavioral Approach (Jones & Bartlett Learning, 1997).
Li, S., Zhou, X., Wu, Z., Long, Y. & Shen, Y. Strategic deductive reasoning in large language models: A dual-agent approach, In 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems (ICPICS), pp. 834–839 (IEEE, 2024).
Dai, W. et al. CAB-KWS: Contrastive augmentation: An unsupervised learning approach for keyword spotting in speech technology. In International Conference on Pattern Recognition, pp. 98–112 (Springer, 2025).
Li, S. Harnessing Multimodal Data and Mult-Recall Strategies for Enhanced Product Recommendation in E-Commerce (2024).
Lu, J. Enhancing Chatbot user satisfaction: a machine learning approach integrating decision tree, TF-IDF, and BERTopic (2024).
Li, S., Zhou, X., Wu, Z., Long, Y. & Shen, Y. Strategic deductive reasoning in large language models: A dual-agent approach. (2024).
Xu, J., Jiang, Y., Yuan, B., Li, S. & Song, T. Automated Scoring of Clinical Patient Notes Using Advanced NLP and Pseudo Labeling (IEEE, 2023). https://doi.org/10.1109/ICAICA58456.2023.10405427
Rahman, M. A., Brown, D. J., Shopland, N., Burton, A. & Mahmud, M. Explainable multimodal machine learning for engagement analysis by continuous performance test. In International Conference on Human-Computer Interaction, pp. 386–399 (Springer, 2022).
Sharma, K., Papamitsiou, Z. & Giannakos, M. Building pipelines for educational data using AI and multimodal analytics: A grey-box approach. Br. J. Edu. Technol. 50 (6), 3004–3031 (2019).
Funding
This research was funded by the Macao Science and Technology Development Fund (funding ID: 0032/2025/ITP1) and Macao Polytechnic University research grant (project code: RP/FCA-10/2022).
Author information
Contributions
T.N. conceptualized this study. T.N., T.L. and Y.L. proposed and designed this study. S.H. helped with the data analyses and resolved disagreements in the review process. T.N. wrote the manuscript with the assistance of A.X. P.P. supervised and provided resources for this work. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the academic ethics committee of Fuzhou University, China (protocol code ERBA-0803373644). Before data collection, we distributed paper-based informed consent forms to all participants and obtained their signed agreement. These documents explicitly outlined the research purpose, the scope of data usage, and the anonymization procedures. All analyses in this study were conducted exclusively on de-identified datasets, ensuring that no recoverable personal information remains in any research output. After the study is completed and first published, the relevant original materials will be kept by the author, Prof. Tianyue Niu, for six months to facilitate potential ethical audits, after which they will undergo certified document destruction in accordance with the ethical review body’s record retention policy.
Consent for publication
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Niu, T., Liu, T., Luo, Y.T. et al. Decoding student cognitive abilities: a comparative study of explainable AI algorithms in educational data mining. Sci Rep 15, 26862 (2025). https://doi.org/10.1038/s41598-025-12514-5