Abstract
New technologies in education have generated large volumes of data that, when used effectively, can have a major impact on institutional operations and student academic achievement. Nevertheless, existing predictive models remain disconnected and do not integrate historical trends, student-faculty relationships, and trend patterns into a coherent decision-making system. This paper describes an integrated machine learning framework that combines several complementary AI technologies: (1) deep learning models (LSTM, GRU, CNN, and Transformers) to model academic growth over time; (2) interpretable gradient boosting ensembles (XGBoost, LightGBM, and CatBoost) for explainable inference on structured data; (3) graph convolutional networks (GCNs) to encode academic relationships among students, professors, and courses; and (4) data-centric approaches (multi-task, transfer, and federated learning). The framework is tested on two UCI benchmark datasets (n = 649) with fully isolated holdout sets, using strict nested cross-validation to prevent data leakage. It achieves 99.6% and 97.5% predictive accuracy (5.6% and 6.3% improvements over the top baselines) and high recall (99.4% and 96.7%) in classifying at-risk students. Ablation studies show that each component contributes to performance, and the hybrid framework outperforms state-of-the-art tabular transformer models (TabTransformer, FT-Transformer, and SAINT), reaching 99.6% versus 97.2% for the best transformer. Robustness analysis with feature noise and missing data (> 96% accuracy with 20% missing data) demonstrates graceful degradation. Fairness assessment indicates that gender and age bias are very small, and mitigation strategies (reweighting, adversarial debiasing) reduce the parental education gap to 0.1%.
Cross-domain experiments (mathematics/Portuguese) show a performance loss of 2.3%, indicating internal generalizability, although cross-institutional validation remains to be performed. The framework provides educators with interpretable, actionable insights for evidence-based interventions, demonstrating that multi-paradigm AI integration is essential for accurate, fair, and robust predictive educational analytics.
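The abstract names reweighting as one of the fairness mitigation strategies but does not say which scheme is used. As an illustration only, the sketch below assumes the common reweighing scheme of Kamiran and Calders, in which each sample receives the weight w(g, y) = P(g) P(y) / P(g, y) so that group membership and outcome become statistically independent under the weights. The toy groups ("low" and "high" parental education), the labels, and the function names are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: w(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (y == 1) within one group."""
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Hypothetical toy data: 10 students per parental-education group,
# label 1 = pass, 0 = fail. Raw pass rates are 0.4 (low) vs 0.8 (high).
groups = ["low"] * 10 + ["high"] * 10
labels = [1] * 4 + [0] * 6 + [1] * 8 + [0] * 2

uniform = [1.0] * len(labels)
weights = reweighing_weights(groups, labels)

for name, w in (("unweighted", uniform), ("reweighed", weights)):
    low = weighted_positive_rate(groups, labels, w, "low")
    high = weighted_positive_rate(groups, labels, w, "high")
    print(f"{name}: low={low:.2f} high={high:.2f} gap={abs(high - low):.2f}")
```

In practice, weights of this kind would typically be passed to a classifier during training (for example through a `sample_weight` argument where the library supports one). The gap measured here is between label base rates under the weights, not between model predictions, so it is not directly comparable to the 0.1% figure reported above.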
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
M.Y. conceived the study, designed the methodology, and supervised the overall research. Z.L. performed the experiments, data analysis, and visualization. S.L. contributed to data collection, literature review, and result validation. M.Y. drafted the initial manuscript, and all authors reviewed, edited, and approved the final version of the manuscript for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, M., Li, Z. & Liu, S. A machine learning based framework for predictive school management using student and faculty analytics. Sci Rep (2026). https://doi.org/10.1038/s41598-026-47278-z