A machine learning based framework for predictive school management using student and faculty analytics

Yang, Ming; Li, Zhe; Liu, Shaoyan

doi:10.1038/s41598-026-47278-z

Article
Open access
Published: 04 April 2026

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

New technologies in education generate large volumes of data that, when used effectively, can substantially improve both institutional operations and student academic achievement. Nevertheless, existing predictive models remain fragmented and do not integrate historical and trend patterns together with student-faculty relationships into a coherent decision-making system. This paper presents an integrated machine learning framework that combines four complementary families of methods: (1) deep learning models (LSTM, GRU, CNN, and Transformers) to model academic growth over time; (2) interpretable gradient boosting ensembles (XGBoost, LightGBM, and CatBoost) for inference and analysis on structured data; (3) graph convolutional networks (GCNs) to encode academic relationships between students, professors, and courses; and (4) data-centric approaches (multitask, transfer, and federated learning). The framework is evaluated on two UCI benchmark datasets (n = 649) with fully isolated holdout sets, using strict nested cross-validation to prevent data leakage. It achieves 99.6% and 97.5% predictive accuracy (5.6% and 6.3% improvement over the top baselines) and high recall (99.4% and 96.7%) in classifying at-risk students. Ablation studies show that every component contributes to performance, and the hybrid framework outperforms state-of-the-art tabular transformer models (TabTransformer, FT-Transformer, and SAINT), reaching 99.6% versus 97.2% for the best transformer. Robustness analysis with feature noise and missing data (> 96% accuracy with 20% missing data) demonstrates strong resilience. Fairness assessment indicates that gender and age bias are very small, and mitigation strategies (reweighting, adversarial debiasing) reduce the parental education gap to 0.1%. Cross-domain experiments (mathematics/Portuguese) show a performance drop of 2.3%, indicating internal generalizability, but cross-institutional validation remains to be performed. The framework provides educators with interpretable, actionable insights for evidence-based interventions, demonstrating that multi-paradigm AI integration is essential for accurate, fair, and robust predictive educational analytics.
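
The abstract names nested cross-validation as the safeguard against data leakage but does not show the procedure. The following is a minimal sketch of the general technique, not the authors' implementation: it assumes a toy one-feature k-nearest-neighbour classifier and synthetic data (a hypothetical prior grade and a noisy pass label), and every function name is illustrative. The point it shows is that the hyperparameter is chosen only from inner folds drawn out of each outer training split, so the outer test rows never influence model selection.

```python
import random

def k_folds(rows, k, seed):
    """Shuffle rows and deal them into k nearly equal folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]

def knn_predict(x, train, X, y, k):
    """Majority label among the k training rows closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda i: abs(X[i] - x))[:k]
    return int(2 * sum(y[i] for i in nearest) > len(nearest))

def accuracy(train, test, X, y, k):
    """Fraction of test rows that k-NN, fitted on train rows, labels correctly."""
    hits = sum(knn_predict(X[i], train, X, y, k) == y[i] for i in test)
    return hits / len(test)

def cross_val(rows, X, y, k, n_folds, seed):
    """Mean validation accuracy of k-NN over n_folds splits of rows."""
    folds = k_folds(rows, n_folds, seed)
    total = 0.0
    for f, val in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        total += accuracy(train, val, X, y, k)
    return total / n_folds

def nested_cv(X, y, candidates, outer=5, inner=3, seed=0):
    """Return one held-out accuracy per outer fold.

    Tuning (the inner loop) only ever sees the outer-training rows.
    """
    folds = k_folds(range(len(X)), outer, seed)
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        # Inner loop: choose k by cross-validation inside the training rows.
        best_k = max(candidates,
                     key=lambda k: cross_val(train, X, y, k, inner, seed + 1))
        # Outer loop: score the chosen k once on rows it has never seen.
        scores.append(accuracy(train, test, X, y, best_k))
    return scores

# Synthetic data (an assumption for illustration): grade in [0, 20], noisy pass label.
rng = random.Random(42)
X = [rng.uniform(0, 20) for _ in range(200)]
y = [int(x + rng.gauss(0, 2) >= 10) for x in X]

scores = nested_cv(X, y, candidates=[1, 5, 15])
print(len(scores), round(sum(scores) / len(scores), 3))
```

Averaging the outer-fold scores gives an estimate that is not biased by the tuning step; a fully isolated holdout set, as described in the abstract, would be set aside before calling `nested_cv` at all.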

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Science and Education, Aviation General Hospital, No. 3, Beiyuan Road, Chaoyang District, Beijing, 100012, China
Ming Yang
Discipline Inspection Office, Capital Medical University Affiliated Beijing Hospital of Traditional Chinese, Beijing, 100010, China
Zhe Li
Department of Educational Teaching Quality Construction, Chinese Research Academy of Traditional Chinese Medicine College, Beijing, 100700, China
Shaoyan Liu

Contributions

M.Y. conceived the study, designed the methodology, and supervised the overall research. Z.L. performed the experiments, data analysis, and visualization. S.L. contributed to data collection, literature review, and result validation. M.Y. drafted the initial manuscript, and all authors reviewed, edited, and approved the final version of the manuscript for publication.

Corresponding author

Correspondence to Ming Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yang, M., Li, Z. & Liu, S. A machine learning based framework for predictive school management using student and faculty analytics. Sci Rep (2026). https://doi.org/10.1038/s41598-026-47278-z

Received: 31 October 2025
Accepted: 31 March 2026
Published: 04 April 2026
DOI: https://doi.org/10.1038/s41598-026-47278-z
