Abstract
Early and accurate identification of position-specific skills in young footballers is critical for both performance development and long-term player planning. Evaluating quantitative data from technical tests with analytical methods offers a more objective approach that supports the coach’s intuition. This study aims to predict the playing positions of young footballers by applying machine learning (ML) algorithms to data from anthropometric and technical performance tests. The study involved 200 male footballers aged 15–17 who played in different positions (defence = 66, midfield = 67, forward = 67), each recorded according to the primary tactical role assigned by the coach. The participants’ football-specific technical skills (ball control, shooting, dribbling) and anthropometric characteristics (height, weight, age, BMI) were recorded. Technical and anthropometric characteristics were compared across playing positions using one-way ANOVA, with the Bonferroni post-hoc test used to identify which groups differed. After data pre-processing and standardisation, models built on the technical and anthropometric parameters were trained using Support Vector Machine (SVM, RBF kernel), K-Nearest Neighbour (KNN), Logistic Regression (LR) and Gaussian Naive Bayes algorithms. Model performance was compared based on accuracy, precision, sensitivity (recall), and macro F1 scores. ROC curves and confusion matrices were analysed for the best-performing model, and the contribution of each technical and anthropometric parameter to that model was assessed using the permutation importance method. The one-way ANOVA showed significant differences between playing positions in age, height, BMI, heading, and dribbling performance (p < 0.05). Post-hoc analyses revealed that midfielders were older than defenders. Forwards, in turn, were taller and had lower BMI values. Forwards also demonstrated higher heading performance and achieved better dribbling results than defenders. Among the ML models, the SVM (RBF kernel) achieved the highest classification performance (accuracy = 86%), correctly classifying 100% of forwards, 85% of midfielders, and 75% of defenders. ROC analysis revealed high discriminative power for all playing positions, with AUC values of 1.00 for forwards, 0.96 for defenders, and 0.94 for midfielders. Feature importance analysis revealed that the most influential variables in playing position classification were 20 m dribbling, shooting, body weight, and dribbling, while the head juggling and mixed juggling variables contributed the least to the model. These findings demonstrate that position-specific physical and technical characteristics in footballers can be reliably distinguished using both statistical methods and ML models, and that performance variables based on speed, finishing ability and physical capacity are particularly decisive in playing position classification.
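The modelling workflow summarised in the abstract (standardisation, four classifiers, macro-averaged metrics, permutation importance on the best model) can be sketched roughly as follows. This is a minimal illustration using scikit-learn and synthetic data, not the authors' code: the eight placeholder features, the 80/20 stratified split, the default hyperparameters and the choice of macro F1 to pick the best model are all assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study data: 200 players, 3 position classes.
# The 8 features and their values are placeholders, not real measurements.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Assumed hold-out scheme: stratified 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four algorithm families named in the abstract, with assumed settings.
models = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(max_iter=1000),
    "Gaussian NB": GaussianNB(),
}


def evaluate(model):
    """Fit scaler + classifier on the training set, score on the test set."""
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "macro_f1": f1,
    }
    return pipe, metrics


fitted, scores = {}, {}
for name, model in models.items():
    fitted[name], scores[name] = evaluate(model)

# Pick the best model by macro F1 (an assumed criterion), then measure how
# much test accuracy drops when each feature column is shuffled.
best_name = max(scores, key=lambda n: scores[n]["macro_f1"])
result = permutation_importance(
    fitted[best_name], X_test, y_test,
    n_repeats=20, random_state=0, scoring="accuracy",
)
ranking = np.argsort(result.importances_mean)[::-1]

for name, s in scores.items():
    print(name, {k: round(float(v), 3) for k, v in s.items()})
print("best model:", best_name)
print("features ranked by permutation importance:", ranking.tolist())
```

Wrapping the scaler and classifier in one pipeline means the standardisation parameters are learned from the training split only, and permutation importance is computed on the raw (unscaled) feature columns of the held-out data.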
Data availability
Individual de-identified participant data, statistical code, and additional materials supporting the findings of this study are available for research purposes from the corresponding author upon reasonable request.
Acknowledgements
We thank Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, for supporting this project through the Princess Nourah bint Abdulrahman University Researchers Supporting Project (number PNURSP2026R286).
Funding
This research was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R286), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The study sponsor had no role in the data analysis or collection, writing of the report, or decision to submit the manuscript for publication.
Author information
Contributions
Conceptualization: Z.I., Y.S., U.M., B.Y., A.K., and M.I.A.; data curation: Z.I., Y.S., U.M., Z.Y., S.B., B.Y., and A.K.; formal analysis: Z.I., S.R., and A.K.; methodology: Z.I., Y.S., U.M., A.K., and M.I.A.; writing – original draft: Z.I., Y.S., U.M., Z.Y., S.B., B.Y., S.R., A.K., and M.I.A.; writing – review and editing: Z.I., Y.S., U.M., Z.Y., S.B., B.Y., S.R., A.K., and M.I.A. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
The study was approved by the Faculty of Physical Education and Basic Military Sciences Ethics Committee (decision number 2024/11). All participants and their families were informed about the purpose, reason, and possible contributions of the study to the literature, and consent forms were signed by both participants and their families. The study was conducted in accordance with the principles set out in the Declaration of Helsinki.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Izhanov, Z., Seisenbekov, Y., Marchibayeva, U. et al. Position prediction from performance and anthropometric indicators in young footballers: a machine learning approach. Sci Rep (2026). https://doi.org/10.1038/s41598-026-37957-2
DOI: https://doi.org/10.1038/s41598-026-37957-2