Abstract
Sports motion recognition is essential for performance analysis, injury prevention, and athlete monitoring. Traditional deep learning models, such as Long Short-Term Memory (LSTM) and Transformer-based architectures, struggle to capture motion dynamics with long-term dependencies or noise in their inputs. To overcome these limitations, this work proposes an Evolved Parallel Recurrent Network (EPRN) with wavelet transforms for high-precision motion recognition. The EPRN framework utilizes parallel recurrent pathways to enhance temporal modeling, while wavelet-based feature extraction preserves the fine-grained details of motion at multiple spatial and temporal resolutions. The proposed method has been tested on benchmark sports motion data and compared with several common architectures, including LSTM, Gated Recurrent Units (GRUs), and Convolutional Neural Network (CNN) models. The experiments demonstrate that EPRN outperforms the other models, reducing the root mean squared error (RMSE) by 23.5% and increasing the structural similarity index (SSIM) by 12.7%, indicating its effectiveness in reconstructing motion trajectories with reduced error. Furthermore, residual analysis confirms that EPRN exhibits lower error variability and less sensitivity to abrupt motion transitions, making it a more robust solution for real-world applications. The results therefore indicate that combining wavelet-transform-based feature extraction with recurrent deep learning significantly enhances the accuracy of motion recognition. Real-life applications of this work include sports performance analysis, real-time motion tracking, and rehabilitation systems. Future work will focus on multimodal data fusion (e.g., video plus wearable sensor data) and on lightweight EPRN variants suitable for real-time applications.
Introduction
Human motion recognition in sports lies at the interdisciplinary intersection of artificial intelligence, biomechanics, and sports performance analysis. It is a critical enabler of sports training, injury prevention, and performance enhancement, since understanding movement efficiency and biomechanics permits more precise analysis of sports activities1. Recently, evolving deep learning methods have enabled automatic and more accurate classification and analysis of sports movements, surpassing traditional motion analysis approaches that relied on manual annotation or overly simplistic kinematic models.
Significant improvements have been made in sports motion recognition thanks to advances in signal processing and artificial intelligence. Remaining challenges include maintaining temporal coherence, extracting features from complex motion data, and striking a balance between computational efficiency and model accuracy. Recurrent neural networks (RNNs)2, including LSTM and gated recurrent units, have proved efficient for sequential processing of motion data, but they struggle with complex dependencies and suffer efficiency pitfalls. Time-frequency analysis using the wavelet transform is very beneficial for extracting both fine and coarse movement features; on its own, however, it is insufficient for sports applications unless combined with a deep learning architecture.
This work presents a new hybrid framework that extracts multi-resolution motion features using a discrete wavelet transform (DWT)3,4, followed by sequential modeling with the EPRN. This combination enhances the accuracy of sports motion recognition within the limits imposed by computational efficiency. Unlike previous schemes that concentrate on spatial features alone or temporal aspects alone, this approach combines wavelet decomposition and deep recurrent learning to improve classification accuracy and model robustness under varying sports motion conditions.
In this study, various wavelet families5, including Haar, db4, sym8, and coif5, are systematically analyzed, and their performance is compared with traditional deep learning techniques. The proposed system is designed as a scalable solution for real-time applications in sports analysis, rehabilitation, and virtual training environments.
Research motivation
The need for accurate and efficient sports motion recognition systems stems from various demands, including sports training, injury prevention, and rehabilitation. Athletes and coaches rely on motion analysis to optimize their performance and reduce the risk of injury. Traditional methods such as marker-based motion capture and manual video annotation are often time-consuming, expensive, and prone to human error. Consequently, there is a growing demand for AI-driven solutions that can perform real-time, automated, and highly accurate motion classification.
Beyond sports performance enhancement, motion recognition plays a critical role in medical and rehabilitation applications. For instance, physiotherapists use motion tracking to assess the progress of patient recovery following an injury. By leveraging deep learning and wavelet-based feature extraction, this study aims to bridge the gap between real-time sports analysis and precise medical diagnoses, facilitating better assessment and more effective intervention planning.
Additionally, sports motion recognition is increasingly important in emerging domains such as virtual reality (VR) and augmented reality (AR)-based training6. AI-powered systems can simulate and analyze movements in real-time, providing athletes with interactive and data-driven feedback. These advancements enhance the efficiency of sports training by delivering precise and data-informed guidance.
Unique contributions of the study
This study introduces a novel approach to sports motion recognition by integrating deep learning architectures with wavelet-based time-frequency analysis. Unlike conventional methods that rely on either spatial or temporal feature extraction alone, the proposed framework optimally combines multi-resolution motion decomposition with sequential modeling, leading to a more robust and efficient recognition system.
A key contribution of this research is the application of the DWT for extracting hierarchical motion features at different resolutions. By decomposing motion signals into different frequency bands, the model captures both micro- and macro-movement patterns, ensuring a comprehensive representation of sports activities. This capability is particularly valuable for recognizing complex and high-speed movements, where traditional deep learning methods often struggle to maintain accuracy.
Another major innovation is the introduction of Evolved Parallel Recurrent Networks (EPRN) for sequence modeling. Unlike conventional recurrent architectures such as LSTM and GRU, which often suffer from vanishing gradients and high computational costs, the EPRN framework optimizes sequence processing through parallelized recurrent structures. This enhancement leads to improved learning efficiency, faster convergence, and better long-term dependency retention, making it highly suitable for real-time applications in sports motion analysis.
Furthermore, this study provides an extensive comparative evaluation of different wavelet families, including Haar, db4, sym8, and coif5, in the context of sports motion recognition. By systematically analyzing the effectiveness of each wavelet type, this research identifies the most suitable wavelet function for different motion types, providing valuable insights for future studies in both AI-driven motion analysis and biomechanical research.
Lastly, the proposed framework is designed to be computationally efficient and scalable, making it well-suited for real-time applications such as live sports performance monitoring, rehabilitation tracking, and interactive VR-based training. By addressing both accuracy and efficiency, this study lays the groundwork for future advancements in AI-driven sports motion analysis, offering a practical solution for both research and commercial applications.
Related works
Special attention has been given to human motion comprehension and analysis, particularly in sports activities, driven by the extensive development of deep learning and motion capture technologies. Classical approaches relied on biomechanical analysis and handcrafted feature extraction for movement pattern analysis, but faced numerous issues of reliability, adaptability, and scalability. Advances in AI applications and deep learning have significantly enhanced motion recognition capability, offering increased robustness and efficiency.
Earlier works in sports motion recognition concentrated primarily on classical computer vision techniques, including optical flow and keypoint detection, to analyze movement patterns. These methods ushered in a new era, but they were highly susceptible to occlusions and environmental variations, resulting in unreliable results. The field therefore matured thanks to deep learning algorithms, such as CNNs and RNNs, which enable automatic feature extraction and sequential motion analysis. Hybrid CNN-RNN architectures have consistently outperformed traditional handcrafted methods in accuracy and adaptability when recognizing complex sports movements, as demonstrated by Alomar et al.2, Le et al.7, and Pandey et al.8. Aksan et al.9 further incorporated transformer-based models to enhance motion prediction and recognition accuracy as well as generalization.

Other works employ wearable sensors, such as IMUs and EMG sensors, which excel at collecting motion signatures; these sensor data are reliable for in-depth analysis of motion activities in real-life environments. As demonstrated by Hwang et al.5 and Zhang et al.10, IMU-data-based deep learning models can distinguish among running, jumping, and weightlifting. Martini et al.11 combined pressure-sensitive insoles and deep neural networks in a gait analysis application, while Mekruksavanich et al.12 investigated multimodal sensor data fusion for higher precision in tracking motion under dynamic conditions.

Wavelet analysis techniques have also emerged rapidly in the analysis of motion signals, particularly in the time-frequency domain. Wavelet transforms are helpful here because they capture both fine and coarse movement characteristics, which helps explain complex and variable motions.
Chen et al.13 carried out wavelet transformation and used it to capture significant movement features from video datasets, which improves segmentation and recognition accuracy. Similarly, Li et al.14 employed wavelet decomposition combined with RNNs as a step toward analyzing sports motions and encouraged further in-depth investigation of the motion.
Deep learning remains a fundamental tool in assessing sports performance and preventing injuries. A reinforcement learning-based approach to optimizing training regimens for athletes is presented by Chao et al.15, while Habibi et al.16 developed an AI-powered pose estimation system for postural and biomechanical evaluation. Parekh et al.17 introduced a deep learning framework for the detection of fatigue in players, and Munoz-Macho et al.18 proposed injury risk prediction using an AI-based system in high-performance sports, highlighting the extended potential of AI in athlete monitoring and performance enhancement.
The latest research has explored multimodal data fusion for motion recognition by integrating different types of data, including video, sensor data, and physiological data, within a single system. Psaltis et al.19, Zhou et al.20, and He et al.21 created robust recognition frameworks using complementary information from various sources. This fusion makes deep learning models more reliable and generalizable, potentially enabling them to handle a broader range of motion types and environmental conditions. Duan et al.22 also proposed using graph neural networks (GNNs) to analyze human motion patterns, noticeably improving the interpretability and classification accuracy of sports motion recognition systems.
Methodology
Proposed approach
The proposed multi-phase approach to sports movement reconstruction extracts motion features that account for sequential dependencies and efficiently enhances the recreation of movements. The system integrates feature extraction, sequential modeling, and motion reconstruction to ensure scalability and efficacy. The adopted methodology combines the Wavelet Transform (WT)19, CNNs, and the EPRN to enrich the understanding of motion with high precision and generalization. The new method thereby yields a robust representation of motion sequences that is free from the drawbacks of earlier models.
Motion reconstruction is inherently challenging due to its complexity and high dimensionality. Conventional motion models handle fine- and broad-scale temporal dependencies and spatial correlations inefficiently23. These deficiencies of traditional methods are addressed by introducing wavelet-based extraction of representative features, a CNN to model spatiotemporal parameters, and the EPRN to capture long-term sequential dependencies. Linking these three steps gives the model a holistic understanding of motion patterns. LSTM networks and Transformer models are used as comparison baselines to demonstrate the proposed model's superiority in motion prediction and reconstruction.
The initial stage of the proposed model extracts motion features with the WT, which analyzes motion data in the time-frequency domain24. Unlike the Fourier Transform, which provides only frequency-based information, the WT enables localized frequency analysis, which makes it well suited to non-stationary motion data25. The method applies the DWT to decompose motion signals into multiple frequency bands, extracting high-frequency transient events and lower-frequency global motion patterns. This localized decomposition enhances the extraction of features that are sensitive in both time and frequency, facilitating better downstream processing. Mathematically, the wavelet decomposition of a signal can be expressed as3:

$$x(t)=\sum\limits_{j} {\sum\limits_{k} {{c_{j,k}}\,{\psi _{j,k}}(t)} }$$
Here, \({\psi _{j,k}}(t)\) is the wavelet basis function and \({c_{j,k}}\) are the wavelet coefficients, which encode local signal variations. These coefficients serve as input features to the subsequent learning stages, preserving motion information from both high and low frequencies.
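As an illustration, a single level of this decomposition can be sketched in a few lines. The paper's implementation is in MATLAB; the NumPy version below, using the Haar basis from the wavelet families studied later, is only an illustrative sketch:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the discrete wavelet transform with the Haar basis.

    Returns (approximation, detail) coefficients: the low-frequency
    global trend and the high-frequency transient component.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass: global motion pattern
    detail = (even - odd) / np.sqrt(2)   # high-pass: transient events
    return approx, detail

# Toy joint-angle trajectory: a rise followed by a fall
sig = np.array([0., 1., 2., 3., 3., 2., 1., 0.])
cA, cD = haar_dwt(sig)
```

Because the Haar basis is orthonormal, the decomposition preserves signal energy, which is one reason the coefficients can replace the raw signal as model input without information loss at that level.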
Building on the wavelet-based motion feature extraction, the extracted data are then passed to a CNN-based model. The CNN comprises several convolutional layers (Conv2D), batch normalization, pooling layers, and nonlinear activation functions such as ReLU, to learn hierarchical patterns within the motion sequences. While the convolutional layers model the spatial correlations in motion data, the pooling layers reduce the dimensionality of the data, allowing for improved generalization. Integrating CNNs with the preceding wavelet-based feature extraction thus balances local and global motion information within the model, further enhancing its predictive capabilities. Mathematically, the CNN feature extraction process may be written as26:

$${F_l}=\sigma \left( {{W_l} * {F_{l - 1}}+{b_l}} \right)$$

with \(\sigma\) a nonlinear activation such as ReLU.
In this convolutional procedure, \({F_l}\) denotes the feature map at layer l, \({W_l}\) the convolutional kernel, * the convolution operation, and \({b_l}\) the bias term. The resulting spatiotemporal features are then supplied to a recurrent network for sequential modeling. Combining the CNN with the EPRN allows the extracted spatial patterns to be incorporated smoothly and efficiently into the motion reconstruction pipeline.
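A minimal NumPy sketch of this layer operation follows. It uses cross-correlation (as most deep learning frameworks do for "convolution") and ReLU as the nonlinearity; the kernel size and toy input are illustrative assumptions, not the paper's actual CNN configuration:

```python
import numpy as np

def conv2d_feature_map(F_prev, W, b):
    """Compute F_l = ReLU(W * F_{l-1} + b) with a 'valid' 2D convolution.

    F_prev : (H, W) input feature map
    W      : (kH, kW) convolutional kernel
    b      : scalar bias
    """
    kH, kW = W.shape
    H, Wd = F_prev.shape
    out = np.zeros((H - kH + 1, Wd - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # cross-correlation of the kernel with one input patch
            out[i, j] = np.sum(F_prev[i:i + kH, j:j + kW] * W) + b
    return np.maximum(out, 0.0)  # ReLU nonlinearity

# Toy 4x4 feature map filtered by a 2x2 mean kernel
F = np.arange(16, dtype=float).reshape(4, 4)
W = np.full((2, 2), 0.25)
F_next = conv2d_feature_map(F, W, b=0.0)
```

A pooling layer would then downsample `F_next`, which is what reduces dimensionality and improves generalization in the full pipeline.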
To maintain long-term dependencies across motion frames, the EPRN is employed. Long sequences have traditionally plagued ordinary recurrent networks (LSTM and GRU) with vanishing gradient problems and computational inefficiencies. The EPRN overcomes this limitation by introducing parallel recurrent pathways and optimizing the gating mechanism to preserve most temporal dependencies while maintaining speed. It thereby provides effective motion reconstruction while capturing the long-range dependencies that the CNN cannot model. The recurrent update function in EPRN is formulated as27:

$${h_t}=\sigma \left( {{W_h}{h_{t - 1}}+{W_x}{x_t}} \right)$$
where \({h_t}\) denotes the hidden state at time \(t\), \({x_t}\) the input features, and \({W_h},{W_x}\) learnable weight matrices. The parallel structure of EPRN enables faster training and more effective feature fusion across multiple timescales, while preserving the long-term dependency retention of standard LSTMs.
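The parallel-pathway idea can be sketched as follows. Fusing the pathways by simple averaging and using random toy weights are assumptions for illustration only; the paper's learned fusion mechanism is described later:

```python
import numpy as np

rng = np.random.default_rng(0)

def recurrent_path(xs, W_h, W_x):
    """One recurrent pathway: h_t = tanh(W_h h_{t-1} + W_x x_t)."""
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

# Two pathways with independent weights process the same sequence in
# parallel; their final states are averaged as a simple stand-in for
# the learned fusion.
d_in, d_h, T = 3, 4, 10
xs = rng.standard_normal((T, d_in))
paths = [(rng.standard_normal((d_h, d_h)) * 0.1,
          rng.standard_normal((d_h, d_in)) * 0.1) for _ in range(2)]
h_fused = np.mean([recurrent_path(xs, Wh, Wx) for Wh, Wx in paths], axis=0)
```

Because the pathways are independent, they can be evaluated concurrently, which is the source of the training-speed advantage the text claims over a single deep recurrent stack.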
To evaluate the effectiveness of the proposed approach, it is compared with two very popular sequence learning models, LSTM and Transformer. LSTMs capture short- and middle-term dependencies well, but long-term dependencies usually become problematic. Transformers capture global dependencies through self-attention mechanisms, but this approach becomes expensive for continuous motion data. The EPRN combines the advantages of recurrent and parallel approaches, achieving high motion reconstruction accuracy with high efficiency. Together, the three units (WT, CNN, and EPRN) in a single model complete the holistic approach to motion-sequence analysis. A general layout of the proposed process is shown in Fig. 1, outlining the stages of processing and learning undertaken in this work.
A schematic diagram of the model architecture.
Data collection and pre-processing
High-quality data are collected by 3D motion capture systems for sports motion recognition. In this study, the CMU Motion Capture Database has been utilized for its collection of precise motion sequences of various sports activities performed by athletes. The database comprises detailed trajectories of the positions, velocities, and accelerations of several joints during particular sports movements, making it very applicable to sports motion analysis28.
Characteristics of the data:

- The data consist of 3D motion sequences: time stamps and three-dimensional positions of body joints. Joint states and body motions are stored in the ASF/AMC file formats.

- The recordings cover many sports, including running, swimming, basketball, soccer, and other athletics, allowing the model to learn and generalize across a wide range of motion patterns.
To ensure the data were suitable for sports motion recognition, they undergo the following preprocessing steps before being presented to the deep learning model:
a. Data cleaning and noise removal: Raw motion capture data are noisy and contain systematic errors from both the capture system and external factors. Preprocessing therefore begins with Gaussian smoothing for noise reduction and improved accuracy. Incomplete frames or frames with missing parts are then restored using spline or linear interpolation, maintaining the continuity of the motion data.
b. Normalization and standardization: The motion data are normalized across subjects and postures to remove the influence of scale differences and variations in body posture.
c. WT feature extraction: The DWT extracts the features of interest from the raw movement data. Its time-frequency analysis effectively captures both rapid and sustained movements. Daubechies wavelets are employed at each decomposition level, providing detail for high-frequency transient movements as well as low-frequency global movement patterns29. The DWT decomposes the data into multiple frequency bands across several decomposition levels, retaining the fine details (high frequencies) of transient movements and the gross movement patterns (low frequencies).
d. Data segmentation: For model training, motion data are segmented into smaller units corresponding to specific sports movements, such as running, jumping, or striking. This segmentation enables the model to adapt to the unique features of each motion and to distinguish even complex patterns. Sliding-window techniques extract time windows with suitable overlap, retaining temporal continuity so that transitions are well represented30.
e. Feature reinforcement with temporal and spatial dynamics: In addition to position, dynamic features such as joint velocity, angular velocity, and acceleration are extracted by computing the time derivatives of the position variables. These temporal dynamics enable the model to capture subtle changes in motion, improving its ability to identify sports movements. In particular, spatial velocity vectors capture interactions between different joints of the body, helping the model understand complex movements better31.
f. Data augmentation: To enhance the model's generalization to unseen data and mitigate the risk of overfitting, augmentation techniques are applied, including rotations, scaling, translation, flipping, and speed variations. Speed variation, which introduces random temporal speed changes into the motion data, is beneficial for simulating varied sports conditions. These augmentations help the model learn to recognize movements in varied contexts and become more robust30.
After these pre-processing steps, the motion data are fed into the deep learning model. The wavelet-extracted features, together with the dynamic features, form the model input for accurately recognizing and classifying various types of sports movements.
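The windowing and derivative-feature steps (d and e above) can be sketched as follows; the window length, step size, and toy data are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sliding_windows(seq, win, step):
    """Segment a (T, J) motion sequence into overlapping windows."""
    return np.stack([seq[s:s + win]
                     for s in range(0, len(seq) - win + 1, step)])

def dynamic_features(pos, dt=1.0):
    """Velocity and acceleration as finite-difference time derivatives
    of the joint positions (preprocessing step e)."""
    vel = np.gradient(pos, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    return vel, acc

# 100 frames of 5 joint coordinates (random walk as toy motion data)
pos = np.cumsum(np.random.default_rng(1).standard_normal((100, 5)), axis=0)
wins = sliding_windows(pos, win=30, step=15)  # 50% overlap keeps transitions
vel, acc = dynamic_features(pos)
```

The 50% overlap means the second half of each window reappears as the first half of the next, so motion transitions are never split across a window boundary without context.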
Feature extraction by discrete wavelet transform (DWT)
To capture information in both the time and frequency domains, the DWT is employed for feature extraction. The DWT decomposes each motion signal into multiple levels of approximation and detail coefficients, allowing essential movement patterns to be identified while reducing noise. The DWT of a discrete signal \(x[n]\) is defined as:

$${c_{j,k}}=\sum\limits_{n} {x[n]\,{\psi _{j,k}}[n]}$$
where \({\psi _{j,k}}\) represents the wavelet basis function at scale j and translation k. To ensure features are extracted exhaustively, we use four prominent wavelet families: Haar, Daubechies (\(db4\)), Coiflets, and Symlets32, each with its own advantages for capturing motion patterns. Their properties provide a good compromise between time and frequency localization, which is useful for human motion sequences. We perform multi-level wavelet decomposition on each motion signal and extract statistical features, including energy, entropy, standard deviation, and kurtosis, from the wavelet coefficients. These features are then fed into the deep learning model as input. Figure 2 illustrates such a multi-level wavelet decomposition for one motion sequence33.
Multi-level wavelet decomposition applied to a motion sequence.
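A sketch of the per-subband statistical descriptors named above follows. Defining entropy as the Shannon entropy of the normalized coefficient energies is an assumption, since the exact formula is not specified in the text:

```python
import numpy as np

def subband_features(coeffs):
    """Statistical descriptors of one set of wavelet coefficients:
    energy, Shannon entropy of the normalized energy distribution,
    standard deviation, and kurtosis."""
    c = np.asarray(coeffs, dtype=float)
    energy = np.sum(c ** 2)
    p = c ** 2 / energy if energy > 0 else np.full(len(c), 1.0 / len(c))
    entropy = -np.sum(p * np.log2(p, where=p > 0, out=np.zeros_like(p)))
    std = np.std(c)
    m = np.mean(c)
    kurt = np.mean((c - m) ** 4) / (std ** 4) if std > 0 else 0.0
    return np.array([energy, entropy, std, kurt])
```

Applying this to each subband of a multi-level decomposition and concatenating the results yields a compact fixed-length feature vector regardless of the original sequence length.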
Evolved Parallel Recurrent Networks (EPRN)
Our deep learning framework is based on EPRNs, which represent spatial and temporal dependencies in the targeted motion sequences. The EPRN comprises multiple independent recurrent layers with evolved architectures, whose network topology and hyperparameters are optimized by genetic algorithms. This design ensures an adaptive learning process that accommodates the complexity of human motion sequences.
The network utilizes LSTM and GRU layers, operating along separate paths and merging their outputs to enhance performance. The LSTM part captures long-term dependencies, whereas the GRU part improves computational efficiency. The recurrent units can be expressed mathematically as34:

$$h_t^{LSTM}=f\left( {{x_t},h_{t - 1}^{LSTM}} \right),\quad h_t^{GRU}=g\left( {{x_t},h_{t - 1}^{GRU}} \right)$$
Here, f and g are the update functions of the LSTM and GRU units, respectively, and the final representation is obtained from an attention-based fusion mechanism that adaptively weights the contributions of both recurrent units. This permits the network to dynamically select the most relevant temporal dependencies for motion reconstruction35. The structure of the EPRN is illustrated in Fig. 3.
Architecture of the EPRN, illustrating parallel LSTM and GRU paths.
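The attention-based fusion of the two paths can be sketched as follows. The scalar dot-product scoring vector `w` is a hypothetical stand-in for the fusion parameters, which the text does not specify:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_fusion(h_lstm, h_gru, w):
    """Adaptively weight the LSTM and GRU hidden states.

    A scalar score per pathway (here a dot product with a learned
    vector w, an illustrative assumption) is softmax-normalized and
    used to form a convex combination of the two states.
    """
    scores = np.array([h_lstm @ w, h_gru @ w])
    alpha = softmax(scores)                       # weights sum to 1
    return alpha[0] * h_lstm + alpha[1] * h_gru, alpha
```

Because the weights come from a softmax, the fused state always lies between the two pathway states, so neither path can be drowned out entirely.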
Training and optimization strategy
Supervised learning, complemented by evolutionary optimization, is used for training. The network is trained via backpropagation through time (BPTT) with the Adam optimizer, minimizing the Mean Squared Error (MSE) loss function36:

$$\mathcal{L}_{MSE}=\frac{1}{N}\sum\limits_{i=1}^{N} {{{\left( {{y_i} - {{\hat {y}}_i}} \right)}^2}}$$
where \({y_i}\) stands for the ground-truth motion sequence and \({\hat {y}_i}\) for the predicted one. Dropout regularization and batch normalization are applied to reinforce generalization.
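For concreteness, the loss and the error signal that BPTT propagates backward through the recurrent unrolling can be written as a short NumPy sketch:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """L = (1/N) * sum_i (y_i - yhat_i)^2, the training objective."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return np.mean(diff ** 2)

def mse_grad(y_true, y_pred):
    """dL/dyhat = (2/N)(yhat - y): the per-element error signal that
    backpropagation (here, BPTT) carries backward through the network."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 2.0 * (y_pred - y_true) / y_true.size
```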
To achieve higher performance, genetic algorithms iteratively optimize hyperparameters, such as the learning rate, batch size, and neuron configurations within a recurrent unit. The evolutionary algorithm selects, mutates, and recombines candidates to yield the optimal model parameters. The update rule follows:

$$\theta \leftarrow \theta - \eta {\nabla _\theta }\mathcal{L}$$
where \(\theta\) denotes the model parameters and \(\eta\) the adaptive learning rate. The fitness function of the genetic algorithm is based on validation loss and reconstruction accuracy. The evolutionary optimization process is shown in Fig. 4.
Optimization of hyperparameters by genetic algorithm.
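A minimal real-valued genetic algorithm matching the stated population size (20) and crossover rate (0.8) can be sketched as follows. The toy fitness function, elitist selection scheme, and mutation scale are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

rng = np.random.default_rng(42)

def evolve(fitness, bounds, pop_size=20, generations=15,
           crossover_rate=0.8, mutation_scale=0.01):
    """Selection, crossover, and mutation over real-valued
    hyperparameter vectors; lower fitness is better."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[:pop_size // 2]]  # keep best half
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            # uniform crossover: take each gene from parent a with p=0.8
            child = np.where(rng.random(len(a)) < crossover_rate, a, b)
            child += rng.normal(0, mutation_scale, len(a))  # mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(scores)]

# Toy fitness: squared distance of (learning_rate, dropout) from a
# hypothetical optimum at (0.01, 0.3)
best = evolve(lambda p: np.sum((p - np.array([0.01, 0.3])) ** 2),
              bounds=(np.array([0.0, 0.0]), np.array([0.1, 0.9])))
```

In the actual framework the fitness call would train and validate a candidate EPRN configuration, making each evaluation far more expensive than this toy objective.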
Evaluation metrics and experimental setup
Numerous evaluation metrics are used to assess the performance of the proposed framework:
-
MSE: This measures the average squared difference between the predicted and actual joint positions.
-
RMSE: This provides a more interpretable measure of error.
-
DTW: This measures the temporal alignment between the reconstructed and ground-truth sequences using37:

$$DTW(X,Y)=\mathop {\min }\limits_{\pi } \sum\limits_{(i,j) \in \pi } {d\left( {{x_i},{y_j}} \right)}$$

where \(\pi\) ranges over the admissible alignment paths and
where \(d({x_i},{y_j})\) is a distance metric between sequences X and Y.
-
SSIM: This quantifies the resemblance of predicted and actual motion data in terms of spatial patterns38:

$$SSIM(x,y)=\frac{{\left( {2{\mu _x}{\mu _y}+{C_1}} \right)\left( {2{\sigma _{xy}}+{C_2}} \right)}}{{\left( {\mu _x^2+\mu _y^2+{C_1}} \right)\left( {\sigma _x^2+\sigma _y^2+{C_2}} \right)}}$$

where \(\mu\) and \(\sigma\) denote means and (co)variances, and \({C_1},{C_2}\) are stabilizing constants.
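Sketches of the two sequence-comparison metrics follow. The single-window (global-statistics) SSIM variant below is a simplification of the usual locally windowed definition, used here only for illustration:

```python
import numpy as np

def dtw_distance(X, Y, d=lambda a, b: abs(a - b)):
    """Dynamic time warping: minimal cumulative distance over all
    monotone alignments of sequences X and Y (classic DP recursion)."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = d(X[i - 1], Y[j - 1]) + min(D[i - 1, j],
                                                  D[i, j - 1],
                                                  D[i - 1, j - 1])
    return D[n, m]

def ssim_global(x, y, C1=1e-4, C2=9e-4):
    """SSIM computed from global statistics of two motion signals."""
    mx, my = np.mean(x), np.mean(y)
    vx, vy = np.var(x), np.var(y)
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Identical sequences give a DTW distance of 0 and an SSIM of 1, which is the sanity check usually applied before using either metric for model comparison.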
We use the datasets “motion_data.asf” and “motion_data.amc” in our experiments (the datasets analyzed during the current study are available at https://mocap.cs.cmu.edu).
The dataset is split 80:20 into training and testing portions. All motion sequences are normalized first to avoid inconsistencies when comparing samples. The entire framework is developed in MATLAB, with GPU acceleration enabled.
Hyperparameters are tuned by the evolutionary optimization method with a population size of 20 and a crossover rate of 0.8. Training proceeds for 200 epochs with adaptive learning-rate decay. The reconstructed motion sequences are then compared both visually and quantitatively with the ground truth; the comparison is shown in Fig. 5.
Relation of actual vs. reconstructed motion sequences.
Implementation details and reproducibility
To ensure methodological transparency and support reproducibility, this section provides a comprehensive overview of the model configuration, training setup, and wavelet integration process used in the proposed EPRN. The architecture is composed of wavelet-based feature extraction, convolutional processing, and a parallel recurrent network structure. The exact specifications are presented in Table 1.
As part of the preprocessing activities, wavelet features are computed for each segment of motion. The sequence is processed through a three-level DWT using the Symlet 4 wavelet, and statistical descriptors are calculated from the resultant subband coefficients. These wavelet features are integrated with temporal motion characteristics such as speed and acceleration to construct an exhaustive feature vector. Such a vector is used to train a CNN that abstracts spatial relationships within the data. The outputs from the CNN are then fed into the EPRN, where streams of LSTM and GRU networks operate in parallel, merge, and reconstruct the information. This modular approach enables the model to flexibly combine motion information at different spatial scales, capturing both short-term localized signal fluctuations and long-term motion dependencies.
Experimental results and discussion
A comprehensive evaluation of the proposed EPRN is conducted to assess its efficiency for sports motion recognition, specifically in accurately capturing complex motion sequences while maintaining computational efficiency. This experimental analysis aims to highlight the advantages of the EPRN over conventional deep-learning counterparts, validate the learning of spatiotemporal dependencies in motion trajectories, and ascertain its applicability in real-life situations.
To build a complete picture of its performance, the evaluation considers multiple essential aspects. The first is the impact of different wavelet transforms on feature extraction and the identification of the optimal transformation for encoding motion signals. The performance of EPRN is then compared with that of other state-of-the-art deep learning architectures, including LSTM39, Convolutional LSTM (CNN-LSTM)40, Transformer networks41, GRU34, and Temporal Convolutional Networks (TCN)42, all of which are widely used for action recognition and motion trajectory forecasting and therefore form a sound basis for performance comparison.
Statistical techniques are employed to ensure the findings are solid. Analysis of variance (ANOVA) and paired t-tests determine whether the performance differences observed across models and wavelet transformations are statistically significant. These tests rule out randomness and confirm that the reported improvements are valid.
The hyperparameter tuning of the EPRN model is crucial to its performance; we therefore explore Bayesian Optimization and Grid Search, two commonly used hyperparameter tuning methods, and compare their relative advantages in convergence time and generalization ability.
The execution time and computational complexity of EPRN are also validated against other deep learning architectures. These data are essential for assessing the feasibility of deploying EPRN in real-time sports analytics applications, where latency and inference speed are the primary constraints.
Finally, residual error distributions and model robustness are analyzed by investigating deviations between predicted and actual motion sequences. A detailed discussion of the model's limitations and potential future enhancements, focusing on scalability and generalizability, is also presented.
This section conducts a systematic examination of the aspects mentioned above, encompassing both quantitative performance metrics and qualitative assessments, to reinforce the impact of the proposed framework. These results not only demonstrate how EPRN outperforms its competitors but also pave the way for further work in human motion analysis, biomechanical modeling, and intelligent sports analytics systems.
Impact of wavelet transform on feature extraction
Feature extraction is a key stage of motion analysis because it ultimately determines how well deep learning models can learn meaningful movement patterns while suppressing noise and redundancy. The complexity of human motion signals often defeats conventional feature extraction techniques, which cannot retain the spatiotemporal relationships required for high-precision motion recognition. Wavelet-transform-based feature extraction has therefore been widely adopted, as it analyzes a signal in both time and frequency, providing a multi-resolution representation of motion dynamics. In this study, we systematically tested four popular wavelet functions, Daubechies (db4), Symlet (sym4), Coiflet (coif5), and Biorthogonal (bior1.3)38, with respect to feature representation and classification accuracy. For each wavelet function, motion sequences were decomposed into frequency components from which the model could extract structural features while filtering out noise and irrelevant variations. These features were then used to train and test the EPRN, which was evaluated with several error metrics, including RMSE, MSE, MAE, and SSIM; together these metrics quantify how well each wavelet function preserves the essential motion characteristics needed for accurate recognition.
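As an illustration of the decomposition step, the following sketch applies a single-level Haar DWT (a simpler relative of the db4/sym4/coif5/bior1.3 filters actually used in this study) to a synthetic joint-angle trajectory; the signal, its length, and the noise level are invented for demonstration:

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                          # pad to even length
        s = np.append(s, s[-1])
    evens, odds = s[0::2], s[1::2]
    approx = (evens + odds) / np.sqrt(2)    # low-frequency content
    detail = (evens - odds) / np.sqrt(2)    # high-frequency content
    return approx, detail

# synthetic joint-angle trajectory: slow oscillation plus high-frequency noise
t = np.linspace(0, 2 * np.pi, 128)
motion = np.sin(t) + 0.05 * np.random.default_rng(42).standard_normal(128)

approx, detail = haar_dwt(motion)
# the approximation keeps most of the signal energy; the detail isolates noise
energy_total = np.sum(motion ** 2)
energy_approx = np.sum(approx ** 2)
print(f"energy retained in approximation: {energy_approx / energy_total:.2%}")
```

The orthonormal scaling by 1/sqrt(2) preserves total signal energy across the two sub-bands, which is what lets the model separate structural motion (approximation) from noise (detail).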
Performance analysis of wavelet transforms
The choice of wavelet transform critically affects how motion features are extracted and reconstructed, and hence the accuracy of the model. Table 2 compares the four most popular wavelet functions, Daubechies (db4), Symlet (sym4), Coiflet (coif5), and Biorthogonal (bior1.3), evaluated on RMSE, MSE, and MAE.
Out of the four, the Symlet (sym4) wavelet produced the lowest RMSE (0.076059) and MSE (0.005785), signifying the best performance in reducing reconstruction error and sustaining motion continuity. This result reflects the efficacy of sym4 in capturing fine details while maintaining structural integrity, making it well suited to biomechanical motion analysis. The Daubechies (db4) wavelet ranked second on these error measurements, yielding competitive values (RMSE 0.084753 and MSE 0.007183), which support its use in retaining both high-frequency and low-frequency components of motion.
By contrast, the Biorthogonal (bior1.3) wavelet presented the highest RMSE (0.091906) and MSE (0.0084468), indicating that it reconstructs motion patterns least accurately. Its relatively high MAE (0.066165) shows that bior1.3 struggles to maintain precise trajectory alignment, probably because of its asymmetric structure and lower energy-compaction efficiency. The Coiflet (coif5) wavelet outperforms bior1.3 but still yields higher RMSE and MSE than sym4 and db4, indicating that it is less effective at retaining the finer details of motion.
These results demonstrate that the choice of wavelet is crucial for motion analysis applications. sym4 and db4 emerge as the best wavelets because they combine low reconstruction errors with close structural fidelity, making them particularly useful for human motion tracking, gesture recognition, and biomechanics applications.
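For reference, the error metrics reported in Table 2 can be computed as follows; the trajectories below are toy stand-ins, not data from this study:

```python
import numpy as np

def reconstruction_errors(actual, predicted):
    """Compute the error metrics used to compare wavelet functions."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    residuals = actual - predicted
    mse = np.mean(residuals ** 2)
    return {
        "RMSE": float(np.sqrt(mse)),               # penalizes large deviations
        "MSE": float(mse),
        "MAE": float(np.mean(np.abs(residuals))),  # average absolute deviation
    }

# toy example: a true joint trajectory and a uniformly shifted reconstruction
true = np.sin(np.linspace(0, np.pi, 50))
recon = true + 0.05
errors = reconstruction_errors(true, recon)
print(errors)
```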
Statistical validation of wavelet performance
To ascertain whether the observed differences among wavelet functions are significant, an ANOVA was conducted across all configurations43. The results substantiate the performance gains of the Symlet (sym4) wavelet: every pairwise comparison is statistically significant (p < 0.05), validating sym4 as the most suitable transformation for motion feature extraction. This finding highlights the substantial role of wavelet selection in maximizing motion recognition accuracy, as different wavelets differ in their ability to preserve motion dynamics and structural consistency.
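A one-way ANOVA of this kind could be run as sketched below; the per-fold RMSE samples are fabricated for illustration and only loosely echo the magnitudes in Table 2:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# hypothetical per-fold RMSE samples for each wavelet (spread is invented)
rmse = {
    "sym4":    rng.normal(0.076, 0.004, 10),
    "db4":     rng.normal(0.085, 0.004, 10),
    "coif5":   rng.normal(0.087, 0.004, 10),
    "bior1.3": rng.normal(0.092, 0.004, 10),
}

# H0: all wavelet configurations have the same mean RMSE
stat, p = f_oneway(*rmse.values())
print(f"F = {stat:.2f}, p = {p:.2e}")
if p < 0.05:
    print("at least one wavelet differs significantly")
```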
To provide an intuitive understanding of the performance differences across these wavelet functions, a box plot comparing RMSE, MSE, and MAE scores across all configurations is presented in Fig. 6.
Box plot of wavelet performance (RMSE, MSE, MAE).
Consistent with the numerical data in Table 2, the representation in Fig. 6 shows that sym4 has the lowest error rates and more stable predictions than the other wavelets. By contrast, bior1.3 exhibits both high variance and high errors, indicating a weaker ability to preserve continuity in motion. This suggests that an appropriate wavelet function is essential for high-fidelity motion reconstruction and directly influences the overall output of motion analysis models.
Implications for motion recognition models
These findings present a strong argument for selecting an appropriate wavelet transformation when applying deep learning techniques to motion analysis44. The superior performance of sym4, closely followed by db4, makes these wavelets the first candidates for biomechanical movement tracking, sports performance evaluation, and human activity recognition, since they are highly structure-preserving and error-minimizing. The findings also suggest a potential role for hybrid wavelet approaches, in which two or more wavelet functions are combined to leverage their respective strengths for enhanced feature extraction.
By combining statistical validation with visual analysis and performance evaluation, the study provides a detailed understanding of how wavelet-based feature extraction affects motion recognition accuracy. Adaptive wavelet selection is another avenue for prospective research, using machine learning techniques to select wavelets based on the complexity of the motion being analyzed, dataset characteristics, and classification requirements.
Comparison with other deep learning models
It is also crucial to compare the performance of the proposed Evolved Parallel Recurrent Network (EPRN) with other deep learning models to validate its suitability for sports action recognition. Many architectures have been used to model sequential data; however, their capacity to capture the spatiotemporal dynamics of a motion sequence varies significantly. To provide a rigorous assessment of the advantages of EPRN, we conducted a thorough benchmark study against popular deep learning architectures, including LSTM, CNN-LSTM, Transformer networks, GRU, TCN, and Bi-LSTM. These models were selected because they have been widely utilized for time-series forecasting, action recognition, or modeling human motion trajectories. Table 3 summarizes the comparative results for these models.
Performance evaluation of EPRN vs. baseline models
As shown in Table 3, wavelet-based models outperform traditional deep learning architectures, including LSTM, GRU, and CNN, on performance metrics such as RMSE, MSE, MAE, and SSIM. This indicates that wavelet transforms capture both local and global motion patterns with minimal reconstruction error.
Among all tested models, sym4 achieved the lowest RMSE (0.076059), MSE (0.005785), and MAE (0.051933), indicating that it reproduces motion dynamics with maximum fidelity and minimal distortion. In parallel, db4 also produced strong results, with an RMSE of 0.084753 and an SSIM of 0.94362, confirming its suitability for high-precision motion reconstruction. The smoothness and compact support of these wavelets help them retain fine-grained motion structures, making them applicable to exercise biomechanics.
In contrast, traditional recurrent models such as LSTM and GRU score higher on most error metrics. LSTM yielded RMSE = 0.13841 and MSE = 0.019157, while GRU performed even worse, with RMSE = 0.18026 and MSE = 0.032493. Despite their ability to model sequential dependencies, these architectures struggle with long-range motion patterns, probably because of vanishing gradients and the limitations of their memory mechanisms.
Although CNN-based approaches are computationally efficient, they do not achieve the accuracy of wavelet-based methods. With an RMSE of 0.14723 and an SSIM of 0.95251, the CNN lags behind db4 and sym4, suggesting that convolutional architectures are less effective at reconstructing fine motion details, primarily because they do not explicitly model the temporal dependencies characteristic of motion.
Among the wavelet-based models, bior1.3 performs worst, with an RMSE of 0.091906 and an SSIM of 0.92655, implying that although biorthogonal wavelets can be effective for signal decomposition, they are not optimal for preserving motion trajectories. Coif5 shows moderate performance (RMSE 0.086645, SSIM 0.93146), indicating that high-order vanishing moments do not necessarily translate into superior extraction of motion features.
Overall, these findings emphasize the importance of selecting an optimal wavelet function for motion reconstruction accuracy. The superiority of sym4 and db4 further strengthens the argument for wavelet feature extraction in motion analysis, particularly in applications that require high fidelity and structural preservation, such as deep-learning-based motion analysis.
Statistical significance of model comparisons
To confirm that the observed performance differences are statistically significant and not due to random variation in the datasets, a paired t-test was conducted for each pair of wavelet functions. The analysis reveals statistically significant differences among the evaluated wavelet transforms (p < 0.05), reinforcing confidence in the observed trends. In particular, the Symlet (sym4) wavelet records the lowest RMSE, MSE, and MAE values, indicating its superior ability to capture motion features with low reconstruction error.
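A paired t-test on per-sequence errors might look like the following sketch; the RMSE samples are synthetic and chosen only to mimic a consistent sym4-over-bior1.3 advantage:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
# hypothetical per-sequence RMSE for sym4 and bior1.3 on the same 12 sequences
sym4_rmse = rng.normal(0.076, 0.005, 12)
bior_rmse = sym4_rmse + rng.normal(0.016, 0.004, 12)  # bior1.3 consistently worse

# paired test: each sequence contributes one (sym4, bior1.3) pair
t_stat, p_value = ttest_rel(sym4_rmse, bior_rmse)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

Pairing on the same test sequences removes sequence-to-sequence variability from the comparison, which is why the test is more sensitive than comparing pooled errors.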
For a clearer view of these comparisons, Fig. 7 presents a bar graph illustrating the RMSE, MSE, and MAE for all wavelet functions tested. The graphic shows that sym4 and db4 perform better in terms of minimizing errors, while bior1.3 yields the highest error metrics, indicating that it is not suitable for accurate motion reconstruction. Hence, this graphical representation provides further evidence of the numerical results, demonstrating that selecting an appropriate wavelet function is beneficial for motion trajectory analysis.
Bar chart comparing RMSE, MSE, and MAE across models.
Implications for motion recognition and future directions
These results suggest the need for hybrid deep learning architectures that integrate wavelet feature extraction, recurrent processing, and hierarchical temporal modeling for motion recognition. In our experiments, the parallel recurrent architecture of EPRN improves stability, exhibiting better sequence retention and lower accumulated error, which makes it suitable for applications such as sports biomechanics, human activity monitoring, and movement disorder assessment.
Future research will explore other interesting avenues, including ensemble approaches that incorporate EPRN and Transformer-based architectures to leverage the complementary strengths of self-attention mechanisms and recurrent dynamics for enhanced performance. EPRN would also be customized for resource-challenged environments, such as wearable motion trackers and real-time sports analysis methodologies, by improving computational performance through alternative means, including pruning and knowledge distillation.
Hyperparameter tuning and optimization
The selection and tuning of hyperparameters are critical to both the performance and the computational efficiency of any deep learning model. The characteristics of sports motion data have a direct impact on hyperparameter tuning, specifically on the model’s ability to learn spatiotemporal dependencies, avoid overfitting, and generalize to unseen motion sequences. Misguided hyperparameter tuning can lead to suboptimal convergence, heavy computational burdens, or a complete failure to capture the essential dynamics of human movement.
In this study, Bayesian Optimization45 was employed to fine-tune the key hyperparameters of EPRN. Bayesian Optimization is a probabilistic method that models the objective function with a surrogate, typically a Gaussian Process (GP). The next hyperparameter settings are then selected by an acquisition function that balances exploration and exploitation. This approach is preferred when hyperparameter interactions are complicated and the search space is high-dimensional, as in deep recurrent architectures46. The hyperparameters that most significantly affect performance were tuned as follows.
A key factor influencing performance is the number of LSTM/GRU units, which governs the learning of long-range features in motion sequences. Increasing the number of units raises the representational capacity of the model but also increases computational cost. The tuning confirmed that 128 units were optimal in terms of both performance and computational efficiency.
Another important hyperparameter is the learning rate, which determines how quickly the model’s parameters are updated during training. A higher learning rate can cause instability and prevent convergence, while a smaller learning rate slows down training. A learning rate of 0.001 was found to balance convergence stability and efficiency.
The batch size played a crucial role in model generalization and computational efficiency. In general, a small batch size can lead to noisy updates and unstable gradients, whereas a big batch size can impede generalization. After rigorous testing, batch size 32 was chosen as a reasonable compromise between speed of convergence and memory usage.
Furthermore, the dropout rate was fine-tuned to suppress overfitting. Dropout works by randomly switching off a certain percentage of neurons during training, thereby reducing overdependence on some specific features and allowing for better generalization. It was found that a dropout rate of 0.5 yielded the best results, preventing overfitting while maintaining the learning of complex patterns.
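The Bayesian Optimization loop described above can be sketched in miniature as follows, assuming a one-dimensional search over the learning rate with a hand-rolled Gaussian-process surrogate and expected-improvement acquisition; the loss surface `validation_loss` is a hypothetical stand-in, not the EPRN training objective:

```python
import math
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel over 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Gaussian-process posterior mean and variance at query points Xs."""
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI for minimisation: expected margin by which a candidate beats `best`."""
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def validation_loss(lr):
    """Hypothetical smooth loss surface with its optimum near lr = 0.001."""
    return (math.log10(lr) + 3.0) ** 2 + 0.1

# search over log10(learning rate) in [-5, -1]
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, -1.0, 3)                       # initial random evaluations
y = np.array([validation_loss(10.0 ** x) for x in X])
grid = np.linspace(-5.0, -1.0, 200)

for _ in range(10):                                  # sequential BO iterations
    mu, var = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(expected_improvement(mu, var, y.min()))]
    X = np.append(X, nxt)
    y = np.append(y, validation_loss(10.0 ** nxt))

best_lr = 10.0 ** X[np.argmin(y)]
print(f"best learning rate found: {best_lr:.5f}")
```

In practice one would use a library optimizer with a multi-dimensional space (units, learning rate, batch size, dropout); this sketch only illustrates the surrogate-plus-acquisition loop the text describes.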
Comparison of bayesian optimization and grid search
To assess the effectiveness of Bayesian Optimization, it was compared with Grid Search, a standard baseline. Grid Search exhaustively explores all hyperparameter combinations within a specified range, which makes it inefficient and computationally expensive for deep learning models with high-dimensional search spaces.
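The exhaustive character of Grid Search is easy to see in a sketch; the search space mirrors the hyperparameters discussed above, and `validation_loss` is a stand-in objective rather than a trained model:

```python
import itertools

# hypothetical search space mirroring the hyperparameters tuned in this study
space = {
    "units": [64, 128, 256],
    "learning_rate": [0.0001, 0.001, 0.01],
    "batch_size": [16, 32, 64],
    "dropout": [0.3, 0.5],
}

def validation_loss(cfg):
    """Stand-in objective: favors the configuration reported as optimal."""
    best = {"units": 128, "learning_rate": 0.001, "batch_size": 32, "dropout": 0.5}
    return sum(v != best[k] for k, v in cfg.items())  # 0 when all match

# Grid Search: evaluate every combination in the Cartesian product
configs = [dict(zip(space, vals)) for vals in itertools.product(*space.values())]
best_cfg = min(configs, key=validation_loss)
print(f"evaluated {len(configs)} configurations; best: {best_cfg}")
```

Even this small space requires 54 full training runs; Bayesian Optimization reaches comparable settings with far fewer evaluations, which is the source of the speed-up reported below.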
Bayesian Optimization exhibited improved generalization performance in addition to converging 35% faster than Grid Search. Thanks to its adaptive nature, Bayesian Optimization could effectively decide which regions of the hyperparameter space to explore and which suboptimal settings to avoid. Figure 8 shows the training convergence of the EPRN under both hyperparameter tuning strategies47.
Figure 8 shows that the loss curve for Bayesian Optimization falls more steeply and converges more rapidly than that of Grid Search; Bayesian Optimization also consistently reaches lower final validation loss, indicating better generalization to unseen motion sequences. This demonstrates the value of probabilistic search methods for optimizing complex recurrent architectures efficiently.
Training convergence for bayesian optimization vs. grid search.
Implications for model optimization and future research
The outcome of this study strongly supports the need for automated hyperparameter tuning to enhance the performance of deep learning models for motion recognition. The faster convergence observed for Bayesian Optimization further suggests that future work could investigate hybrid approaches, including:
- Meta-learning approaches, in which results from past optimizations guide hyperparameter choices on new datasets.
- Reinforcement-learning-based tuning, in which an agent adjusts hyperparameters based on real-time performance feedback.
- Multi-objective optimization, in which trade-offs among accuracy, computational cost, and energy efficiency are balanced jointly.
Such hyperparameter optimization methods will help future deep learning schemes achieve faster convergence and greater adaptability to various sports motion datasets.
Computational complexity and execution time analysis
In real-world applications, a model must be computationally efficient as well as accurate, especially in systems that depend on real-time or near-real-time processing. Accordingly, a thorough evaluation of computational complexity and execution time is presented for the proposed EPRN against the baseline models: LSTM, GRU, CNN, and the wavelet-based variants. The analysis covers training time, inference speed, and overall resource consumption, to determine whether EPRN offers a satisfactory trade-off between efficiency and predictive performance.
The total training time and the average inference time per test sample were the two major indicators of computational efficiency. As Table 4 shows, the LSTM and GRU performed poorly in terms of efficiency, recording 1803.6 s and 1580.4 s respectively, and the extra computation did not translate into higher accuracy. CNN-based models trained in under 1500 s, but this speed was of limited value because the models failed to learn complex motion patterns effectively.
Among the four wavelet models, the execution times were 1432.6 s (Db4), 1602.0 s (Sym4), 1555.6 s (Coif5), and 1278.0 s (Bior1.3). The Bior1.3 algorithm had one of the shortest execution times, yet still maintained a competitive level of accuracy, making it potentially beneficial for many real-time applications due to its speed.
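The timing protocol we assume for such per-sample figures can be sketched as follows; `fake_inference` is a placeholder for a real forward pass, and the measured numbers depend entirely on the host machine:

```python
import time
import numpy as np

def fake_inference(sample):
    """Stand-in for a model forward pass; replace with the real model call."""
    w = np.random.default_rng(0).standard_normal((sample.shape[-1], 64))
    return np.tanh(sample @ w).mean()

# 100 synthetic motion sequences: 30 frames x 51 skeletal features each
samples = [np.random.default_rng(i).standard_normal((30, 51)) for i in range(100)]

fake_inference(samples[0])                 # warm-up run (caches, lazy init)
start = time.perf_counter()
for s in samples:
    fake_inference(s)
elapsed = time.perf_counter() - start
ms_per_sample = 1000.0 * elapsed / len(samples)
print(f"average inference time: {ms_per_sample:.3f} ms/sample")
```

Averaging over many samples after a warm-up run is what makes per-sample latency figures comparable across models.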
Such a design minimizes redundant computations, accelerates convergence speed, and optimally utilizes computational resources, thereby further improving the algorithmic efficiency of EPRN. This is especially helpful for applications that continuously track high-resolution motion capture data, providing low-latency responsiveness.
Overall, the analysis provides evidence that EPRN strikes an effective balance between accuracy and efficiency. Unlike conventional architectures that suffer from either high computational costs or limited expressiveness, EPRN offers respectable computational performance and is thus highly relevant for applications such as real-time systems for tracking athlete performance and injury prevention, as well as other applications in biomechanics. The numerical results presented in Table 4 further support the robustness of the proposed approach.
Residual analysis and error distribution
Residual analysis is an essential evaluative element for judging both the reliability and the robustness of predictive models: it elucidates the error distribution, possible biases, and generalization capability. By comparing actual and predicted values, we determine whether the model exhibits systematic errors, underfitting, overfitting, or biases toward specific motion patterns. An ideal model has residuals distributed symmetrically around zero, indicating unbiased predictions with minimal systematic error.
The residual plot in Fig. 9 provides an overview of the error distribution across predicted values, confirming the performance of the motion reconstruction model. A well-performing model is expected to spread the residuals randomly, leaving no evident patterns, as this indicates that the error is independent of the predicted values.
Residual plot for error analysis.
From these results, it can be inferred that the residuals cluster around zero, indicating that the model effectively suppresses systematic errors. The color gradient further shows that the error distribution is relatively uniform across the range of predicted values. Compared with conventional architectures such as LSTM and GRU, the proposed model exhibits a more stable residual spread, indicating higher predictive accuracy and lower error variance.
A closer examination of the residual distribution reveals several notable observations. First, the absence of skewness and clustering in the residuals indicates that the model generalizes well across different motions. Second, there are very few extreme outliers, indicating that the model accurately captures abrupt changes in motion.
It further demonstrates that the model is robust across diverse scenarios, from low- to high-intensity movements, as it exhibits similar residual variance across the range of predicted values.
To test the normality of the residuals, the Shapiro-Wilk48 and Kolmogorov-Smirnov49 tests were conducted, confirming that the residuals are normally distributed with minimal deviations. Additionally, Levene’s test for homogeneity of error variance confirmed the consistency of residual variance across motion types. These statistics support the model’s ability to calibrate predictions accurately with minimal bias.
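These diagnostics could be reproduced on any residual vector as sketched below; the residual arrays here are synthetic draws, not the study’s actual residuals:

```python
import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(7)
# hypothetical residuals for two motion types (roughly normal, equal variance)
residuals_run = rng.normal(0.0, 0.02, 200)
residuals_shot = rng.normal(0.0, 0.02, 200)

_, p_normal = shapiro(residuals_run)              # H0: residuals are normal
_, p_var = levene(residuals_run, residuals_shot)  # H0: variances are equal
print(f"Shapiro-Wilk p = {p_normal:.3f}, Levene p = {p_var:.3f}")
```

Large p-values (above 0.05) mean the tests find no evidence against normality or against homogeneous variance, which is the pattern the study reports for EPRN.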
To further illustrate the advantages of EPRN, Fig. 10 presents a histogram of the residual errors for trajectory estimation, describing the distribution of residual errors produced by each method. This representation supports comparison, illustrating how models including EPRN, LSTM, and Transformer-based architectures handle prediction errors.
Comparative histogram of residual distributions for different models.
The analysis reveals that the residuals of the traditional LSTM and Transformer techniques have a broader distribution with larger values than those of the proposed method, showing a tendency towards skewness and increased variance; this reflects greater vulnerability to systematic prediction errors and inconsistent generalization across motion sequence types. The residual distribution of EPRN, by contrast, is more tightly centered on zero, confirming that strong error minimization and robust trajectory estimation are its strengths.
The lower residual variance of EPRN indicates that the model captures both short- and long-term dependencies across motion segments, preventing error accumulation over time. Residuals near zero show no visible systematic bias toward overestimation or underestimation, supporting the model’s soundness for real-life applications such as biomechanical analysis, injury risk mitigation, and sports performance tracking. Overall, the residual analysis indicates that, under correct model specification, EPRN achieves more accurate, less biased, and more stable predictions than conventional deep learning architectures, demonstrating the potential of combining wavelet features with parallel recurrent structures for modeling motion sequences. Future investigations may explore adaptively correcting residuals by dynamically adjusting model parameters to the observed error patterns, increasing accuracy and robustness across motion recognition tasks.
Computational complexity and execution time analysis
To assess the readiness of the EPRN for real-time sports movement recognition, we evaluated its computational complexity and execution time against baselines including LSTM, GRU, CNN, and wavelet-based models. The analysis considers training time, inference speed, and resource consumption, which are crucial in resource-constrained environments such as real-time sports analytics or rehabilitation monitoring. The results are outlined in Table 5, and Fig. 11 compares the models’ inference speeds.
The EPRN model requires approximately 1480.5 s to train, while inference takes only 7.8 ms per sample. At that rate, the sym4-based wavelet model runs at 7.5 ms/sample, while LSTM, GRU, and EPRN run at 12.5 ms/sample, 10.8 ms/sample, and 7.8 ms/sample, respectively. EPRN’s inference speed is equivalent to 128 frames per second, which is sufficient for sports analysis and shooting-motion tracking at a 60 Hz frame rate. The recurrent baselines did not match EPRN, likely because EPRN’s parallel recurrent pathways and gating simplify the computation required relative to older recurrent models; accordingly, EPRN’s 1.9 M parameters undercut LSTM’s 2.1 M. As Fig. 11 shows, low-latency environments benefit from EPRN’s favorable trade-off between response time and accuracy.
Inference speed comparison across models.
Figure 11 shows inference speeds (ms/sample) for LSTM, GRU, CNN, db4, sym4, coif5, bior1.3, and EPRN across 100–1000 frames. The x-axis lists models, the y-axis shows sample sizes, and the z-axis indicates inference time (jet colormap: blue represents low values, red represents high values). A 30-ms contour marks the 30 Hz real-time threshold.
Per Table 5, EPRN’s 7.8 ms/sample at 100 frames (128 fps) is competitive with bior1.3 (7.2 ms) and sym4 (7.5 ms), outpacing LSTM (12.5 ms) and GRU (10.8 ms). At 1000 frames, EPRN scales to 8.6 ms, remaining below 30 ms, unlike LSTM, which scales to 13.8 ms. Low, blue surfaces for EPRN and wavelet models contrast with the red surfaces of LSTM and GRU. The 30-ms contour confirms the real-time feasibility of all models, with EPRN’s flat surface demonstrating scalability.
EPRN supports real-time applications, such as basketball shot analysis (60 Hz) or rehabilitation monitoring. Wavelet models (e.g., bior1.3) are slightly faster, but EPRN’s accuracy (23.5% RMSE reduction, Table 2) justifies its use. Optimizations such as pruning (6.5 ms per sample) could further reduce its latency. Figure 11 validates EPRN’s real-time suitability, with inference speeds below 30 ms, supporting sports analytics and addressing computational concerns. Future work will refine the scaling analysis and test further optimizations.
Real-time applicability and optimization strategies
The EPRN framework is designed to meet real-time requirements, such as sports motion recognition for live performance monitoring and rehabilitation surveillance. As shown in Table 5 and Fig. 11, EPRN achieves an inference time of 7.8 ms per sample, enabling processing at 128 frames per second. This performance is sufficient for analyzing athletic movements, such as soccer kick trajectory analysis or joint dynamics monitoring during running, which require sampling rates of 30–60 Hz. Nevertheless, the computational requirements of EPRN’s parallel recurrent structures and motion wavelet features necessitate optimizations for deployment in low-resource contexts, such as wearable or embedded systems.
Optimization strategies
1. Hardware acceleration: To accelerate inference, EPRN can be deployed on specialized hardware such as Field-Programmable Gate Arrays (FPGAs) or Tensor Processing Units (TPUs). FPGAs enable customized parallel processing, which recent studies suggest can decrease inference latency by as much as 50% for recurrent neural networks. EPRN’s attention-based fusion can run faster on TPUs, which are optimized for matrix calculations, potentially reaching sub-5 ms inference times. For example, deploying EPRN on Google Coral TPUs would enable real-time tracking of basketball shooting motions on edge devices.
2. Model compression: Pruning and quantization can minimize EPRN’s computational burden with minimal impact on precision. In LSTM and GRU models, pruning suppresses redundant pathways, yielding a 30–40% reduction in the number of parameters. Quantization converts floating-point weights to 8-bit integers, reducing memory usage and enabling a 20–25% increase in inference speed. Initial weight-pruning tests on EPRN achieved an inference time of 6.5 ms per sample with an RMSE of 0.076, confirming that these techniques suit time-sensitive sports applications.
3. Efficient wavelet implementation: Fast wavelet transform algorithms accelerate feature extraction by 15% and enable real-time computation of sym4’s DWT. This is particularly advantageous for real-time applications, such as monitoring high-speed motion data for quick joint transitions during martial arts moves.
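A minimal sketch of the 8-bit quantization step mentioned above, using symmetric per-tensor scaling on an invented weight matrix (real deployments typically use per-channel scales and calibration data):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization of float32 weights to int8."""
    scale = np.max(np.abs(weights)) / 127.0        # map the largest weight to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.normal(0.0, 0.1, (128, 128)).astype(np.float32)  # a recurrent weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
mem_saving = 1.0 - q.nbytes / w.nbytes             # int8 is 1/4 of float32
print(f"memory reduced by {mem_saving:.0%}, "
      f"max abs error {np.max(np.abs(w - w_hat)):.5f}")
```

The rounding error per weight is bounded by half the scale step, which is why quantization typically costs little accuracy while cutting memory traffic by 75%.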
Practical implications
Implementing these optimization techniques enables EPRN to meet the rigorous latency constraints of real-time sports use cases. For example, during an interactive basketball training session, feedback on shooting form could be given in real time if an FPGA-accelerated EPRN processed motion trajectories in under 5 ms. Similarly, during rehabilitation exercises, knee joint stability can be monitored using quantized EPRN on a wearable device, with real-time alerts sent to physiotherapists when anomalous movement patterns are detected. With model compression and hardware acceleration combined, EPRN’s inference speed remains competitive with the baseline models (Fig. 11). Future priorities include field testing on FPGAs and TPUs and researching adaptive wavelet selection to reduce computational strain. Such efforts will support the integration of EPRN into advanced intelligent systems for sports analytics and its application within operational frameworks.
Interpretability analysis of EPRN
Understanding the decision-making process of the EPRN is crucial for establishing confidence in its sports motion recognition capabilities for real-time coaching and rehabilitation purposes. To this end, we applied two explainability methods: attention weight visualizations and Gradient-weighted Class Activation Mapping (Grad-CAM)50. These methods reveal where EPRN attends in time and space during feature extraction for tasks such as basketball shooting and running, making the model’s behavior more explicable. This methodology is motivated by recent studies on multimodal frameworks in which module-specific Grad-CAMs were used to explain the impact of individual components51.
Attention-weight visualizations clarify which frames within motion sequences contribute the most to predictions by depicting the temporal importance of features within EPRN’s attention-based fusion layer. Temporal Grad-CAM produces spatial heatmaps over skeletal joints that are critical for EPRN outputs. These methods were implemented for basketball shooting (a transient motion) and running (a cyclic motion) to illustrate the adaptability of EPRN. In Fig. 12, we present two visualizations of EPRN, which display (a) attention weight plots and (b) the corresponding Grad-CAM heatmaps.
Visualizations of EPRN: (a) attention weight plots and (b) the corresponding Grad-CAM heatmaps.
In (a), attention weights (y-axis: weight magnitude, x-axis: frame index) show pronounced maxima at critical instances of the action, for example, shot-release in basketball (high-frequency DWT coefficients, RMSE 0.075) and stride transitions in running (low-frequency DWT, RMSE 0.078). In (b), the skeletal sequences are overlaid with heatmaps in which the wrist, elbow, knee, and ankle joints are marked in red, highlighting high activation during the basketball shooting and running actions, respectively. Text annotations indicate the corresponding motion types (transient, cyclic).
The most important conclusion that can be drawn from the visualizations is that EPRN remains responsive to relevant spatiotemporal features, as evidenced by its relatively low RMSE results (Fig. 12). In basketball shooting, EPRN’s sharp shot-release precision is corroborated by high attention weights and Grad-CAM activation on upper body joints, illustrating its accuracy in detecting transient motions. In running, attention is focused on cyclic frames that are synchronized with the active lower-body joints. Unlike IntentFormer’s multimodal Grad-CAM, EPRN achieved comparable interpretability with a lower computational cost, due to its simpler design (1.9 M parameters, Table 3), which relies on DWT and attention fusion.
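The Grad-CAM computation behind heatmaps like those in (b) can be sketched as follows; synthetic activation and gradient arrays stand in for a real backward pass, and the 1-D channel-by-frame layout is an illustrative simplification of EPRN’s feature maps.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Grad-CAM adapted to (channels, frames) feature maps.

    Channel weights are the frame-averaged gradients; the map is the
    ReLU of the weighted sum of activations, normalized to [0, 1].
    """
    alpha = gradients.mean(axis=1)                           # (C,) channel importances
    cam = np.maximum(0.0, np.tensordot(alpha, activations, axes=1))  # (T,)
    if cam.max() > 0:
        cam = cam / cam.max()                                # normalize for plotting
    return cam

C, T = 4, 50
activations = np.full((C, T), 0.1)
activations[2, 30] = 5.0           # a channel firing at the shot-release frame
gradients = np.full((C, T), 0.01)
gradients[2, :] = 1.0              # the output gradient flows mainly through it
cam = grad_cam_1d(activations, gradients)
# cam peaks at the frame the model relied on for its prediction
```

In a real pipeline the gradients would come from backpropagating the model output to the chosen layer; here the salient frame is planted so the mechanics are visible.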
These findings strengthen confidence in EPRN’s predictions, endorsing its use in real-time functions such as performance evaluation, where inference speeds of 7.8–8.6 ms per sample are achieved (Fig. 11). Linking accuracy to explanations also enables interpretation-guided model refinement, directing attention to the most informative feature hierarchies. Future research will utilize SHAP52 to assess the contribution of individual features and further clarify EPRN’s decision-making in complex sports scenarios.
Adaptation to varying motion complexities across activity types
The EPRN explicitly addresses the diverse complexities of sports motions, ranging from the quick, transient movements involved in basketball shooting to the cyclic motion patterns of running. EPRN’s efficiency derives from three factors: (1) application of the DWT for extraction of multi-scale features into high- and low-frequency bands; (2) parallel LSTM and GRU pathways, genetically optimized to capture temporal dependencies; and (3) an attention-based fusion layer that dynamically adjusts the importance of motion features according to their complexity. For example, EPRN achieves an RMSE of 0.075 for basketball shooting by prioritizing high-frequency DWT coefficients that capture sharp changes in joint angles, and 0.078 for running by focusing on low-frequency cyclic motion patterns.
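The attention-based fusion of the two recurrent pathways can be sketched as a learned softmax gate over the pathway outputs. The dot-product scoring and shapes below are illustrative assumptions rather than EPRN’s exact fusion layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(h_lstm, h_gru, w_gate):
    """Blend two pathway outputs with learned per-pathway importances.

    w_gate stands in for learned gating parameters; in the full model
    these would be trained jointly with the rest of the network.
    """
    stacked = np.stack([h_lstm, h_gru])   # (2, D)
    scores = stacked @ w_gate             # (2,) raw pathway scores
    alpha = softmax(scores)               # pathway importances, sum to 1
    fused = alpha @ stacked               # (D,) weighted combination
    return alpha, fused

D = 16
rng = np.random.default_rng(2)
h_lstm = rng.normal(size=D)               # longer-range temporal summary
h_gru = rng.normal(size=D)                # shorter-range dynamics summary
w_gate = rng.normal(size=D)
alpha, fused = attention_fuse(h_lstm, h_gru, w_gate)
```

Because the weights are input-dependent, the gate can lean on the high-frequency pathway for transient motions and the low-frequency pathway for cyclic ones.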
To measure adaptability, we examined EPRN’s performance across basketball shooting, soccer kicking, running, and martial arts. Figure 13 illustrates the RMSE distributions for these activities, demonstrating EPRN’s improvement over state-of-the-art systems to date53.
EPRN variation with complexity of motion in different activities.
As shown in Fig. 13, the distribution of RMSE values for EPRN is provided for four activities: basketball shooting (mean 0.075), soccer kicking (0.076), running (0.078), and martial arts (0.077). The activities are listed on the x-axis and RMSE values on the y-axis, with a boxplot per activity (e.g., blue for basketball, green for running). ‘Motion type’ annotations (e.g., “Transient” for basketball, “Cyclic” for running) and a reference line at 0.08 RMSE improve clarity.
The plot illustrates EPRN’s broad adaptability: the median RMSE remains below 0.08 for all activities. The tight distribution for basketball shooting (mean 0.075, narrow interquartile range) reflects the precision of DWT-based high-frequency attention fusion for transient motions. Running’s slightly higher mean (0.078) indicates effective handling of cyclic patterns via low-frequency coefficients, while the intermediate means for soccer kicking and martial arts (0.076 and 0.077, respectively) show consistent performance on mixed and transient motions.
EPRN’s architecture, at 1.9 M parameters (Table 4), is simpler than gated and fuzzy-logic-enhanced encoder frameworks, yet it achieves similar adaptability at lower computational cost. Its real-time efficiency (7.8–8.6 ms per sample, Fig. 11) complements the accuracy shown in Fig. 13.
The findings confirm that EPRN’s motion recognition precision supports coaching performance evaluation and rehabilitation monitoring. Further research will focus on modeling uncertainty with fuzzy logic, refining attention mechanisms to reduce RMSE variability, and exploring FPGA deployment optimizations, as discussed in “Interpretability analysis of EPRN”, for advanced intelligent sports analytics systems.
Benchmarking considerations and cross-model evaluation54
To address project-specific challenges, such as dataset heterogeneity, baseline model calibration, and metric relevance, we expanded the scope of our evaluation to other datasets, thereby ensuring the robustness and generalizability of the proposed EPRN framework.
All baseline models were tuned using identical validation splits and hyperparameter ranges. A comparative analysis of EPRN, LSTM, GRU, CNN-LSTM, and Transformer models across the NTU RGB + D, Human3.6 M, CMU Mocap, and UCF Sports datasets is presented in Table 6. The evaluation encompasses MSE, RMSE, SSIM, MAE, and classification-based accuracy, where applicable. Together, these human motion capture datasets gauge the models’ motion sequence modeling and prediction capabilities. EPRN outperformed the other models on every dataset, repeatedly demonstrating lower error rates and higher structural similarity, and on the UCF Sports dataset it also achieved the highest classification accuracy. As evident from the results, EPRN exhibits superior robustness and generalization compared with the SR- and HR-REC benchmarks built on traditional recurrent and hybrid architectures.
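For reference, the regression metrics reported in Table 6 can be computed as follows; the single-window SSIM here is a simplification of the full windowed SSIM, and the sample trajectories are synthetic.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error between target and reconstruction."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error; always <= RMSE for the same pair."""
    return float(np.mean(np.abs(y - y_hat)))

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM; the full metric averages this
    over local windows, but the global form illustrates the formula."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(3)
traj = rng.normal(size=100)                     # a synthetic joint trajectory
recon = traj + rng.normal(scale=0.05, size=100) # a slightly noisy reconstruction
```

A perfect reconstruction gives RMSE 0 and SSIM 1, which is why lower RMSE and higher SSIM together indicate better trajectory reconstruction.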
The comprehensive benchmarking further reinforces that EPRN performs well on both trajectory reconstruction and action classification tasks. On Human3.6 M and NTU RGB + D, EPRN distinctly excelled in both error reduction and structural similarity measures, demonstrating strong performance across 3D datasets with dense spatial resolution. Moreover, EPRN with softmax-based prediction heads also achieved the highest classifier performance on UCF Sports, a benchmark for video classification, with 89.3% accuracy, surpassing the performance of Transformer-based models.
These findings also demonstrate that EPRN does not appear to overfit to a single dataset, illustrating its adaptability across tasks and modalities with appropriate tuning and adjustments to the output layer. Although our work to date has focused mainly on regression and reconstruction, these results invite a deeper investigation of action classification, multimodal fusion, and domain adaptation using the EPRN architecture.
Ablation study
To isolate and measure the contribution of each component of the EPRN architecture, an extensive ablation study was performed. The focus was to assess the effect of attention-based fusion, the parallel recurrent pathways, and wavelet-based feature extraction on the model’s performance for motion trajectory reconstruction.
We explored the following configurations:
- Baseline GRU-only: A single-stream GRU model.
- Baseline LSTM-only: A standard single-stream LSTM network with equivalent hidden units and training setup.
- EPRN (no attention fusion): Parallel LSTM-GRU model with simple concatenation of outputs instead of learned attention fusion.
- EPRN (no DWT): Full EPRN architecture using raw joint coordinates without wavelet features.
- EPRN (full): Full model with DWT preprocessing, parallel LSTM-GRU branches, and attention-based fusion.
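The five configurations above could be encoded as a small set of flags driving a model builder; the names and fields below are hypothetical, not taken from the released code.

```python
# Hypothetical flags mirroring the five ablation configurations;
# identifiers are illustrative only.
ABLATION_CONFIGS = {
    "gru_only":     dict(branches=("gru",),        use_dwt=True,  fusion="none"),
    "lstm_only":    dict(branches=("lstm",),       use_dwt=True,  fusion="none"),
    "eprn_no_attn": dict(branches=("lstm", "gru"), use_dwt=True,  fusion="concat"),
    "eprn_no_dwt":  dict(branches=("lstm", "gru"), use_dwt=False, fusion="attention"),
    "eprn_full":    dict(branches=("lstm", "gru"), use_dwt=True,  fusion="attention"),
}

def describe(name):
    """One-line summary of a configuration, e.g. for experiment logs."""
    cfg = ABLATION_CONFIGS[name]
    return f"{'+'.join(cfg['branches'])} | dwt={cfg['use_dwt']} | fusion={cfg['fusion']}"
```

Driving the ablation from a single table like this keeps the training loop identical across variants, so differences in Table 7 reflect the architecture, not the harness.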
The evaluation for all models was conducted on the CMU Motion Capture Dataset, after each model was trained under a standard training procedure with optimized hyperparameters identified through a Bayesian search, as shown in Table 7.
The findings emphasize that each component contributes to the model’s performance. Applying the DWT improves feature quality, reducing RMSE by approximately 16% compared with models trained on raw data. Incorporating parallel LSTM and GRU branches with simple concatenation improves performance over single-branch structures, but is less effective than the adaptive weighting provided by the attention mechanism. Adding the attention-based fusion module, which computes the contextual importance of the dense and temporal encodings, further increases accuracy by leveraging contextual information. Cumulatively, these results corroborate that the complete EPRN architecture, rather than any component in isolation, is critical for high-fidelity modeling of complex motion sequences.
Comparison of wavelet transforms
To analyze the effectiveness of wavelet-based feature extraction, we compared it with two popular deep learning techniques: feature extractors based on CNNs and autoencoders (AEs). The comparison was made with a constant EPRN structure as the downstream model to isolate performance differences purely due to the feature extraction approach employed. The outcomes are consolidated in Table 8.
These results suggest that wavelet-based features hold a distinct advantage for motion representation: they preserve multi-resolution temporal characteristics and suppress noise without requiring heavy dimensionality reduction. By contrast, CNNs tend to capture spatial attributes at the expense of fine-grained temporal detail, and while autoencoders compress more aggressively, they suffer from overfitting and reconstruction noise on high-variance motion sequences.
The results support the conclusion that wavelet transforms can represent and compress sports motion data in a computationally efficient manner while preserving the underlying motion semantics at the level of detail required by the downstream deep recurrent structures.
Limitations and future research directions
The EPRN performs excellently in recognizing sports motion, but several limitations must be acknowledged to improve its real-world applicability. Addressing these barriers will enhance generalizability, efficiency, and scalability in real-time motion analysis and broader deployments. Below, we outline the significant limitations of EPRN and propose research directions to address them.
The motion capture data used in this study were collected in a laboratory environment with a sophisticated motion-capture system. Such high-density datasets ensure accuracy in motion recognition, but they also limit the model’s generalization to noisier or lower-resolution sources, such as video-based motion recognition or wearable sensor data. Performance can be significantly compromised by occlusions, background noise, lighting variations, and sensor drift. Future research on robustness should incorporate domain adaptation techniques, such as unsupervised transfer learning, adversarial training, and synthetic data augmentation, to enable the model to adapt to diverse environments and be deployed in real-world settings without further refinement. Integrating self-supervised learning frameworks would also reduce reliance on labeled datasets by leveraging pretraining and feature refinement on unlabeled motion data.
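As one concrete form of the synthetic data augmentation suggested above, the sketch below corrupts clean motion capture trajectories with additive sensor noise and a slow random drift, two of the corruption modes (sensor noise, sensor drift) discussed here. The noise parameters are illustrative defaults, not tuned values from the paper.

```python
import numpy as np

def augment_trajectory(traj, noise_std=0.02, drift_scale=0.05, rng=None):
    """Simulate sensor-like conditions on clean mocap trajectories.

    Adds i.i.d. Gaussian measurement noise plus a slow random-walk drift
    whose standard deviation reaches ~drift_scale by the final frame.
    """
    if rng is None:
        rng = np.random.default_rng()
    t = traj.shape[0]
    noise = rng.normal(scale=noise_std, size=traj.shape)
    steps = rng.normal(scale=1.0 / np.sqrt(t), size=traj.shape)
    drift = drift_scale * np.cumsum(steps, axis=0)   # slow sensor drift
    return traj + noise + drift

rng = np.random.default_rng(4)
clean = np.sin(np.linspace(0, 4 * np.pi, 200))[:, None]  # one joint channel
noisy = augment_trajectory(clean, rng=rng)
```

Training on such corrupted copies alongside the clean data is a cheap first step toward the domain robustness discussed above, before heavier tools like adversarial training.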
Another limitation of EPRN is its higher computational cost compared with lightweight frameworks such as GRU and TCN. EPRN strikes a balance between accuracy and efficiency, but the intricacies of its parallel recurrent structure and wavelet-based feature extraction add processing overhead, increasing training and inference time and limiting deployability in resource-constrained real-time or embedded systems. Future directions include model compression methodologies, such as pruning, quantization, and knowledge distillation, which can reduce the computational burden while keeping model accuracy intact. Hardware acceleration with Field Programmable Gate Arrays (FPGAs) or Tensor Processing Units (TPUs) can further enhance real-time performance, making EPRN suitable for low-latency sports analytics and rehabilitation applications.
Another limitation of deep learning models in general, and of EPRN in particular, is the lack of explainability and interpretability. In critical applications such as athlete performance tracking, injury prevention, and rehabilitation, understanding the factors that contributed to a model outcome is vital for trust and transparency among practitioners. However, deep learning architectures are often referred to as black boxes and provide little insight into how their predictions are made. This hampers the acceptance of EPRN in clinical and sports science settings, where explainability is at least as important as accuracy. Therefore, future work will focus on embedding Explainable AI (XAI) techniques55, such as Layer-wise Relevance Propagation (LRP)56, Shapley Additive Explanations (SHAP)57, and Gradient-weighted Class Activation Mapping (Grad-CAM)58, which visualize the motion features that contribute most to model predictions. Hybridization with symbolic reasoning could further improve interpretability without compromising predictive performance.
Another line of future investigation is generalizing EPRN to other, more complex sports movements. The present study focuses on human locomotion, specifically walking and running, which are highly structured forms of movement. Dynamic sports activities with high degrees of freedom, such as gymnastics, martial arts, and multi-agent team sports, pose further challenges for motion modeling: they typically involve rapid changes of body position, multi-joint coordination, and complex biomechanical interactions that traditional recurrent architectures may not capture well. Future research could enhance EPRN’s applicability to such movements by introducing Graph Neural Networks, which are well suited to modeling relationships among body joints, alongside transformer-based temporal attention mechanisms to capture long-range dependencies and complex interactions, particularly in rapidly changing multi-actor scenarios.
Conclusion and future work
This study still has some shortcomings. First, although EPRN operates in real time, its performance declines significantly for extremely high-dimensional motion data. Second, EPRN relies on high-quality studio motion capture data, a requirement that is not easily met in practical environments, where signals are often noisy and incomplete. Finally, although wavelet transforms improve feature extraction, selecting the optimal wavelet function remains an open research question and may affect the applicability of such models to different sports and motion patterns.
Some promising future work concerns improving the proposed framework in several areas. First, integrating self-supervised and contrastive learning approaches can improve feature quality and minimize dependency on labeled datasets, which are usually costly to produce. Second, multimodal fusion of video streams, inertial sensor data, and biomechanical markers could enhance the model’s understanding of complex movements. Third, developing light and efficient variants of EPRN optimized for edge computing and wearable devices would enable real-time motion tracking in sports analytics and rehabilitation. Furthermore, exploring more adaptive residual correction mechanisms could allow predictions to be refined dynamically from error patterns, enhancing accuracy and robustness.
The practical applicability of EPRN and wavelet-based motion analysis is demonstrated in real-world contexts, specifically sports performance enhancement, injury prevention, and athlete monitoring. The high-precision, generalizable framework emerging from this research lays solid groundwork for future advances in efficient, high-speed deep learning-based motion recognition, particularly through the development of hybrid architectures and adaptive learning methodologies.
Data availability
The datasets analyzed during the current study are available from the following link: https://mocap.cs.cmu.edu.
Code availability
The code used to generate the results reported in this study is publicly available at https://www.mathworks.com/matlabcentral/fileexchange/181833-deep-learning-for-sports-motion-recognition. Researchers can access, review, and reuse the code to ensure transparency and reproducibility of the experiments.
References
Sharma, N., Dhiman, C. & Indu, S. Pedestrian intention prediction for autonomous vehicles: a comprehensive survey. Neurocomputing 508, 120–152 (2022).
Alomar, K., Aysel, H. I. & Cai, X. RNNs, CNNs and transformers in human action recognition: a survey and a hybrid model. arXiv preprint arXiv:2407.06162, (2024).
Liu, J., Liu, S., Medhat, M. E. & Elsayed, A. M. M. Wavelet transform theory: the mathematical principles of wavelet transform in gamma spectroscopy. Radiat. Phys. Chem. 203, 110592 (2023).
Akujuobi, C. M. Wavelets and Wavelet Transform Systems and their Applications (Springer, 2022).
Pal, A. R. & Singha, A. A comparative analysis of visual and thermal face image fusion based on different wavelet family, in International Conference on Innovations in Electronics, Signal Processing and Communication (IESC), IEEE, 2017, pp. 213–218. (2017).
Liu, Y., Sathishkumar, V. E. & Manickam, A. Augmented reality technology based on school physical education training. Comput. Electr. Eng. 99, 107807 (2022).
Le, V. T., Tran-Trung, K. & Hoang, V. T. A comprehensive review of recent deep learning techniques for human activity recognition. Comput. Intell. Neurosci. 2022(1), 8323962 (2022).
Pandey, A. K. & Parihar, A. S. A comparative analysis of deep learning based human action recognition algorithms, in 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), IEEE, 2023, pp. 1–7. (2023).
Aksan, E., Kaufmann, M., Cao, P. & Hilliges, O. A spatio-temporal transformer for 3d human motion prediction, in International Conference on 3D Vision (3DV), IEEE, 2021, pp. 565–574. (2021).
Zhang, J., Ling, C. & Li, S. EMG signals based human action recognition via deep belief networks. IFAC-PapersOnLine 52 (19), 271–276 (2019).
Martini, E. et al. Pressure-sensitive insoles for real-time gait-related applications. Sensors 20 (5), 1448 (2020).
Mekruksavanich, S. & Jitpattanakul, A. Multimodal wearable sensing for sport-related activity recognition using deep learning networks. J. Adv. Inf. Technol. 13 (2), 132–138. https://doi.org/10.12720/jait.13.2.132-138 (2022).
Chen, C. Y., Wang, J. C., Wang, J. F. & Hu, Y. H. Event-based segmentation of sports video using motion entropy, in Ninth IEEE International Symposium on Multimedia (ISM), IEEE, 2007, pp. 107–111. (2007).
Li, M., Zhang, M., Luo, X. & Yang, J. Combined long short-term memory-based network employing wavelet coefficients for MI-EEG recognition, in IEEE International Conference on Mechatronics and Automation, IEEE, 2016, pp. 1971–1976. (2016).
Chao, Z. H., Ya Long, Y., Yi, L. & Min, L. Deep Q learning-enabled training and health monitoring of basketball players using IoT integrated multidisciplinary techniques. Mob. Netw. Appl. 1–16. https://doi.org/10.1007/s11036-024-02376-y (2024).
Habibi, M., Nourani, M. & Sullivan, D. H. An AI-driven camera-based platform for patient ambulation assessment, in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2024, pp. 1–4. (2024).
Parekh, V., Shah, D. & Shah, M. Fatigue detection using artificial intelligence framework. Augmented Hum. Res. 5 (1), 5 (2020).
Munoz-Macho, A. A., Domínguez-Morales, M. J. & Sevillano-Ramos, J. L. Performance and healthcare analysis in elite sports teams using artificial intelligence: a scoping review. Front. Sports Act. Living. 6, 1383723 (2024).
Psaltis, A., Patrikakis, C. Z. & Daras, P. Deep multi-modal representation schemes for federated 3d human action recognition, in European Conference on Computer Vision, Springer, pp. 334–352. (2022).
Zhou, Q. et al. Cross-modal learning with multi-modal model for video action recognition based on adaptive weight training. Conn Sci. 36 (1), 2325474 (2024).
He, J. & Pao, H. K. Multi-modal, multi-labeled sports highlight extraction, in 2020 international conference on Technologies and Applications of Artificial Intelligence (TAAI), IEEE, pp. 181–186. (2020).
Duan, C., Hu, B., Liu, W. & Song, J. Motion capture for sporting events based on graph convolutional neural networks and single target pose estimation algorithms. Appl. Sci. 13 (13), 7611 (2023).
Rueckert, D. & Schnabel, J. A. Model-based and data-driven strategies in medical image computing. Proc. IEEE. 108(1), 110–124 (2019).
Naseralavi, S. S., Balaghi, S. & Khojastehfar, E. Effects of various wavelet transforms in dynamic analysis of structures, World Academy of Science, Engineering and Technology, International Journal of Civil, Environmental, Structural, Construction and Architectural Engineering, vol. 10, no. 7, (2016).
Pratama, M. & Wang, D. Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams. Inf. Sci. (N Y). 495, 150–174 (2019).
Barbhuiya, A. A., Karsh, R. K. & Jain, R. CNN based feature extraction and classification for sign language. Multimed Tools Appl. 80 (2), 3051–3069 (2021).
Xiao, G., Cao, Y., Huang, J., Jin, X. & Zhang, Y. Knowledge graph metric learning network for few-shot health status assessment. IEEE Sens. J. https://doi.org/10.1109/JSEN.2024.3507096 (2024).
Yasin, H., Ghani, S. & Krüger, B. An effective and efficient approach for 3d recovery of human motion capture data. Sensors 23 (7), 3664 (2023).
Al-Taee, A. A., Khushaba, R. N., Zia, T. & Al-Jumaily, A. Feature extraction using wavelet scattering transform coefficients for emg pattern classification, in Australasian Joint Conference on Artificial Intelligence, Springer, pp. 181–189. (2022).
Alomar, K., Aysel, H. I. & Cai, X. Data augmentation in classification and segmentation: a survey and new strategies. J. Imaging. 9 (2), 46 (2023).
Claessens, B. J., Vrancx, P. & Ruelens, F. Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control. IEEE Trans. Smart Grid. 9 (4), 3259–3269 (2016).
Lema-Condo, E. L., Bueno-Palomeque, F. L., Castro-Villalobos, S. E., Ordonez-Morales, E. F. & Serpa-Andrade, L. J. Comparison of wavelet transform symlets (2–10) and daubechies (2–10) for an electroencephalographic signal analysis, in IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), IEEE, 2017, pp. 1–4. (2017).
Mahmud, N., MacGillivray, A., Chaudhary, M. & El-Araby, E. Decoherence-optimized circuits for multidimensional and multilevel-decomposable quantum wavelet transform. IEEE Internet Comput. 26 (1), 15–25 (2021).
Dey, R. & Salem, F. M. Gate-variants of gated recurrent unit (GRU) neural networks, in IEEE 60th international midwest symposium on circuits and systems (MWSCAS), IEEE, 2017, pp. 1597–1600 (2017).
Guo, J., Wang, W., Tang, Y., Zhang, Y. & Zhuge, H. A CNN-Bi_LSTM parallel network approach for train travel time prediction. Knowl. Based Syst. 256, 109796 (2022).
Han, M. E-Bayesian estimations of parameter and its evaluation standard: E-MSE (expected mean square error) under different loss functions. Commun. Stat.-Simul. Comput. 50 (7), 1971–1988 (2021).
Yadav, M. & Alam, M. A. Dynamic time warping (dtw) algorithm in speech: a review. Int. J. Res. Electron. Comput. Eng. 6 (1), 524–528 (2018).
Bakurov, I., Buzzelli, M., Schettini, R., Castelli, M. & Vanneschi, L. Structural similarity index (SSIM) revisited: a data-driven approach. Expert Syst. Appl. 189, 116087 (2022).
Wang, T. et al. ResLNet: deep residual LSTM network with longer input for action recognition. Front. Comput. Sci. 16 (6), 166334 (2022).
Luo, W., Liu, W. & Gao, S. Remembering history with convolutional lstm for anomaly detection, in IEEE International conference on multimedia and expo (ICME), IEEE, 2017, pp. 439–444 (2017).
Liu, W., Lin, Y., Liu, W., Yu, Y. & Li, J. An attention-based multiscale transformer network for remote sensing image change detection. ISPRS J. Photogramm. Remote Sens. 202, 599–609 (2023).
Pandey, A. & Wang, D. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain, in ICASSP–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 6875–6879 (2019).
Tabachnick, B. G. & Fidell, L. S. Experimental Designs Using ANOVAvol. 724 (Thomson/Brooks/Cole Belmont, 2007).
Guo, T. et al. A review of wavelet analysis and its applications: challenges and opportunities. IEEe Access. 10, 58869–58903 (2022).
Laumanns, M. & Ocenasek, J. Bayesian optimization algorithms for multi-objective optimization, in International Conference on Parallel Problem Solving from Nature, Springer, pp. 298–307. (2002).
Bischl, B. et al. Hyperparameter optimization: foundations, algorithms, best practices, and open challenges. Wiley Interdiscip Rev. Data Min. Knowl. Discov. 13 (2), e1484 (2023).
Malakouti, S. M., Menhaj, M. B. & Suratgar, A. A. Applying grid search, random search, Bayesian optimization, genetic algorithm, and particle swarm optimization to fine-tune the hyperparameters of the ensemble of ML models enhances its predictive accuracy for mud loss. Res. Sq. https://doi.org/10.21203/rs.3.rs-5187887/v1 (2024).
González-Estrada, E. & Cosmes, W. Shapiro–Wilk test for skew normal distributions based on data transformations. J. Stat. Comput. Simul. 89 (17), 3258–3272 (2019).
Berger, V. W. & Zhou, Y. Kolmogorov–Smirnov test: overview. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat06558 (2014).
Bhakte, A., Vasista, B. S. & Srinivasan, R. Gradient-Weighted Class Activation Mapping (Grad-CAM) based explanations for process monitoring results from deep neural networks, in 2021 AIChE Annual Meeting, AIChE, (2021).
Sharma, N., Dhiman, C. & Indu, S. Predicting pedestrian intentions with multimodal intentformer: a co-learning approach. Pattern Recogn. 161, 111205 (2025).
Mosca, E., Szigeti, F., Tragianni, S., Gallagher, D. & Groh, G. SHAP-based explanation methods: a review for NLP interpretability, in Proceedings of the 29th international conference on computational linguistics, pp. 4593–4603. (2022).
Sharma, N., Dhiman, C. & Indu, S. Progressive contextual trajectory prediction with adaptive gating and fuzzy logic integration. IEEE Trans. Intell. Veh. 9 (11), 6960–6970. https://doi.org/10.1109/TIV.2024.3391898 (2024).
Chatterjee, A. et al. A comprehensive cross-model framework for benchmarking the performance of quantum Hamiltonian simulations. IEEE Trans. Quantum Eng. 6, 1–26. https://doi.org/10.1109/TQE.2025.3558090 (2025).
Boselli, R., D’Amico, S. & Nobani, N. eXplainable AI for word embeddings: a survey. Cognit Comput. 17(1), 1–24 (2025).
Montavon, G., Binder, A., Lapuschkin, S., Samek, W. & Müller, K. R. Layer-wise relevance propagation: an overview, Explainable AI: interpreting, explaining and visualizing deep learning, pp. 193–209, (2019).
Yang, C. et al. How can SHAP (SHapley Additive exPlanations) interpretations improve deep learning based urban cellular automata model? Comput. Environ. Urban Syst. 111, 102133 (2024).
Quach, L. D., Quoc, K. N., Quynh, A. N., Thai-Nghe, N. & Nguyen, T. G. Explainable deep learning models with gradient-weighted class activation mapping for smart agriculture. IEEE Access. 11, 83752–83762 (2023).
Funding
Scientific research project of higher education institutions in Anhui province in 2024. Literature collation and research of wushu quanzhong in Anhui Province. Approval No. 2024AH053380, Project category scientific research project-key project.
Author information
Authors and Affiliations
Contributions
Yang Yang and Fallah Mohammadzadeh wrote the main manuscript text, Mohammad Khishe prepared Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10, and Ahmed Najat Ahmed, Mosleh M. Abualhaj, and Taher M. Ghazal contributed to data analysis and experimental validation. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, Y., Mohammadzadeh, F., Khishe, M. et al. Deep learning for sports motion recognition with a high-precision framework for performance enhancement. Sci Rep 15, 38861 (2025). https://doi.org/10.1038/s41598-025-22701-z
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-22701-z