Upper limb human-exoskeleton system motion state classification based on semg: application of CNN-BiLSTM-attention model

Zhao, Dongwei; Ye, Xiangming; Wang, Song; Zhang, Chenfeng; Sun, Shouqian; Zhang, Xuequn; Cheng, Ruidong

doi:10.1038/s41598-025-02864-5


Article
Open access
Published: 30 May 2025


Scientific Reports volume 15, Article number: 18969 (2025)


Abstract

This study aims to classify five typical motion states of the human upper limb based on surface electromyography signals, thereby supporting the real-time control system of an assistive upper limb exoskeleton. We propose a deep learning model combining convolutional neural networks, bidirectional long short-term memory networks, and an attention mechanism to enhance the accuracy of motion state recognition in complex scenarios. Surface electromyography data were collected from ten participants for the biceps, triceps, and deltoid muscles, covering five representative states: resting, mild activity, rapid movement, dynamic load-bearing, and static load-bearing. Following the systematic fusion of multi-domain features spanning time, morphological, frequency, and cepstral characteristics, temporal features were structured through sliding window segmentation to serve as inputs for the proposed model. The proposed model achieved a classification accuracy of 97.29% on the test set, with an average accuracy of 88.17 ± 5.39% under leave-one-subject-out cross-validation, outperforming baseline algorithms. These findings highlight the model’s potential in motion state classification, facilitating advanced, intelligent control capabilities of human-exoskeleton systems.


Introduction

Exoskeletons have garnered significant attention in recent years due to their broad applications in rehabilitation, industrial, and military domains^1,2,3,4. In industrial applications⁵, exoskeleton devices are designed to reduce workers’ fatigue, prevent injuries, and increase productivity. By enhancing human strength and endurance, these devices enable workers to perform high-demand physical tasks for extended periods. However, accurately classifying and responding to a diverse range of human motion states in real time remains a significant challenge for assistive exoskeletons⁶. For industrial exoskeletons, it is essential to accurately identify various movement states, including types of motion and load conditions. These systems must dynamically respond to different actions and provide appropriate assistance based on real-time motion classification.

Despite progress, the development of exoskeleton systems, especially for industrial use, has not fully addressed these complex challenges. Human actions are highly variable under actual conditions; the same movement may exhibit substantial differences across individuals and load conditions. Existing algorithms struggle to manage this variability effectively. Consequently, developing algorithms that can robustly extract motion features and adapt to different tasks and load conditions has become a pressing challenge in the field of assistive exoskeletons.

Surface electromyography (sEMG)-based methods offer significant advantages in predicting motion intent, making them particularly suitable for real-world applications. sEMG signals generated 30 to 150 ms before actual human movement provide direct insight into user intent before visible motion occurs^7,8; this capability allows for predictive, real-time responses that can enhance the precision of exoskeleton control. Additionally, sEMG can monitor muscle fatigue, further demonstrating the assistive effects of exoskeleton devices^9,10,11. Given its rich information on muscle activity and ease of acquisition, sEMG is widely used in human–machine interaction systems, making it an effective tool for predicting motion intent and supporting effective control in assistive technologies.

Previous studies have explored various human motion state classification methods. Early sEMG-based upper limb motion classification methods typically relied on traditional classifiers. These methods generally involve collecting sEMG signals, pre-processing, manual feature extraction in both time and frequency domains, training models with extracted features, and classifying input data¹². For example, Gaussian mixture models were used in ¹³ to classify six upper limb movements, while ¹⁴ utilized a linear programming boosting algorithm to classify seven upper limb actions. Reference ¹⁵ applied a logistic polynomial regression approach to classify dynamic lifting tasks across three load conditions, achieving over 80% accuracy. In ¹⁶, a cubic support vector machine (SVM) model was employed for similar lifting task classification under varying loads, attaining 99% accuracy. However, traditional machine learning algorithms lack the capability to adapt in real time to user intent and cannot effectively capture the complex features of sEMG signals^17,18.

The recent introduction of deep learning has provided new approaches for sEMG signal classification, as sEMG signals typically contain abundant high- and low-frequency information, exhibiting complex spatial patterns and local features. Convolutional neural networks (CNNs), known for capturing spatial structures, have been widely applied in sEMG signal processing^19,20,21. For example, in ²², a combination of deformable convolutional neural networks (DCNN) and magnitude-based short-time Fourier transform achieved an accuracy of 82.03% in classifying six basic arm movements (flexion, extension, abduction, adduction, pronation, and supination).

Long short-term memory (LSTM) networks, known for capturing long-term dependencies, are well suited to handling sEMG signals²³, which often have strong temporal characteristics and complex dynamic changes within short periods. Through memory cells, LSTMs effectively capture and retain these long-term temporal dependencies. For instance, ⁶ combined a CNN with an LSTM to classify four shoulder movements of subjects wearing an exoskeleton, achieving an average accuracy of 96.2%. In ²⁴, a six-axis inertial sensor combined with an LSTM model was used to recognize activities and estimate loads for subjects wearing an exoskeleton, achieving 90.80% accuracy in activity recognition and 87.14% in load estimation.

Attention-based LSTM was initially proposed in ²⁵ to address relation classification in natural language processing. Given that sEMG signals exhibit temporal dependencies and local feature variability, the importance of features at different time points varies for the final classification outcome. Thus, attention-based LSTM is highly suitable for processing sEMG signals^10,26,27. In the face of complex motion patterns or varying load conditions, the attention mechanism helps models focus on the most representative moments, enhancing adaptability across different motion states.

These studies demonstrate the significant research value and application potential of deep learning methods incorporating sEMG data for exoskeleton applications. However, existing studies lack a comprehensive classification of motion states and loads, and most do not consider the effects of wearing an exoskeleton.

In this work, we propose an sEMG-based solution that leverages a convolutional Bidirectional LSTM (BiLSTM) model with an attention mechanism for human activity recognition, achieving classification across five typical human motions with an accuracy of 97.29%.

This research focuses on an AI-assisted approach to enhance human motion intent recognition during the use of upper limb exoskeletons. The paper primarily addresses the perceptual aspect of recognizing motion intent while wearing an exoskeleton. The main contributions are: (a) classification of common upper limb motion states (covering motion types and loads) while wearing an exoskeleton, especially the rarely discussed Static Load-Bearing State (SLB)—where the person is stationary while carrying a load; and (b) implementation of a convolutional BiLSTM model with an Attention mechanism, yielding promising classification results.

Methods

Participants

A total of 10 participants were recruited for the experiment, aged 24–42 years (M = 27.70, S.D. = 5.57), with heights ranging from 1.68 to 1.82 m (M = 1.75 m, S.D. = 0.04 m) and weights from 65 to 90 kg (M = 78.70 kg, S.D. = 7.52 kg). All participants were free from any conditions affecting the experiment and refrained from engaging in strenuous physical activity the day before testing. All participants provided written informed consent before participation. The study was conducted in accordance with the principles and guidelines described in the Declaration of Helsinki and was approved by the Ethics Committee of Zhejiang Provincial People’s Hospital (KY2024220). Participants also signed written informed consent forms for the publication of any identifying information or images in an online open-access publication.

Apparatus

We used a self-designed exoskeleton as the experimental platform, shown in Fig. 1. This exoskeleton is powered by UniTree A1 motors (Hangzhou Unitree Technology Co., LTD, China), which enable zero-torque mode driving to minimize friction within the motor components.

Following the experimental motion design from ²⁸, we designed a device to measure maximum voluntary isometric contraction (MVIC), as shown in Fig. 2. This device facilitates MVIC experiments for primary muscle groups, such as the shoulder and hip joints.

The MVIC test rig was placed in a motion capture room equipped with 16 cameras and LED light bands to capture participants from multiple angles, helping to standardize participant movements. In Fig. 2, A represents the camera, B denotes the LED light band, C is the data acquisition laptop, D₁–D₅ are the load sensors (SBT710, SIMBATOUCH INC, China), and E is the digital transducer (SBT904D, SIMBATOUCH INC, China) with a power supply module. The platform provides force measurements with an accuracy of 0.01 N across various movements. We utilized an sEMG acquisition device from Sichiray Technology Co., Ltd., China, which has a sampling rate of 200 Hz.

Testing procedures

First, each participant completed an MVIC test using the MVIC test platform. Then, sEMG electrodes were placed on the middle deltoid (DT), biceps brachii (BB), and triceps brachii (TB) according to the SENIAM (surface EMG for a non-invasive assessment of muscles) guidelines²⁹, as shown in Fig. 5. Afterwards, participants wore the exoskeleton and performed the following tasks sequentially, representing five states, as shown in Fig. 3: Resting State (RS), a resting or non-movement state; Mild Activity State (MA), a mild movement state; Rapid Movement State (RM), rapid limb movement; Dynamic Load-Bearing State (DLB), dynamic load-bearing; and SLB, static heavy load-bearing.

In this study, the movement patterns were designed as follows: MA involved normal arm swinging during walking, while RM was designed as a rapid shoulder joint extension of 90 degrees within one second. To simulate a static load-bearing state, SLB was defined as the subject holding a 25 kg dumbbell with maximum effort for 5 s without movement. DLB required the subject to perform a front raise using a 7.5 kg dumbbell, with duration dependent on the subject’s adaptation to the motion. During the experiment, each subject rested for 5 min between actions and took a 30-min break between sets to prevent fatigue from affecting data quality. sEMG signals were recorded using an sEMG collection module. After data processing, the total recording time for each type of action was standardized to 30 s.

Data analysis

Data processing and acquisition

The original signals collected in this paper are shown in Table 1:

Table 1 The original data collected by the experiment.


The original sEMG data contain many deviating values, and directly feeding these signals into the network increases the complexity of model training. Furthermore, because the constructed dataset is small (N = 10), sEMG features from a single subject may disproportionately influence the model. To address these challenges, we implemented systematic feature engineering to extract time-domain, frequency-domain, morphological, and cepstral features from the signal, as illustrated in Table 2. These features have been shown in multiple studies^16,30 to be effective in improving the performance of sEMG classification models. By utilizing these universal features, we aimed to capture the inherent characteristics of sEMG signals that remain consistent across different subjects, thereby improving the model’s ability to recognize motion states regardless of individual physiological variations.

Table 2 The original data collected by the experiment.


First, we extracted time-domain features to characterize the amplitude characteristics of sEMG signals. The root mean square (RMS) represents the average signal energy, reflecting muscle activation levels, while peak-to-peak (P-P) value indicates the range of muscle contraction intensity.

Next, we performed frequency-domain analysis to capture the spectral properties of sEMG signals. Mean frequency (MNF) and median frequency (MDF) were extracted, revealing muscle fiber recruitment patterns and potential fatigue indicators, which are crucial for identifying variations in motor unit action potentials across different motion states.
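As a rough illustration of these two spectral features, the sketch below estimates MNF and MDF from the one-sided FFT power spectrum of a single window. The function name, the DC-offset removal, and the plain periodogram estimate are assumptions made for this example; the source does not state how the spectrum was computed.

```python
import numpy as np

def mean_median_frequency(window, fs=200.0):
    """Return (MNF, MDF) in Hz for a 1-D sEMG window sampled at fs Hz."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                       # remove DC offset before the FFT
    power = np.abs(np.fft.rfft(x)) ** 2    # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = power.sum()
    if total == 0.0:                       # flat window: no spectral content
        return 0.0, 0.0
    # MNF: power-weighted average frequency
    mnf = float((freqs * power).sum() / total)
    # MDF: first frequency at which cumulative power reaches half the total
    cumulative = np.cumsum(power)
    mdf = float(freqs[np.searchsorted(cumulative, total / 2.0)])
    return mnf, mdf

# Example: a 500 ms window (100 samples at 200 Hz) containing a 40 Hz tone
t = np.arange(100) / 200.0
mnf, mdf = mean_median_frequency(np.sin(2 * np.pi * 40.0 * t))
```

For a pure 40 Hz tone both values sit at 40 Hz; for real sEMG the two differ, and a downward shift in either is commonly read as a fatigue indicator.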

Additionally, we calculated shape factor (SF) to provide waveform morphological information, and root sum of squares (RSS) to offer a comprehensive measurement of signal intensity that emphasizes peaks more than RMS, making it particularly suitable for detecting brief but intense muscle activations during specific upper limb movements.
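The amplitude and morphological features named in this section (RMS, P-P, SF, RSS) can be sketched as follows. The definitions used here are the common textbook ones (SF taken as RMS divided by the mean absolute value); the source does not give its exact formulas, so treat them as assumptions.

```python
import numpy as np

def amplitude_features(window):
    """Compute RMS, peak-to-peak, shape factor and root sum of squares
    for one channel of one sEMG window (assumed definitions)."""
    x = np.asarray(window, dtype=float)
    rms = float(np.sqrt(np.mean(x ** 2)))   # average signal energy
    p2p = float(x.max() - x.min())          # range of the signal
    rss = float(np.sqrt(np.sum(x ** 2)))    # grows with window length, unlike RMS
    mean_abs = float(np.mean(np.abs(x)))
    sf = rms / mean_abs if mean_abs > 0 else 0.0   # waveform shape indicator
    return {"RMS": rms, "P-P": p2p, "SF": sf, "RSS": rss}

# Toy window alternating between +1 and -1
feats = amplitude_features([1.0, -1.0, 1.0, -1.0])
```

Under these definitions RSS equals RMS multiplied by the square root of the number of samples, so the two carry the same information only when the window length is fixed.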

To capture subtle spectral characteristics, we employed Mel-frequency cepstral coefficients (MFCCs), specifically utilizing the first and third coefficients (MFCC1 and MFCC3) to detect nuanced frequency distribution patterns in sEMG signals associated with different motion states.

These eight features were calculated for each sEMG channel, creating a 24-dimensional feature vector. This multi-domain feature set serves as a comprehensive representation of sEMG signals, establishing a solid foundation for motion state analysis.

Classification model

Perceiving exoskeleton users’ upper limb motion patterns is a critical issue in human-in-the-loop exoskeleton systems. However, many exoskeletons, including most passive exoskeletons and some active exoskeletons, operate based on predefined motion patterns and trajectories. This approach places humans in a passive role within the human-exoskeleton system, resulting in poor human–machine interaction. To address this issue, this study proposes a CNN-BiLSTM-Attention model for motion intention classification, as shown in Fig. 5. The model leverages the local temporal feature extraction capabilities of the CNN, the ability of the BiLSTM to process long-term temporal dependencies, and the attention mechanism’s focus on critical information to achieve improved motion classification performance.

To process the collected sEMG data, an overlapping sliding window mechanism was first applied, with a window length of 500 ms and an overlap of 450 ms. This approach enhances the inclusion of short-term dynamic information. Because sEMG data are sequential, information between adjacent time points changes rapidly. Using overlapping windows allows the model to capture more detailed variations while reducing the loss of sEMG information at the window boundaries. Subsequently, the sEMG signal was segmented into multiple 100 × 3 matrices (100 samples per 500 ms window at the 200 Hz sampling rate, across three channels), which were then subjected to systematic feature engineering to extract multi-domain discriminative patterns. A 1D CNN layer was then applied for local temporal feature extraction.
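A minimal sketch of this segmentation, assuming the 200 Hz sampling rate stated in the Apparatus section, is shown below. The function name and the all-zero placeholder recording are illustrative and not taken from the source.

```python
import numpy as np

def sliding_windows(signal, fs=200, window_ms=500, overlap_ms=450):
    """Split a (samples, channels) recording into overlapping windows."""
    x = np.asarray(signal, dtype=float)
    win = int(round(fs * window_ms / 1000))                  # 100 samples
    step = int(round(fs * (window_ms - overlap_ms) / 1000))  # 10 samples (50 ms)
    if x.shape[0] < win:
        return np.empty((0, win) + x.shape[1:])
    starts = range(0, x.shape[0] - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])

# Placeholder for one 30 s recording of three channels at 200 Hz
recording = np.zeros((6000, 3))
windows = sliding_windows(recording)   # shape (591, 100, 3)
```

With these settings a 30 s recording yields 591 windows per action, each a 100 × 3 matrix, and consecutive windows share 90 of their 100 samples.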

For temporal feature extraction in sEMG data, we considered recurrent neural networks (RNNs)³¹. However, RNNs can suffer from vanishing or exploding gradients when trained on sequential data. LSTM can partially address these issues³². Figure 4 illustrates the data processing flow within an LSTM cell, which forms the core component of LSTM layers. The LSTM layer’s states include the hidden state ${h}_{t}$, which serves as the LSTM layer’s output at the current time step, and the cell state ${c}_{t}$, which carries information across time steps. Information regulation is achieved through gates, which selectively allow specific information to pass. LSTM includes three types of gates (illustrated by the dashed boxes in Fig. 4): the forget gate, the input gate, and the output gate. These gates are responsible for discarding irrelevant information, updating the LSTM cell state, and determining which information is included in the LSTM output state, respectively. Additionally, the candidate cell state ${\widetilde{C}}_{t}$, processed through a tanh layer, combines with the output of the input gate ${i}_{t}$ to determine cell state updates.

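The parameters of the LSTM network are the weight matrices $W$ and the biases $b$. Each weight matrix acts on the concatenation $\left[{h}_{t-1},{x}_{t}\right]$ of the previous hidden state and the current input, so it holds both the recurrent and the input weights. Each gate and candidate cell state is computed as follows: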

$${f}_{t}=\sigma \left({W}_{f}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$

(1)

$${i}_{t}=\sigma \left({W}_{i}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)$$

(2)

$${o}_{t}=\sigma \left({W}_{o}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)$$

(3)

$${\widetilde{C}}_{t}=\text{tanh}\left({W}_{C}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{C}\right)$$

(4)

The cell state ${c}_{t}$ and hidden state ${h}_{t}$ are calculated as:

$${c}_{t}={f}_{t}*{c}_{t-1}+{i}_{t}*{\widetilde{C}}_{t}$$

(5)

$${h}_{t}={o}_{t}*\text{tanh}\left({c}_{t}\right)$$

(6)
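To make Eqs. (1) to (6) concrete, the following sketch runs a single LSTM cell over a short random sequence. The dictionary layout for the weights, the layer sizes, and the random inputs are illustrative assumptions and do not reflect the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.
    W[k] has shape (hidden, hidden + inputs); b[k] has shape (hidden,)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq. (2): input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # Eq. (3): output gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # Eq. (4): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # Eq. (5): element-wise update
    h_t = o_t * np.tanh(c_t)                 # Eq. (6): hidden state / output
    return h_t, c_t

# Illustrative sizes only: 4 hidden units, 3 input features, 5 time steps
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for k in "fioC"}
b = {k: np.zeros(hidden) for k in "fioC"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
```

The `*` in Eqs. (5) and (6) is the element-wise product, which is why plain array multiplication is used for the state updates while the gate pre-activations use matrix products.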

Compared to LSTM, which processes data in a single direction and therefore relies solely on information from previous time steps for predictions, BiLSTM processes data in both directions simultaneously. This enables a more comprehensive understanding of the temporal variations in sEMG signals, thereby enhancing classification performance. Given that feature importance varies across different motion states, especially in cases of large movements like RM and DLB, relying solely on CNN and BiLSTM may not sufficiently emphasize these critical features. To address this issue, we introduced an attention mechanism³³. The attention mechanism selectively focuses on the most relevant parts of the sEMG signals associated with each motion pattern, assigning them higher weights, thus improving the robustness and interpretability of the classification results.
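The source does not specify the form of the attention layer. As one plausible reading, the sketch below uses a common additive scoring scheme: each time step of the BiLSTM output receives a scalar score, a softmax turns the scores into weights, and the weighted sum forms the context vector. The parameter names `W_a` and `v_a` and all sizes are hypothetical.

```python
import numpy as np

def attention_pool(H, W_a, v_a):
    """H: (T, d) sequence of BiLSTM outputs. Returns (context, weights)."""
    scores = np.tanh(H @ W_a) @ v_a          # one score per time step, shape (T,)
    scores = scores - scores.max()           # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H                    # weighted sum over time, shape (d,)
    return context, weights

# Illustrative sizes: 5 time steps, 8 features per step, 6 attention units
rng = np.random.default_rng(0)
T, d, k = 5, 8, 6
H = rng.normal(size=(T, d))
context, weights = attention_pool(H, rng.normal(size=(d, k)), rng.normal(size=k))
```

The `weights` vector is the quantity that an attention heatmap would display: near-uniform weights mean every time step contributes equally, while a peaked vector means the classifier leans on a few time steps.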

During model training, cross-entropy loss was used as the optimization criterion, with the Adam optimizer set to a learning rate of 0.0001. The model was trained for 200 epochs with a batch size of 128. The detailed model architecture is outlined below:

First, the pre-processed sEMG signal passes through a 1D convolutional layer (with 64 filters of size 3), which extracts local temporal features from the signal. This is followed by a max-pooling layer (pooling size of 2) to reduce feature dimensionality. The pooled output is then fed into the BiLSTM layer, where the number of units is set to 128, with a dropout rate of 0.2 to prevent overfitting. The BiLSTM captures the long-term temporal dependencies of the signal. An attention mechanism is then applied to increase the weight given to critical sEMG features. The output is then flattened for processing in subsequent fully connected layers. The flattened features pass through a fully connected layer (with 256 neurons and ReLU activation), followed by another dropout layer (0.5) to further prevent overfitting. Finally, a softmax function outputs the probability distribution across five categories, representing different motion states, as shown in Fig. 5.
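The final softmax output and the cross-entropy criterion used during training can be illustrated for a single sample as follows. The logit values are made up for the example, and the helper names are not from the source.

```python
import math

STATES = ["RS", "MA", "RM", "DLB", "SLB"]

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])

# Hypothetical logits for one window whose true label is RS
probs = softmax([2.0, 0.5, 0.1, -1.0, 0.0])
predicted = STATES[probs.index(max(probs))]
loss = cross_entropy(probs, STATES.index("RS"))
```

The loss approaches zero as the probability assigned to the true state approaches one, and equals ln 5 (about 1.61) when the model is maximally uncertain across the five states.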

We collected 10 sets of sEMG data from 10 participants, using 80% for training and 20% for testing and model validation. As shown in Fig. 6, as the number of training epochs increases, both the training and validation losses gradually decrease, indicating good convergence. The accuracy of the training and validation sets improves with additional epochs and stabilizes in the later stages, demonstrating the model’s effectiveness.

Evaluation methods

First, to understand our proposed model’s ability to differentiate between various motion states, we utilized t-distributed stochastic neighbor embedding (t-SNE) for feature space visualization analysis³⁴. This dimensionality reduction technique provides an intuitive visualization of the separation between the five motion states at different processing stages, effectively revealing the model’s internal representation learning capabilities. By extracting and visualizing features after each major component of our architecture, we can systematically track how signal representations evolve and become increasingly discriminative throughout the network.

Subsequently, to investigate the attention layer’s focus on different motion states, we visualized the attention weight matrix as a heatmap. By analyzing the attention heatmap, we were able to evaluate the model’s allocation of attention and verify whether it aligns with the motion patterns.

To comprehensively evaluate the classification performance of the proposed model, it is imperative to draw conclusions based on relevant evaluation metrics. While accuracy is a robust metric that reflects the model’s overall performance, relying solely on accuracy is insufficient—especially in multi-class classification tasks where the ability to differentiate between classes can vary significantly. A single accuracy metric does not capture the nuances of performance for each individual category. This is particularly evident for motion modes that are easily confused, such as RS vs. MA and RM vs. DLB, where accuracy alone might mask underlying misclassification issues. Therefore, we introduced the confusion matrix as an evaluation tool to provide a more intuitive and detailed view of the model’s discriminative capabilities. By analyzing the confusion matrix, we can observe the classification accuracy for each category and pinpoint which motion modes are prone to confusion. This deeper analysis yields valuable insights into the model’s performance across various motion modes, thereby identifying potential areas for improvement and offering targeted directions for future optimization.
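A small sketch of how a confusion matrix exposes per-class behaviour is given below. The toy labels are invented for illustration (they deliberately confuse RS with MA and RM with DLB) and are not experimental data.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)   # column sum
        actual = sum(matrix[i])                     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

labels = ["RS", "MA", "RM", "DLB", "SLB"]
# Invented toy predictions, two samples per state
y_true = ["RS", "RS", "MA", "MA", "RM", "RM", "DLB", "DLB", "SLB", "SLB"]
y_pred = ["RS", "MA", "MA", "MA", "RM", "DLB", "DLB", "DLB", "SLB", "SLB"]
cm = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(cm, labels)
```

In this toy case the overall accuracy is 80%, yet the recall for RS and RM is only 50%, which is the kind of class-level weakness that a single accuracy figure hides.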

Furthermore, to assess the generalization capability of the proposed model across different subjects, we incorporated leave-one-subject-out cross-validation (LOOCV) into our evaluation. LOOCV is particularly well-suited for datasets with inherent subject-specific variations³⁵, as it systematically designates each subject’s data as the test set while using the remaining data for training. This approach captures the variability in performance across individuals, revealing both the strengths and weaknesses of the model when applied to unseen subjects. The LOOCV results provide additional insights into the model’s robustness and underscore potential areas for further enhancement, particularly in addressing misclassification issues among closely related motion modes.
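The fold construction described here can be sketched in a few lines. The generator name and the toy subject labels (10 subjects with three windows each) are assumptions for illustration only.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# Toy example: subject label of each window, 10 subjects x 3 windows
subject_ids = [s for s in range(1, 11) for _ in range(3)]
folds = list(leave_one_subject_out(subject_ids))
```

Each of the 10 folds trains on nine subjects and tests on the remaining one, so no window from the test subject is ever seen during training. Reporting the mean and standard deviation over folds gives figures of the form "mean ± S.D.".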

For statistical comparison of model performances, we employed Dunn’s test with Bonferroni correction³⁶. This non-parametric approach was selected after normality testing revealed that cross-subject performance data for some models did not conform to normal distribution assumptions. Given the multiple comparison scenario involving five different models, Dunn’s test provides a robust framework for identifying significant performance differences while the Bonferroni correction controls the family-wise error rate, ensuring statistical rigor in our conclusions.
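A simplified sketch of Dunn's pairwise rank test with Bonferroni adjustment follows. It omits the tie correction in the variance term, uses invented scores for three hypothetical models, and is not the authors' analysis code.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def dunn_bonferroni(groups):
    """groups: {name: scores}. Returns {(a, b): Bonferroni-adjusted p}."""
    names = list(groups)
    values = [v for name in names for v in groups[name]]
    owners = [name for name in names for _ in groups[name]]
    ranks = average_ranks(values)
    n = len(values)
    mean_rank = {
        name: sum(r for r, o in zip(ranks, owners) if o == name) / len(groups[name])
        for name in names
    }
    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        # Standard error of the mean-rank difference (no tie correction)
        se = math.sqrt(n * (n + 1) / 12.0 * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))   # Bonferroni adjustment
    return adjusted

# Invented per-subject scores for three hypothetical models
scores = {
    "A": [0.90, 0.88, 0.91, 0.87, 0.89],
    "B": [0.80, 0.78, 0.82, 0.79, 0.81],
    "C": [0.50, 0.45, 0.55, 0.48, 0.52],
}
adjusted = dunn_bonferroni(scores)
```

With five models there are 10 pairwise comparisons, so each raw p-value would be multiplied by 10 (and capped at 1) under the Bonferroni adjustment.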

Results

Visualization of the feature extraction process

Figure 7 illustrates the feature distribution in the 2D t-SNE space prior to deep learning model processing. The visualization reveals that after manual feature extraction, the sEMG signals demonstrate an initial level of separability between motion states. Figure 8 shows that following convolutional processing, while preliminary clustering patterns emerge, substantial overlap persists between different motion states. As shown in Fig. 9, the incorporation of temporal modeling through the BiLSTM layer significantly enhances the class structure, with distinct motion clusters becoming more apparent. Finally, Fig. 10 demonstrates that after processing through the complete network architecture, the motion states form well-defined and distinctly separated clusters. This pronounced separation in the feature space validates the effectiveness of our proposed model in learning discriminative features for robust motion state classification.

Visualization of the attention

Figures 11 and 12 indicate that the attention weights for RS and MA are relatively uniform, with standard deviations of 0.08 and 0.05, respectively. Figures 13 and 14 reveal that RM and DLB share similar characteristics, as the model assigns greater importance to information from time steps T4 to T5; this is also reflected in their higher attention weight variability, with standard deviations of 0.23 for RM and 0.12 for DLB. Figure 15 illustrates the attention weights for SLB, showing that the model places more emphasis on information at time step T1 when making classification decisions, with a corresponding standard deviation of 0.17.

Ablation analysis

The ablation study results, as summarized in Table 3 and Fig. 16, demonstrate that the proposed model outperforms all compared architectures across key evaluation metrics. Specifically, it achieves an accuracy of 97.29%, precision of 97.29%, recall of 97.29%, and an F1 score of 0.9729, surpassing the performance of individual CNN (96.00%), BiLSTM (90.30%), CNN + BiLSTM (96.64%), and CNN + Attention (94.96%) models. Furthermore, the model maintains a reasonable inference time of 57.20 ± 2.38 ms, which is comparable to simpler architectures while delivering superior performance. Figure 17 displays the confusion matrix of the proposed model, showing clear diagonal dominance, which indicates high classification accuracy across all motion states.

Table 3 Ablation study: impact of different module combinations on classification performance.


As detailed in Table 4, the proposed model achieved excellent performance across all motion states. The RS showed the highest performance with 98.39% precision and 98.87% recall, resulting in an F1 score of 0.9863. Similarly, SLB demonstrated strong results with 97.75% precision and 97.90% recall (F1: 0.9782). While RM and DLB showed slightly lower metrics with F1 scores of 0.9608 and 0.9655 respectively, they still maintained robust classification performance. MA also achieved impressive results with 97.59% precision and 97.12% recall (F1: 0.9735), confirming the model’s consistent performance across all motion states.
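As a consistency check, and assuming the F1 score is defined in the usual way as the harmonic mean of precision $P$ and recall $R$, the reported F1 values can be reproduced from the reported precision and recall:

$${F}_{1}=\frac{2PR}{P+R}$$

$${F}_{1}^{\text{RS}}=\frac{2\times 0.9839\times 0.9887}{0.9839+0.9887}\approx 0.9863,\qquad {F}_{1}^{\text{SLB}}=\frac{2\times 0.9775\times 0.9790}{0.9775+0.9790}\approx 0.9782$$

Both values agree with the F1 scores quoted for RS and SLB.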

Table 4 Per-class performance metrics.


Comparison with other studies

Regarding comparisons with other models, a challenge in upper limb motion classification research is the absence of standardized datasets, unlike the established benchmarks available for gesture recognition tasks^15,35. Most researchers in this field construct their own datasets, which complicates direct performance comparisons across studies. While the EMAHA-DB1 dataset, which includes 22 common upper limb movements¹⁵, offers potential for standardization, it has not yet been widely adopted in related research. To establish a meaningful evaluation framework, we implemented several baseline algorithms, including CNN + LSTM⁶, Cubic SVM¹⁶, LSTM²⁴, and DCNN²², and compared their performance against our proposed model on our custom dataset. Additionally, we used LOOCV to validate all algorithms, so that generalization across different subjects was assessed in the same way for every model. Table 5 compares the classification performance of these established approaches and our proposed method.

Table 5 Model performance comparison of LOOCV results.


The experimental results comparing different models are presented in Fig. 18 and Table 5. The proposed model achieved the highest performance across all metrics, with an accuracy of 88.17 ± 5.39%, precision of 88.76 ± 4.97%, recall of 88.13 ± 5.47%, and F1 score of 0.8799 ± 0.5555. Statistical analysis using Dunn’s test with Bonferroni correction confirmed our model significantly outperformed the LSTM model (p < 0.001) and DCNN model (p < 0.001) in F1 score. The CNN-LSTM architecture showed the second-best performance (accuracy: 77.96 ± 10.38%), followed by the SVM model (accuracy: 77.65 ± 6.42%). Notably, our proposed model maintained more consistent performance with a considerably lower standard deviation (5.39%) compared to CNN-LSTM (10.38%), indicating better stability across subjects. The DCNN model achieved moderate results (accuracy: 56.54 ± 24.20%), while the basic LSTM model performed lowest (accuracy: 46.71 ± 11.54%). These results demonstrate that our proposed architecture provides robust sEMG signal classification with enhanced generalization capabilities for upper limb motion recognition across different subjects.

Discussion

The experimental results demonstrate the effectiveness of our proposed model for sEMG-based movement classification in human-exoskeleton systems. The training dynamics illustrated in Fig. 6 reveal stable convergence without overfitting, as evidenced by the consistent narrowing of both training and validation losses. This stability, combined with the plateauing accuracy curves, indicates the model’s robustness and appropriate complexity for the task.

The progressive feature visualization through t-SNE (Figs. 7, 8, 9, 10) provides insights into our model’s transformation of sEMG signals into discriminative features. While overlap between classes in these 2D projections should be interpreted cautiously, as t-SNE cannot perfectly preserve high-dimensional spatial relationships, the overall trend from entangled to increasingly distinct clusters validates our architecture’s feature learning capabilities. The visualization reveals a clear evolutionary pattern: initial convolutional processing establishes basic clustering tendencies (Fig. 8), temporal modeling through BiLSTM enhances motion-specific patterns (Fig. 9), and the attention mechanism further refines class separation (Fig. 10). This progression supports our architectural design hypothesis that each component contributes unique and complementary discriminative capabilities to the final classification task.

Figures 11 and 12 show that the attention weights for RS and MA are relatively uniform, with standard deviations of 0.08 and 0.05, respectively. This indicates that under these conditions, the model does not rely heavily on any specific time step but integrates features evenly across the entire time window.

Figures 13 and 14 reveal that, for RM and DLB, the model assigns higher attention during the T4–T5 interval, suggesting that significant muscle activations are detected during this period, which leads the model to focus on this information for classification.

In contrast, Fig. 15 shows that for SLB, the attention is predominantly focused on time step T1, indicating that when maximum effort is exerted, the signal characteristics at certain time steps become more pronounced, prompting the model to assign greater weight to these cues.

Overall, the attention mechanism exhibits adaptive capabilities: the model distributes its focus evenly in stable or mild conditions, whereas in high-intensity or dynamic tasks it emphasizes distinctive features at specific time steps. This behavior confirms the potential of attention-based models in capturing subtle variations in sEMG signals and provides a theoretical basis for further optimizing motion state recognition systems.
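
The contrast between uniform and peaked attention can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the attention weights are a softmax over per-time-step scores (the exact formulation is not restated in this section), uses the population standard deviation as the spread measure, and feeds in made-up scores. The function names `softmax` and `attention_profile` are hypothetical.

```python
import math
from statistics import pstdev


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_profile(scores):
    """Return (weights, spread of the weights, zero-based index of the peak)."""
    weights = softmax(scores)
    peak = max(range(len(weights)), key=weights.__getitem__)
    return weights, pstdev(weights), peak


# Hypothetical scores over five time steps (not taken from the paper).
flat_w, flat_std, _ = attention_profile([0.1, 0.1, 0.1, 0.1, 0.1])
peaked_w, peaked_std, peaked_step = attention_profile([0.0, 0.0, 0.0, 3.0, 3.0])
print(flat_std, peaked_std, peaked_step)
```

A spread near zero corresponds to the even integration described for RS and MA, while a larger spread with a clear peak index corresponds to the concentrated focus described for RM, DLB, and SLB.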

The ablation study (Fig. 16) offers insight into the role of each network component in sEMG-based motion recognition. The standalone CNN effectively captures local temporal patterns in sEMG signals, achieving 96.00% accuracy, whereas the lower performance of the standalone BiLSTM (90.30%) suggests that long-term dependencies alone are insufficient for robust classification. The improvement observed in the CNN + BiLSTM architecture (96.64%) indicates that combining local feature extraction with long-range temporal modeling enhances the model’s ability to distinguish between similar motion patterns. The superior performance of our proposed model (97.29%) over both the CNN + BiLSTM and CNN + Attention architectures indicates that the attention mechanism’s selective focus on relevant signal segments complements both local pattern detection and sequential feature extraction. This architectural synergy is particularly important for real-world applications, where the ability to automatically identify and emphasize discriminative signal components can help overcome individual variations in muscle activation patterns.
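
To put the ablation figures on a common scale, the gains can be expressed both in percentage points and as a share of the baseline's remaining errors. The sketch below uses only the accuracies quoted in the paragraph above; the helper name `gain_over` is hypothetical, and the relative error reduction is a derived quantity that the paper itself does not report.

```python
# Test-set accuracies (%) as quoted in the text. CNN + Attention is omitted
# because its value is not stated in this passage.
accuracy = {"CNN": 96.00, "BiLSTM": 90.30, "CNN+BiLSTM": 96.64, "Proposed": 97.29}


def gain_over(baseline, model="Proposed"):
    """Return (percentage-point gain, relative error reduction in percent)."""
    points = accuracy[model] - accuracy[baseline]
    base_error = 100.0 - accuracy[baseline]
    return points, 100.0 * points / base_error


for name in ("CNN", "BiLSTM", "CNN+BiLSTM"):
    points, rel = gain_over(name)
    print(f"vs {name}: +{points:.2f} points, {rel:.1f}% of baseline errors removed")
```

Viewed this way, a gain of well under one percentage point over CNN + BiLSTM still removes a noticeable share of the remaining misclassifications, which is the more informative comparison when all variants are already above 90%.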

The per-class performance analysis reveals interesting patterns in motion state recognition (Fig. 17, Table 4). The high F1 scores for RS (0.9863) and SLB (0.9782) suggest that these conditions produce distinctly different muscle activation patterns. The slightly lower scores for RM and DLB (F1: 0.9608 and 0.9655, respectively) reflect the inherent challenge of capturing load-related information from muscle activation signals alone.
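
Per-class F1 scores of this kind are derived from a confusion matrix. The following sketch shows the calculation under stated assumptions: the function name `per_class_f1` is hypothetical, and the counts are invented to mimic some RM/DLB confusion. They are not the paper's confusion matrix.

```python
def per_class_f1(confusion, labels):
    """Compute F1 per class; confusion[i][j] counts true class i predicted as j."""
    n = len(labels)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed samples of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


labels = ["RS", "MA", "RM", "DLB", "SLB"]
# Hypothetical counts, 50 samples per class, with errors mostly between RM and DLB.
confusion = [
    [50, 0, 0, 0, 0],
    [0, 48, 2, 0, 0],
    [0, 1, 45, 4, 0],
    [0, 0, 3, 47, 0],
    [0, 0, 0, 0, 50],
]
f1 = per_class_f1(confusion, labels)
print({label: round(score, 4) for label, score in f1.items()})
```

In this toy matrix, the off-diagonal mass between RM and DLB lowers both classes' F1 while RS and SLB remain perfect, the same qualitative pattern as described above.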

A critical aspect of our study is the cross-subject generalization evaluation through LOOCV (Fig. 18, Table 5). The observed accuracy reduction when testing on completely new subjects aligns with previous research findings^15,35 and highlights a persistent challenge in sEMG-based motion recognition. However, our model demonstrates superior resilience to this performance degradation compared to baseline approaches, maintaining the highest mean accuracy and lowest standard deviation across subjects. This suggests that our architecture better captures universal movement patterns while being less susceptible to individual-specific signal characteristics.

Despite these promising results, several limitations must be acknowledged. First, individual differences in sEMG signals remain a significant challenge for cross-user recognition^37,38. Variations in muscle physiological characteristics, electrode placement, and movement execution styles among different users result in substantial differences in sEMG patterns. Second, the current study focused on a specific set of movement patterns; the model’s performance on more complex or transitional movements requires further investigation. Third, the study concentrated solely on motion intent classification, laying the groundwork for human–exoskeleton interaction. In addition, the use of a 500 ms sliding window with a 450 ms overlap may introduce a system delay. These issues need to be addressed in future work.
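
The delay concern follows from the windowing arithmetic: a 500 ms window with a 450 ms overlap advances by 50 ms per step, so a new decision can be issued every 50 ms, but each decision depends on the preceding 500 ms of signal. The sketch below illustrates this. The sampling rate `FS_HZ` is a hypothetical value chosen for round numbers (the acquisition rate is not restated in this section), and `window_starts` is an illustrative helper.

```python
def window_starts(n_samples, fs_hz, window_ms=500, overlap_ms=450):
    """Return (window length, step, start indices) in samples for a sliding window."""
    win = fs_hz * window_ms // 1000
    step = fs_hz * (window_ms - overlap_ms) // 1000
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return win, step, list(range(0, n_samples - win + 1, step))


FS_HZ = 1000  # hypothetical sampling rate, for illustration only
win, step, starts = window_starts(n_samples=2000, fs_hz=FS_HZ)

update_interval_ms = 1000 * step // FS_HZ  # time between consecutive decisions
first_output_ms = 1000 * win // FS_HZ      # data needed before the first decision
print(len(starts), update_interval_ms, first_output_ms)
```

Under these assumptions, a 2 s recording yields 31 windows. The buffering requirement of one full window, plus feature extraction and inference time, bounds how quickly a change in motion state can be fully reflected in the classifier input.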

Future work should address these limitations by:

(1) Developing transfer learning frameworks based on common features, extracting universal movement pattern representations from multi-user data to establish foundational models that enable new users to adapt quickly with minimal personalized data, and designing incremental learning mechanisms that allow the system to recognize and memorize newly emerging movement patterns online, thereby improving adaptability to unknown behaviors;
(2) Expanding the movement pattern vocabulary to encompass more diverse and complex activities, and establishing standardized benchmarks for upper-limb movements in human–exoskeleton systems to enable fair comparisons across different sEMG-based recognition methods;
(3) Exploring the integration of motion state classification with trajectory prediction through transformer-based sequence-to-sequence models, combined with model predictive control frameworks, as a potential approach to enable real-time anticipatory exoskeleton actuation.

Conclusion

This paper presents a novel deep learning architecture for sEMG-based upper limb motion recognition in human-exoskeleton systems. By integrating a CNN for local pattern extraction, a BiLSTM for temporal dependency modeling, and an attention mechanism for selective feature emphasis, our model achieves robust classification performance across different motion states. The experimental results demonstrate the effectiveness of this approach, achieving 97.29% accuracy on the test set and maintaining 88.17 ± 5.39% accuracy in cross-subject validation, surpassing traditional approaches and baseline deep learning models.

The comprehensive evaluation through feature visualization, ablation studies, and cross-subject testing validates our architectural design choices. The t-SNE visualizations reveal the progressive improvement in feature discrimination through different network components, while the ablation study quantitatively confirms each component’s contribution to the final performance. Particularly noteworthy is the model’s ability to distinguish between similar motion patterns, such as rapid movements and dynamic load-bearing activities, with F1 scores exceeding 0.96 for all motion states.

While the results are promising, challenges remain in achieving consistent performance across different subjects due to individual variations in sEMG patterns. Future work should focus on developing transfer learning frameworks for better cross-user adaptation, designing incremental learning mechanisms for online pattern recognition, and expanding the movement pattern vocabulary. These improvements will be crucial for advancing the practical application of sEMG-based motion recognition in human-exoskeleton systems.

The proposed approach represents a significant step toward more reliable and adaptable human-exoskeleton interaction systems, offering potential benefits for rehabilitation, assistive technology, and industrial applications. Our findings contribute to the broader understanding of deep learning applications in biosignal processing and human motion recognition, while also highlighting important directions for future research in this field.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Dhatrak, P., Durge, J., Dwivedi, R. K., Pradhan, H. K. & Kolke, S. Interactive design and challenges on exoskeleton performance for upper-limb rehabilitation: A comprehensive review. Int. J. Interact. Des. Manuf. (IJIDeM) https://doi.org/10.1007/s12008-024-02090-9 (2024).
Ochieze, C., Zare, S. & Sun, Y. Wearable upper limb robotics for pervasive health: A review. Progress Biomed. Eng. 5(3), 32003 (2023).
Zhao, Y., Mao, J., Zhang, M., Wu, H., Jiang, J. & Jing, S. Integration of neuromuscular control for multidirectional horizontal planar reaching movements in a portable upper limb exoskeleton for enhanced stroke rehabilitation. Biomed. Eng. Biomed. Tech. (2025).
Zhao, Y., Wu, H., Zhang, M., Mao, J. & Todoh, M. Design methodology of portable upper limb exoskeletons for people with strokes. Front. Neurosci. 17, 1128332 (2023).
Ebrahimi, A. Stuttgart Exo-jacket: An exoskeleton for industrial upper body applications. 258–263 (IEEE, 2017).
Lee, J. et al. Intelligent upper-limb exoskeleton integrated with soft bioelectronics and deep learning for intention-driven augmentation. npj Flexible Electron. 8(1), 11. https://doi.org/10.1038/s41528-024-00297-0 (2024).
Trigili, E. et al. Detection of movement onset using EMG signals for upper-limb exoskeletons in reaching tasks. J. Neuroeng. Rehabil. 16, 1–16 (2019).
Salman, N. & Benali, A. Adaptive weight compensation in assistive upper-limb exoskeletons: An EMG analysis. 387–392 (IEEE, 2024).
Zhao, D., Wang, S., Sun, S., Zhang, X. & Vänni, K. Experimental evaluation of a passive upper limb exoskeleton for high voltage live-line operations. 183–187 (IEEE, 2024).
Chen, X., Liu, M. & Zhang, S. An LSTM-attention-based method to muscle fatigue detection by integrating multi-source sEMG signals. 8475–8480 (IEEE, 2021).
Parker, P. A. & Scott, R. N. Myoelectric control of prostheses. Crit. Rev. Biomed. Eng. 13(4), 283–310 (1986).
de Jonge, S., Potters, W. V. & Verhamme, C. Artificial intelligence for automatic classification of needle EMG signals: A scoping review. Clin. Neurophysiol. 159, 41–55 (2024).
Huang, Y., Englehart, K. B., Hudgins, B. & Chan, A. D. A Gaussian mixture model based classification scheme for myoelectric control of powered upper limb prostheses. IEEE Trans. Bio Med. Eng. 52(11), 1801–1811 (2005).
Li, Z., Wang, B., Yang, C., Xie, Q. & Su, C. Boosting-based EMG patterns classification scheme for robustness enhancement. IEEE J. Biomed. Health 17(3), 545–552 (2013).
Totah, D. et al. Low-back electromyography (EMG) data-driven load classification for dynamic lifting tasks. PLoS ONE 13(2), e192938 (2018).
Aziz, S., Khan, M. U., Aamir, F. & Javid, M. A. Electromyography (EMG) data-driven load classification using empirical mode decomposition and feature analysis. 272–2725 (IEEE, 2019).
Tang, Z. et al. An upper-limb power-assist exoskeleton using proportional myoelectric control. Sensors 14(4), 6677–6694 (2014).
Sedighi, P., Li, X. & Tavakoli, M. Emg-based intention detection using deep learning for shared control in upper-limb assistive exoskeletons. IEEE Robot. Autom. Lett. https://doi.org/10.1109/LRA.2023.3330678 (2023).
Shen, S., Gu, K., Chen, X., Yang, M. & Wang, R. Movements classification of multi-channel sEMG based on CNN and stacking ensemble learning. IEEE Access 7, 137489–137500 (2019).
Erözen, A. T. A new CNN approach for hand gesture classification using sEMG data. J. Innovat. Sci. Eng. (JISE) 4(1), 44–55 (2020).
Tuncer, S. A. & Alkan, A. Classification of EMG signals taken from arm with hybrid CNN-SVM architecture. Concurrency Comput. Practice Exp. 34(5), e6746 (2022).
Chaobankoh, N. et al. Classification of transhumeral movements using deformable CNN with magnitude-based STFT from tripartite EMG sensor data. IEEE Access. https://doi.org/10.1109/ACCESS.2024.3429268 (2024).
Foroutannia, A., Akbarzadeh-T, M. & Akbarzadeh, A. A deep learning strategy for EMG-based joint position prediction in hip exoskeleton assistive robots. Biomed. Signal Proces. 75, 103557 (2022).
Pesenti, M. et al. IMU-based human activity recognition and payload classification for low-back exoskeletons. Sci. Rep. 13(1), 1184 (2023).
Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H. & Xu, B. Attention-based bidirectional long short-term memory networks for relation classification. 207–212 (2016).
Hu, Y. et al. A novel attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition. PLoS ONE 13(10), e206049 (2018).
Wang, Y., Wu, Q., Dey, N., Fong, S. & Ashour, A. S. Deep back propagation–long short-term memory network based upper-limb sEMG signal classification for automated rehabilitation. Biocybern. Biomed. Eng. 40(3), 987–1001 (2020).
Boettcher, C. E., Ginn, K. A. & Cathers, I. Standard maximum isometric voluntary contraction tests for normalizing shoulder muscle EMG. J. Orthop. Res. 26(12), 1591–1597 (2008).
Hermens, H. J. et al. European recommendations for surface electromyography. Roessingh Res. Dev 8(2), 13–54 (1999).
Chen, Z., Qiao, X., Liang, S., Yan, T. & Chen, Z. sEMG-based gesture recognition via multi-feature fusion network. IEEE J. Biomed. Health https://doi.org/10.1109/JBHI.2024.3522306 (2024).
Williams, R. J. & Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1(2), 270–280 (1989).
Pascanu, R. On the difficulty of training recurrent neural networks. arXiv:1211.5063 (2013).
Vaswani, A. Attention is all you need. In Advances in Neural Information Processing Systems (2017).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008).
Rohr, M. et al. On the benefit of FMG and EMG sensor fusion for gesture recognition using cross-subject validation. IEEE Trans. Neural Syst. Rehabil. Eng. 33, 935–944 (2025).
Dunn, O. J. Multiple comparisons using rank sums. Technometrics 6(3), 241–252 (1964).
Karnam, N. K., Turlapaty, A. C., Dubey, S. R. & Gokaraju, B. EMAHA-DB1: A new upper limb sEMG dataset for classification of activities of daily living. IEEE T Instrum. Meas. 72, 1–11 (2023).
Vidovic, M. M., Paredes, L. P., Hwang, H., Amsu, S., Pahl, J., Hahne, J. M., Graimann, B., Farina, D. & Müller, K. Covariate shift adaptation in EMG pattern recognition for prosthetic device control. 4370–4373 (IEEE, 2014).


Acknowledgements

This work was supported by Zhejiang Key Laboratory of Intelligent Systems and Equipment for Digital Creativity.

Funding

This work was supported by Zhejiang Key Laboratory of Intelligent Systems and Equipment for Digital Creativity.

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Dongwei Zhao, Chenfeng Zhang & Shouqian Sun
Center for Rehabilitation Medicine, Department of Rehabilitation Medicine, Zhejiang Provincial People’s Hospital (Affiliated People’s Hospital, Hangzhou Medical College), Zhejiang Engineering Research Center for Digital-Intelligent Rehabilitation Equipment, Hangzhou, China
Xiangming Ye & Ruidong Cheng
Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, China
Song Wang
Kuntai Exobot&Wibot Lab, Taizhou Mintai Robotics Co., LTD, Taizhou, China
Xuequn Zhang


Contributions

D.Z. was responsible for designing the experiments and writing the manuscript. X.Y. and R.C. were responsible for the collection and pre-processing of electromyography signals. S.W. and X.Z. were responsible for the exoskeleton design. C.Z. was responsible for the implementation of the program. S.S. was responsible for reviewing the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Shouqian Sun.

Ethics declarations

Competing interest

The authors declare no competing interests.

Ethics approval and consent to participate

All participants provided written informed consent before participation. The study was approved by the Ethics Committee of Zhejiang Provincial People’s Hospital (KY2024220).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhao, D., Ye, X., Wang, S. et al. Upper limb human-exoskeleton system motion state classification based on semg: application of CNN-BiLSTM-attention model. Sci Rep 15, 18969 (2025). https://doi.org/10.1038/s41598-025-02864-5


Received: 10 February 2025
Accepted: 16 May 2025
Published: 30 May 2025
DOI: https://doi.org/10.1038/s41598-025-02864-5
