Introduction

Mental health is a fundamental aspect of overall well-being and a key determinant of an individual’s quality of life1. According to the World Health Organization (WHO), mental health is a state of well-being in which individuals can realize their physical and mental potential, cope with everyday stressors, and contribute meaningfully to their communities2. Mental health factors include emotional stability, financial security, social relationships, and physical health3. Disruptions in any of these areas can significantly impair mental well-being. Among the most serious threats to mental health are mental disorders, such as stress, sadness, depression, anxiety, and fear, which can arise at any stage of life and pose long-term risks to an individual’s psychological and physical health4. Accurately identifying mental states is critical for improving mental health outcomes5. Mental states, often triggered by life circumstances, can persist for extended periods and recur multiple times throughout a person’s life. Delayed or incorrect identification of these states can diminish quality of life, reduce personal effectiveness, and hinder recovery. Early and accurate assessment of anxiety and depression is particularly crucial, as it facilitates timely intervention and treatment6. However, current diagnostic methods often rely on non-standardized questionnaires, which can lead to misinterpretations of anxiety and depression levels and result in fragmented or ineffective treatment7.

Anxiety and depression are among the most prevalent mental disorders, with causes typically categorized into personal and social factors8. Individual factors include age, gender, chronic illness, genetic predisposition, hormonal imbalances, marital status, and history of suicidal behavior9. Social factors encompass education level, employment status, income, family dynamics, stressful life events, and social relationships10. Challenges in adapting to or managing these factors can contribute to the development of anxiety and depression, ranging from mild to severe levels. Both anxiety and depression manifest through physical and behavioral symptoms11. Depression is often characterized by prolonged sadness, loss of energy, lack of motivation, reduced self-confidence, changes in appetite and weight, and suicidal thoughts12. Anxiety, on the other hand, may present as persistent nervousness, excessive worry, restlessness, irrational fears, and irritability13. However, the variability of these symptoms across individuals complicates their diagnosis using traditional methods, highlighting the need for more objective and reliable assessment tools14.

How a person walks is a complex behavior that integrates skeletal, neuromuscular, and cognitive systems. While walking involves voluntary actions such as stepping and running, it also relies on involuntary mechanisms like energy regulation, balance maintenance, and coordination of upper body movements15. Each individual exhibits a unique walking pattern, shaped by physiological characteristics and learned behaviors from childhood. Strong emotions like fear, anger, and nervousness can temporarily alter gait patterns16. Researchers have explored the relationship between mental states, such as anxiety and depression, and changes in gait parameters17,18,19. Studies have identified variations in step length, walking speed, arm swing, and vertical head movements in individuals experiencing anxiety and depression, suggesting a link between gait characteristics and mental health20.

Previous research has demonstrated the potential of gait analysis in identifying mental disorders. For instance, Valentin Riemer et al. used the Kinect sensor to track movement parameters during physical activities and found correlations between head, trunk, and joint movements and anxiety/depression21. Similarly, Ivan Birch et al. highlighted the connection between gait, the neuromuscular system, and cognitive processes, suggesting that emotions can influence movement patterns22. Frederica Pieruccini-Faria et al. investigated the impact of depression on balance and stability in elderly individuals, revealing significant effects on motor control23. Johannes Michalak et al. explored how happy and sad states influence walking patterns24. Bingil Sun et al. linked self-confidence to specific gait parameters, such as stride length and walking speed15. Laura Avanzino et al. examined the interaction between emotions and gait in Parkinson’s patients, emphasizing the potential of gait data for emotion detection and rehabilitation25. Gentiana Venture et al. utilized the Kinect sensor to identify happiness and sadness through motion parameters like step time and walking speed, further supporting the connection between gait and mental states20. According to Mennella et al. (2024), anxiety induced by threat anticipation can invigorate locomotion in healthy individuals, leading to increased gait speed and enhanced limb movements26. Kim and O’Sullivan (2024) found that although a four-week smartphone-based physical activity intervention did not significantly outperform a control group in reducing depression scores, it did influence movement characteristics, including increased body sway and vertical head movement during exercise tasks27.

Despite these advancements, most studies rely on questionnaires as a primary tool for assessing mental states, which can be subjective and prone to bias. The primary goal of this research is to develop a novel approach for diagnosing and classifying anxiety and depression without the need for traditional questionnaires. This study aims to establish a reliable, non-invasive method for identifying mental health disorders based on movement patterns by focusing on recording, extracting, and analyzing gait data. The Microsoft Kinect sensor recorded gait data from 50 participants (26 males and 24 females), and key kinematic parameters such as step length, step width, and cadence were extracted. Participants completed the GAD-7 and BDI-II scales to assess baseline anxiety and depression levels, while data augmentation addressed dataset imbalances. Machine learning techniques were applied to classify and estimate anxiety and depression levels.

Materials and methods

Participants

In this study, the GAD-7 and the Beck Depression Inventory-II (BDI-II) questionnaires were used to measure the participants’ levels of anxiety and depression28. The BDI-II is a self-report tool for assessing depression levels without defining a cut-off point. It targets individuals aged between 13 and 80 years and includes 21 items, each offering four response options from which participants must choose the one that best describes their current state. The anxiety questionnaire (GAD-7) consists of 7 items, each scored from 0 to 3 (four response options), for a maximum possible score of 21. A score greater than 10 indicates anxiety. Fifty healthy individuals (26 men and 24 women) participated in the walking data collection. Their ages ranged from 19 to 27 years, with an average age of 23. Most participants were university students, while a few were graduates. Participants were required to be young and healthy, specifically aged between 18 and 30 years. They could not be taking medication for depression or anxiety, sedatives, sleeping aids, or medication for epilepsy at the time of the study. Participants were also required to disclose any physical injuries, such as fractures of the upper or lower limbs. Individuals who had experienced a severe mental illness within one month prior to data collection were excluded. Furthermore, female participants who were menstruating at the time of data collection were excluded.

Participants completed questionnaires to measure their depression and anxiety levels. Depression was measured with the BDI-II, whose total score ranges from 0 to 63. The depression estimator used the walking data together with the score calculated from the questionnaire to obtain an estimated depression score. Anxiety was measured with the GAD-7, whose total score ranges from 0 to 21. The anxiety estimator likewise determined an estimated anxiety score from the observations in the walking database and the questionnaire score. People with depression are classified into different categories. In this study, a score of 0 to 13 is defined as the lowest (minimal) level of depression, 14 to 19 as mild depression, 20 to 28 as moderate depression, and 29 to 63 as severe depression29. This classification has been used in various clinical applications, such as investigating depression in cardiovascular diseases, motor disabilities, Parkinson’s disease, kidney failure, and other conditions30. The GAD-7 has a cut-off criterion: a score higher than 10 defines an anxious person. Following this criterion, anxiety states were divided into two categories, “anxious” and “non-anxious”.
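The labelling rules above can be written down compactly. The following is a minimal sketch, not the authors’ implementation; the function names `bdi_category` and `gad7_is_anxious` and the label strings are hypothetical, while the score ranges and the cut-off of 10 are taken from the text.

```python
def bdi_category(score: int) -> str:
    """Map a BDI-II total score (0-63) to the four categories used in this study."""
    if not 0 <= score <= 63:
        raise ValueError("BDI-II total score must lie between 0 and 63")
    if score <= 13:
        return "minimal"
    if score <= 19:
        return "mild"
    if score <= 28:
        return "moderate"
    return "severe"


def gad7_is_anxious(score: int, cutoff: int = 10) -> bool:
    """Return True when a GAD-7 total score (0-21) is above the cut-off."""
    if not 0 <= score <= 21:
        raise ValueError("GAD-7 total score must lie between 0 and 21")
    return score > cutoff
```

For example, a BDI-II score of 14 falls in the mild category, and a GAD-7 score of exactly 10 is still labelled non-anxious because the criterion is strictly greater than 10.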

The Human Research Ethics Committee at Hakim Sabzevari University approved the study under protocol number IR.HSU.REX.1404.006. All participants provided written informed consent before participation. They were informed about the purpose of the study, their right to withdraw at any time, and the measures taken to ensure the confidentiality of their information. Data were anonymized at the collection point, and no personally identifiable information was stored. All data were securely stored in encrypted formats and accessed only by authorized research team members.

Recording protocol

This study used Microsoft’s Kinect version 2 sensor (Kinect Microsoft XBOX One, V2) to record walking data. The device’s hardware includes a color lens, an infrared emitter, a depth sensor, and a microphone. The Kinect sensor was mounted on a tripod in the recording environment to prevent the effects of mechanical noise, such as vibration or sudden shocks. MATLAB software and the Kinect version 2 image acquisition toolbox were used to record walking data. Depth images were recorded with a maximum resolution of 512 × 424 at a rate of 30 frames per second. Data recording was performed using a Toshiba laptop (Toshiba-Satellite-L50) with an Intel processor (Core i5-4200u), 8 GB of RAM (DDR3), integrated graphics (Intel HD Graphics Family), and 2 GB of discrete graphics (AMD Radeon R260). The gait data were recorded in an area of 30 square meters. The Kinect sensor was fixed at a height of one meter from the ground with a tripod angle of 180 degrees. There were no electronic devices, such as mobile phones or smartwatches, within a distance of 1 m from the sensor and within a radius of 5 m. The lighting of the recording environment was constant, and direct light was prevented from entering it. All subjects were recorded in the cold months of the year, with the heating system always active to regulate the room temperature.

The walking path was straight, without deviations or obstacles, and 4.5 m long. Subjects moved along the front-back axis, from the starting point at the beginning of the path toward the Kinect sensor at its end. Each subject walked the path 6 times, one way, from the starting point to the end of the path, giving a total of \(6 \times 50 = 300\) recorded trials. Each trial began when the researcher gave the command “move”, and the subject then walked to the end of the path. On the next announcement, the subject returned to the starting point and performed the subsequent trial under the same conditions. Figure 1 shows the environment for recording walking data. The Kinect sensor records depth and color images, and the items Color, Metadata Depth, Ts Color, and Ts Depth were recorded and stored. In this study, position signals in three directions (X, Y, and Z) were generated from the recorded time and depth images for 25 joints, covering both upper and lower body joints. These raw joint position signals form the foundation of anxiety and depression recognition.

Fig. 1

A realistic diagram of the data recording environment. The starting point, the length of the walking path, the height of the Kinect above the ground, and the angle of the Kinect relative to the horizontal axis are specified.

Preprocessing data

This section addresses the preprocessing of gait data, which is essential for extracting movement parameters to estimate and classify anxiety and depression. Time series data, such as gait recordings, require thorough preprocessing and correction to ensure reliability. The Kinect version 2 sensor used in this study can track human movement within a range of up to 5 m, although the effective tracking distance is typically between 0.5 and 4.5 m, depending on clothing and environmental conditions31. Figure 2 illustrates a real-time display from the Kinect sensor, showing examples of subject detection. As seen in parts a, k, and l of Fig. 2, distortions and outliers are present, particularly in the lower body joints, due to several limitations. These distortions occur primarily during the initial 0.5 m of walking, when subjects may accelerate, causing the joint data of the two legs to overlap and reducing tracking accuracy32. Similar distortions appear approximately 1 m from the sensor, near the end of the walking path.

Fig. 2

Real-time display of the Kinect sensor during walking samples for subject recognition. Parts (a),(k), and (l) show instances of joint distortion: part (a) was removed due to the loss of position in the ankle, knee, and thigh joints of the right leg, while parts (k) and (l) were excluded due to loss of position in the upper and lower trunk joints during preprocessing. The remaining parts—(b), (c), (d), (f), (g), (h), (i), and (j)—represent the frames selected for feature extraction.

These issues arise from a combination of factors. The Kinect sensor’s performance is influenced by environmental elements like lighting, background clutter, and subject distance, and in some recording sessions non-optimal conditions affected the accuracy of skeleton detection. Loose clothing, such as long coats or baggy pants, can interfere with accurate joint detection, especially at distances beyond 2 m. Serious tracking problems are observed at 0.5 m and beyond 4.5 m from the sensor. Irregular, rapid, or occluded movements, such as limb crossing or sudden changes in direction, further challenge the Kinect’s tracking capabilities. The resolution and frame rate of the Kinect version 2 sensor can also limit the precision of skeletal mapping, particularly for detailed movements. To improve data quality, we removed the first 0.5 m and the final 1 m of each walking sequence, leaving a usable 3-meter segment free from significant distortions and tracking inconsistencies. This preprocessing step mitigates the influence of irregular jumps in the motion signals and makes the movement analysis more reliable.
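The trimming step can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: that each frame carries a distance to the sensor (for example, the Z coordinate of a trunk joint, in meters), and that the subject starts 4.5 m from the sensor and walks toward it. The function name `trim_walk` and its parameters are hypothetical.

```python
def trim_walk(frames, path_length=4.5, skip_start=0.5, skip_end=1.0):
    """Keep only the frames recorded in the central part of the path.

    frames: list of (frame_index, distance_to_sensor_m) tuples.
    Assumes the subject starts `path_length` meters from the sensor and
    walks toward it, so the first `skip_start` meters correspond to the
    largest distances and the final `skip_end` meters to the smallest.
    """
    far_limit = path_length - skip_start   # 4.0 m with the default values
    near_limit = skip_end                  # 1.0 m with the default values
    return [f for f in frames if near_limit <= f[1] <= far_limit]
```

With the default values the retained segment spans 4.0 m down to 1.0 m from the sensor, which corresponds to the 3-meter usable segment described above.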

Sudden jumps at the beginning and end of the walking path cause abrupt changes in the output signal (joint position). The position signal is therefore smoothed so that the extracted movement parameters are reliable. A moving-average low-pass filter was used to smooth the base signals; this filter is widely used because of its simple concept and straightforward implementation. It is a finite impulse response (FIR) filter: at each time step, M consecutive samples of the input signal are averaged, and the average is the output sample. Before the smoothing filter was applied, the frames at the beginning and end of the base signals, which contained the initial acceleration and stopping phases, were removed, leaving the signal that most closely represents the subjects’ actual walking. Finally, the smoothing filter was applied with specific coefficients, producing an output of the same length as the original signal. Figure 3 shows the filtered signal and the original signal of the position of the right ankle joint along the X-axis of walking.
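A moving-average filter of this kind can be sketched in a few lines. The version below is illustrative only: the window length `m = 5` and the handling of the signal edges (the window simply shrinks near the ends so that the output keeps the input length) are assumptions, since the filter coefficients used in the study are not given here.

```python
def moving_average(signal, m=5):
    """Smooth a 1-D signal with a centered moving average of odd length m.

    The output has the same length as the input. Near the edges the
    window is truncated to the samples that exist (an assumption made
    for this sketch).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    half = m // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        window = signal[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A larger M removes more of the frame-to-frame jitter but also flattens genuine rapid changes in joint position, so the window length is a trade-off that depends on the 30 frames-per-second sampling rate and the walking speed.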

Fig. 3

Application of a smoothing filter to the right ankle position signal (X-axis) of Subject 1 during Experiment 2. The horizontal axis represents the frame number, while the vertical axis indicates the spatial displacement (in centimeters) of the subject’s right ankle joint.

Data augmentation

After the walking data had been recorded and the subjects had completed the questionnaires, an uneven distribution of individuals across the depression and anxiety categories became apparent. Because the number of subjects differs between depression states, the resulting classes are unequal in size, which can degrade the classification and estimation of depression and anxiety. A time series data augmentation method was therefore used in the present study. Because the parameters of time series data vary over time, both time-domain and frequency-domain methods have been recommended for augmenting such data33.

In this study, because the task is not complex, the original signal (position) is one-dimensional, and its distinguishing properties lie in the time domain, basic time-domain methods were used to equalize the categories and enlarge the time series dataset. Adding white Gaussian noise is a standard, basic approach to augmenting time series data34. The essential characteristics of the Gaussian noise are its zero mean and the signal-to-noise ratio (SNR). Once the noise parameters are set, the noise is added to the original signal to create a new synthetic signal. Additive Gaussian noise with a low SNR or an asymmetric distribution produces abnormal parameters when walking features are extracted. The Gaussian noise used in this study is therefore generated so that it follows the original signal, which keeps the simulated signal structurally similar to the original position signal and prevents sudden, severe changes in the motion parameters. Figure 4 shows the original head position signal along the Y-axis together with the simulated signal obtained by adding white Gaussian noise at an SNR of 50 dB. This SNR is relatively large, so the noise is barely visible against the original signal. The added noise nevertheless changes the shape of the signal slightly, so the result serves as a new simulated signal whose parameters have undergone minor changes.
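As an illustration of the augmentation idea, the sketch below adds zero-mean white Gaussian noise to a position signal at a target SNR. It is a generic formulation, not the authors’ code: the noise power is derived from the mean squared value of the signal, which is one common convention and is assumed here, and the function name `add_awgn` and the `seed` parameter are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db=50.0, seed=None):
    """Return a copy of `signal` with zero-mean white Gaussian noise added.

    The noise variance is chosen so that
    10 * log10(signal_power / noise_power) equals `snr_db`, where
    signal_power is the mean squared value of the input (assumed convention).
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

At 50 dB the noise power is 100,000 times smaller than the signal power, which is why the synthetic signal stays very close to the original while still differing from it sample by sample.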

Fig. 4

Original signal and head position signal with Gaussian noise added with a signal-to-noise ratio of 50 dB.

Table 1 shows the data augmentation parameters and the number of subjects. For depression, the 50 subjects contributed 300 experiments in total across the low (minimal), mild, moderate, and severe categories. After Gaussian noise was applied to create synthetic signals and the data augmentation conditions were met, each depression group contained 180 experiments, for a total of 720 experiments. For anxiety, five subjects were assigned to the anxious group and 45 subjects to the non-anxious group, with each individual repeating the experiment 6 times. After the additive Gaussian noise was applied, the anxious group was expanded to 45 subjects, and the non-anxious group remained at 45. As a result, each group contained 270 experiments, for a total of 540 experiments.

Table 1 Number of data in each stage of the data augmentation method in classifying depression and anxiety states.

Feature extraction

Feature extraction is a fundamental step in gait analysis and can be implemented using various methods. This study extracts key motor parameters, such as walking speed, acceleration, time-based features, and joint displacement, to serve as inputs for machine learning models that recognize anxiety and depression. Within the gait cycle, a footprint is defined when the reference leg completes its swing phase and makes initial ground contact. This event marks the transition to the stance phase and persists until the swing of the opposite leg concludes. Accurate identification of this footprint is critical, as it enables the calculation of spatial and temporal features, including step length, stride length, and walking speed. To simplify footprint detection, the moment of minimal distance between the two legs, when the swinging leg is closest to the stationary leg, is used as a proxy, as it occurs consistently once per gait cycle. Figure 5 illustrates the location of this footprint event along the subject’s walking path. Eleven features were extracted for each subject in each experiment: step length, stride (long step) length, walking speed, maximum and minimum walking speed, step width, step time, stride (long step) time, cadence (tempo), maximum arm swing, and vertical head movement; the calculation procedure for each is described in the supplementary file. These features were chosen for their relevance to motor behavior and potential association with psychological conditions. While feature selection is commonly used to reduce dimensionality and improve model performance, it was not applied in this study. The extracted features exhibited consistent ranges across experiments with no evidence of redundancy or irrelevant data. For instance, stride length remained within a typical range (0.15–0.45 m) for all subjects, and no anomalous values were observed. Therefore, all features were retained to preserve the integrity of the motion data and support robust classification of anxiety and depression.
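The minimal-distance proxy for footprint detection can be sketched as follows. This is a simplified illustration under stated assumptions: the inputs are already-smoothed ankle positions along the walking (Z) axis, a footprint is taken to be a local minimum of the absolute ankle separation, and the frame rate is 30 frames per second. The function names are hypothetical, and the exact calculations used in the study are those in its supplementary file.

```python
def footprint_frames(right_ankle_z, left_ankle_z):
    """Return frame indices where the ankle separation has a local minimum."""
    gap = [abs(r - l) for r, l in zip(right_ankle_z, left_ankle_z)]
    frames = []
    for i in range(1, len(gap) - 1):
        # Strictly smaller than the previous frame and not larger than the next.
        if gap[i] < gap[i - 1] and gap[i] <= gap[i + 1]:
            frames.append(i)
    return frames


def step_times(frames, fps=30.0):
    """Time in seconds between consecutive detected footprint events."""
    return [(b - a) / fps for a, b in zip(frames, frames[1:])]
```

On noisy signals this simple local-minimum test can fire several times around one event, so in practice it would be applied after smoothing and possibly combined with a minimum spacing between detections.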

Fig. 5

Footprint recognition for subject number 33 of experiment 3. The blue signal indicates changes in the position of the right ankle, and the orange signal indicates changes in the position of the left ankle along the Z axis. The footprint is marked with a “+” sign.

Classification methods

In this study, various machine learning algorithms were employed to classify mental health states, specifically focusing on the detection of depression and anxiety. The classification techniques investigated include LDA, Naive Bayes, Multi-class SVM with a polynomial kernel, and a DNN. These methods were selected due to their diverse theoretical foundations and proven effectiveness in multi-class classification tasks. In practical applications, the choice of algorithm, kernel function (for kernel-based methods), and hyperparameter tuning significantly influence the classification performance. Hyperparameter selection, particularly for kernel-based classifiers like SVM, presents a notable challenge, directly affecting the model’s generalization ability. This work used a heuristic optimization strategy to identify optimal values for key parameters such as kernel scale and box constraint, aiming to achieve the best model performance. The optimized hyperparameters for LDA, Naive Bayes, and the SVM classifiers are summarized in Table 2. In contrast, the architecture details and hyperparameters of the DNN model, including layer configurations and training parameters, are presented in Table 3. The objective of all models is to learn effective decision rules from the feature space that can accurately predict the target class labels associated with anxiety and depression levels. A K-fold cross-validation strategy was employed to ensure reliable model evaluation and mitigate the risks of overfitting. Cross-validation enhances model robustness and provides a more generalizable estimate of its real-world performance. This procedure partitioned the dataset into K = 10 mutually exclusive folds. K − 1 folds were used for training in each iteration, while the remaining fold served as the validation set. This process was repeated 10 times, with each fold used exactly once for validation. 
The average classification accuracy across all folds was reported as the model’s final performance metric. This approach improves model reliability and makes more efficient use of the available data, especially given the relatively small sample size.
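The partitioning described above can be sketched as follows. This is a generic K-fold split written for illustration, not the authors’ code; the function name `kfold_indices`, the shuffling, and the `seed` parameter are assumptions.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Samples are shuffled once and dealt into k mutually exclusive folds;
    each fold serves as the validation set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation
```

With 720 depression experiments and K = 10, each iteration would train on 648 samples and validate on the remaining 72; the reported accuracy is the mean over the ten validation folds.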

Table 2 Degradation parameters estimated by machine learning methods: LDA, Naive bayes, Multi-class SVM.
Table 3 The DNN estimates degradation parameters, and the layers used in the network.

Experimental setup and performance evaluation

To assess classification performance, several key statistical metrics were computed for each class based on the confusion matrix, including:

True positive rate (TPR) or recall:

$$TPR=\frac{TP}{TP+FN}$$
(1)

False negative rate (FNR):

$$FNR=\frac{FN}{TP+FN}$$
(2)

Positive predictive value (PPV) or precision:

$$PPV=\frac{TP}{TP+FP}$$
(3)

False discovery rate (FDR):

$$FDR=\frac{FP}{TP+FP}$$
(4)

F1 score:

$$F1=2\times\frac{PPV\times TPR}{PPV+TPR}$$
(5)

Additionally, the overall classification accuracy was calculated as the ratio of correctly predicted instances to the total number of samples in the dataset. Performance comparisons among the classifiers were conducted based on these evaluation metrics, highlighting each method’s relative strengths and weaknesses in multi-class classification under noisy conditions. ROC curves and Area Under the Curve (AUC) values were also plotted for further visual analysis of the classifiers’ discriminative capabilities. The computer hardware is a 20-core Core™ i7-12700 processor (2.3 GHz), 32 GB of RAM, and a GeForce RTX 3050 graphics card.
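Equations (1) to (5) and the overall accuracy can be computed directly from a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; the function name `per_class_metrics` and the convention of returning 0 when a denominator is zero are choices made for this illustration.

```python
def per_class_metrics(cm):
    """Compute TPR, FNR, PPV, FDR and F1 per class, plus overall accuracy.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total if total else 0.0
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fnr = fn / (tp + fn) if tp + fn else 0.0
        ppv = tp / (tp + fp) if tp + fp else 0.0
        fdr = fp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
        metrics.append({"TPR": tpr, "FNR": fnr, "PPV": ppv, "FDR": fdr, "F1": f1})
    return metrics, accuracy
```

In the multi-class case each class is treated in a one-vs-rest fashion, so TPR and FNR sum to 1 for every class, as do PPV and FDR.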

Results

This section examines the results and findings of the research on recognizing anxiety and depression using walking data. First, the results of estimating anxiety and depression are discussed, and then their classification results are evaluated qualitatively and quantitatively.

Depression recognition

Figure 6 compares the performance of four machine learning classifiers (LDA, Naive Bayes, Multi-class SVM, and a DNN) based on their normalized confusion matrices and Receiver Operating Characteristic (ROC) curves across four depression severity classes: Mild, Minimal, Moderate, and Severe. For LDA, the normalized confusion matrix (Fig. 6a) shows strong classification performance in the Mild (77.8%) and Severe (88.3%) classes, though misclassifications are more frequent in the Minimal and Moderate categories. The ROC curves in Fig. 6b demonstrate high discriminative capability, particularly for the Mild (AUC = 0.90) and Severe (AUC = 0.92) classes. The model is less effective at distinguishing the Minimal (AUC = 0.76) and Moderate (AUC = 0.80) classes. For the Naive Bayes classifier, shown in Fig. 6c and again in Fig. 6e (duplicate), the model exhibits reasonable accuracy in the Moderate (71.1%) and Minimal (62.8%) classes. However, the confusion matrix indicates overlap between Mild and Minimal predictions. For the SVM, the ROC analysis (Fig. 6d/f) confirms robust performance, with AUC scores ranging from 0.76 to 0.84 across all classes. The model best distinguishes the Minimal (AUC = 0.83) and Severe (AUC = 0.84) classes. The DNN (Fig. 6g) achieves the highest classification accuracy across most classes: Mild (80.0%), Moderate (77.8%), and Severe (87.8%). Misclassifications are less frequent than for the other models, indicating the model’s strength in learning non-linear relationships in the data. The ROC curves (Fig. 6h) show excellent discriminative power for all classes, with AUC values of 0.92 (Mild), 0.86 (Minimal), 0.88 (Moderate), and 0.92 (Severe), confirming the model’s superiority in both sensitivity and specificity.

Overall, the DNN outperforms LDA and Naive Bayes regarding classification accuracy and AUC values across all classes. While LDA performs strongly in distinguishing Severe and Mild cases, and Naive Bayes maintains a balanced performance, the DNN demonstrates the highest robustness and generalization capability for multi-class prediction of mental health states. The comparison of classification accuracy and AUC scores across models and classes is presented in Table 4.

Fig. 6

Normalized confusion matrices and ROC curves for four classification models: (a,b) LDA, (c,d) Naive Bayes, (e,f) Multi-Class SVM, and (g,h) DNN, across four severity levels of depression and anxiety (Mild, Minimal, Moderate, Severe). Confusion matrices illustrate the distribution of true vs. predicted classes, while the ROC curves display the true positive rate against the false positive rate for each class. The Area Under the Curve (AUC) values indicate the discriminative ability of each model, with the DNN achieving the highest AUC scores overall.

Table 4 Comparison of classification accuracy and AUC scores across models and Classes.

Table 5 shows that the Multi-Class SVM consistently outperformed other models, achieving the highest values across most evaluation metrics, including an overall accuracy of 86.53%. The DNN also demonstrated competitive performance, particularly in identifying moderate and severe cases, with F1 scores of 76.29% and 86.10%, respectively. In contrast, the LDA and Naive Bayes classifiers exhibited comparatively lower performance, particularly for the minimal class, which showed significant misclassification and lower F1 scores. These results highlight the importance of selecting suitable classifiers for nuanced, multi-class mental health assessment tasks, especially in cases with limited data. The evaluation metrics provide deeper insight beyond accuracy, particularly in detecting misclassifications in clinically significant classes such as moderate and severe states.

Table 5 Performance evaluation metrics for each classification method across four severity classes of depression and anxiety.

Anxiety recognition

Figure 7 presents the evaluation results of four classification models—Discriminant, Naive Bayes, Multi-Class SVM, and DNN—for detecting anxiety (binary classification: Anxious vs. No Anxiety). Each model’s performance is assessed through normalized confusion matrices and ROC curves. The LDA classifier (Fig. 7a,b) achieved recall rates of 62.6% for the Anxious class and 60.7% for the No Anxiety class, with an AUC of 0.65, indicating moderate discriminatory power. Naive Bayes (Fig. 7c,d) showed slightly lower performance with recall values of 55.9% and 61.9%, and an AUC of 0.61 for both classes. Multi-Class SVM (Fig. 7e,f) exhibited balanced performance, with recall rates of 61.9% for Anxious and 56.7% for No Anxiety, and an AUC of 0.63. The DNN (Fig. 7g,h) performed comparably with recall values of 58.9% and 57.4% and identical AUC scores of 0.63 for both classes. While none of the models achieved high discriminatory performance, the LDA and SVM models provided slightly better balance between sensitivity and specificity in detecting anxiety symptoms. The comparison of classification accuracy and AUC scores across models and classes is presented in Table 6.

Fig. 7

Normalized confusion matrices and ROC curves for anxiety classification using four machine learning models: (a,b) LDA, (c,d) Naive Bayes, (e,f) Multi-Class SVM, and (g,h) DNN. The confusion matrices show the percentage of correct and incorrect classifications for anxious and non-anxious cases. ROC curves demonstrate the models’ diagnostic ability, where the area under the curve (AUC) is used to evaluate performance.

Table 6 Performance metrics for binary classification of anxiety.

Table 7 presents a detailed comparative analysis of the performance of four classifiers (LDA, Naive Bayes, Multi-Class SVM, and DNN) for the binary classification of anxiety. Evaluation metrics include TPR (recall), FNR, PPV (precision), FDR, F1-score, and overall accuracy across both classes (Anxious and No Anxiety). Among the tested models, the LDA classifier achieved the highest overall accuracy (61.67%) and demonstrated a balanced performance across both classes, with recall values of 62.59% for Anxious and 60.74% for No Anxiety. Naive Bayes showed moderate performance, with a higher recall for the No Anxiety class (61.85%) than for the Anxious class (55.92%), resulting in an overall accuracy of 58.89%. The Multi-Class SVM yielded comparable performance (59.26% overall accuracy), with a recall of 61.85% for Anxious and 56.66% for No Anxiety. The DNN, while offering more balanced metrics between the two classes (recall: 58.88% and 57.40%), had the lowest overall accuracy (58.15%) among the models. These findings suggest that although none of the models reached high classification accuracy, the LDA and SVM classifiers demonstrated relatively more stable performance for anxiety detection across evaluation criteria.

Table 7 Performance metrics of four classification models (LDA, Naive Bayes, Multi-Class SVM, and DNN) in the binary classification of anxiety, evaluated based on recall (TPR), false negative rate (FNR), precision (PPV), false discovery rate (FDR), F1-score, and overall accuracy for both anxious and no anxiety classes.
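The metrics listed for Table 7 all follow from the four cells of a binary confusion matrix. The sketch below shows one way to compute them. The counts assume 270 trials per class, a figure chosen only because it reproduces the reported LDA recalls (169/270 and 164/270). The source does not state the actual counts, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Metrics for the positive class plus overall accuracy."""
    tpr = tp / (tp + fn)  # recall / sensitivity
    ppv = tp / (tp + fp)  # precision
    return {
        "TPR": tpr,
        "FNR": 1.0 - tpr,
        "PPV": ppv,
        "FDR": 1.0 - ppv,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Assumed counts: 270 trials per class, picked only because they match the
# reported LDA recalls. They are an assumption, not confirmed by the source.
anxious = binary_metrics(tp=169, fn=101, fp=106, tn=164)
# Swapping the roles of the two classes gives the No Anxiety row.
no_anxiety = binary_metrics(tp=164, fn=106, fp=101, tn=169)

for name, row in (("Anxious", anxious), ("No Anxiety", no_anxiety)):
    print(name, {k: f"{100 * v:.2f}%" for k, v in row.items()})
```

With equal class sizes, overall accuracy is the mean of the two recalls, which is consistent with the reported LDA figures: (62.59% + 60.74%) / 2 ≈ 61.67%.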

Comparing methods

The performance of the anxiety classification model was comparatively lower than that of the depression model, as reflected in its reduced accuracy and efficiency. This discrepancy can be attributed to a smaller number of anxiety-labeled trials and an imbalanced distribution of subjects across anxiety categories. In contrast, the depression dataset included more observations with more evenly distributed class labels, facilitating more reliable model training. Additionally, gait features appeared to be less sensitive to anxiety-related behavioral changes compared to depressive symptoms, resulting in reduced discriminative power. Despite implementing data augmentation techniques to mitigate the effects of data imbalance and limited sample size, the anxiety classification model still showed approximately 20% lower performance than the depression model.

To contextualize the proposed model’s performance, a comparative overview of related research using Kinect-based gait analysis for mood disorder recognition is presented in Table 8. Previous studies have employed similar methods for detecting various affective states, such as depression, anxiety, and emotional polarity. The results suggest that our proposed model demonstrates competitive accuracy, particularly given the use of a single, non-invasive sensing device (Microsoft Kinect) and a relatively small dataset. Overall, this study introduced a novel framework for predicting and classifying depression and anxiety levels using gait features, without the need for self-reported questionnaires. The findings support the feasibility of recognizing mental health conditions through movement analysis in young individuals, offering potential for real-time, automated screening applications.

Table 8 Summary of recent studies using Kinect-based gait analysis and machine learning for mood disorder recognition, comparing classification accuracies across different mental health conditions.

Discussion

This study explored the association between gait characteristics and psychological conditions, specifically anxiety and depression, in young adults, aiming to develop non-invasive models capable of detecting and classifying these disorders through machine learning techniques. Using Microsoft Kinect for motion capture, a range of spatiotemporal gait features was extracted and analyzed with several classifiers, including DNN, SVM, Naive Bayes, and LDA. The findings affirm the viability of using gait as a behavioral biomarker for mental health assessment.

Consistent with previous research linking affective disorders to motor alterations, this study demonstrated that several gait parameters—including step speed, step length, and gait cycle duration—are modulated by the presence and severity of anxiety and depression. Notably, step speed declined more sharply in anxiety than in depression, while step width varied in opposite directions for the two conditions, potentially offering discriminative value. The results indicated stronger classification performance for depression, with DNNs achieving an accuracy of up to 88.3% for severe cases. In contrast, anxiety classification proved more challenging, likely due to subtler behavioral manifestations and class imbalance within the dataset.

One of the key contributions of this work lies in proposing a passive and objective framework for mental health evaluation that reduces reliance on subjective self-report questionnaires. Furthermore, the robustness and generalizability of the models were improved by applying data augmentation and hyperparameter tuning. Although the study was conducted in a controlled environment, its implications extend to potential real-world applications, such as telehealth-based monitoring and clinical screening tools. The scalable approach may be adapted for continuous, remote assessment through wearable sensors or smartphone-based platforms.

Conclusion

The present study provides evidence that gait analysis, facilitated by affordable and accessible technologies like Microsoft Kinect, can serve as a promising tool for the non-invasive screening and monitoring of anxiety and depression. Through machine learning algorithms, it is possible to objectively detect changes in motor behavior associated with these disorders, offering an alternative or complementary method to traditional clinical assessments. Importantly, the study lays the groundwork for developing real-time, home-based software systems to support early diagnosis and long-term mental health management.

Limitations

Despite the promising results, several limitations must be acknowledged. First, the sample size was relatively small, especially in the severe anxiety and depression subgroups, limiting the statistical power and generalizability of the findings. Second, the exclusion of participants with musculoskeletal disorders may have inadvertently filtered out individuals with severe psychological conditions, introducing selection bias. Third, the data were collected at a single time point in a controlled laboratory environment, which may not fully reflect natural walking behaviors influenced by emotional or contextual factors. Moreover, the imbalance in class distribution, particularly in anxiety-related categories, restricted the model’s ability to generalize across all severity levels.

Future work

Future studies should address these limitations by recruiting more clinically diverse samples, including participants with comorbid psychological and physical conditions. Longitudinal research designs are needed to evaluate how gait patterns evolve in response to changes in mental health. Additionally, expanding data collection to naturalistic settings—such as outdoor walkways, public spaces, or clinical environments—will enhance the ecological validity of the findings. Integrating gait analysis with other physiological and behavioral signals (e.g., heart rate variability, speech features, or electrodermal activity) may further improve the diagnostic accuracy and specificity of the models.

Moreover, future work should explore integrating advanced deep learning methods, such as recurrent neural networks (e.g., LSTM) or reinforcement learning frameworks, to better capture temporal dependencies and personalized behavioral patterns. Ethical considerations should also be addressed to ensure responsible deployment in real-world clinical or telehealth applications, including privacy, consent, and data security.