Abstract
This research presents an innovative model for automatically detecting yoga poses, specifically focusing on the yogic kriya known as “Shankha Prakshalana.” The proposed system utilizes advanced computer vision techniques to extract pose features from yoga videos for classification purposes. A carefully annotated dataset comprising videos of individuals practicing Shankha Prakshalana was used to train and evaluate various machine learning (ML) architectures, incorporating both supervised and unsupervised learning algorithms. Among the evaluated models, the Random Forest classifier demonstrated superior performance, achieving a recognition rate of 99.6%. This research contributes to the integration of computer vision and yoga practice, offering potential applications that bridge traditional yogic techniques with modern technology. The developed system could support the monitoring and improvement of yogic practice in various environments, marking a notable advancement in the field of automated pose detection systems.
Introduction
The COVID-19 pandemic has resulted in the deaths of millions of people globally. Both heart disease and stroke are serious public health issues; the latter impairs mobility and is a leading contributor to disability. Mental health issues are becoming more prevalent, and millions of individuals globally suffer from depression. In today’s environment, obesity, sedentary lifestyles, and poor nutrition are the primary causes of many health problems1.
Furthermore, there is no doubt about the impact that computers and computer-powered technologies have on healthcare and related fields, as on every other field. Alongside routine medical care, activities such as yoga, Zumba, and martial arts are widely recognised as means of improving one’s health. With origins in ancient India, yoga is a broad category of practices aimed at improving a person’s physical, mental, and spiritual well-being2.
Integrating technology into yoga practice can benefit from artificial intelligence tools such as PoseNet and MobileNet SSD, as well as human posture detection. In the field of Human Computer Interaction (HCI), identifying the human body poses a significant challenge3,4. It is frequently required for a variety of purposes, including daily tasks, yoga, sports, and more. A key topic in computer vision is human posture estimation, which has applications in behaviour analysis, intelligent driver assistance systems, assisted living5 and visual surveillance. Since the emergence of deep neural networks, pose estimation performance has improved significantly. Computer vision technology is employed to standardise and correct yoga postures. It is important to perform yoga poses correctly to avoid injuries and long-term complications6. Studying human posture can help detect and address abnormal positions, enhancing overall well-being at home7.
Despite the relatively small number of qualified professional yoga instructors, yoga is a popular physical exercise with a large global following. For most beginners, self-study, such as imitating yoga movements from instructional videos, is the only way to learn the practice. With this approach, the learner finds it difficult to precisely observe the fine details of their full-body posture, because many poses require the student to direct their gaze in a particular direction. This restriction makes the learning process less effective, so identifying and assessing yoga poses is essential for guiding independent study.
The main goal of this research is to develop and implement a Shankha Prakshalana yoga pose detection system that can efficiently identify and track yoga postures. By thoroughly analysing existing literature and approaches across a variety of machine learning (ML) algorithms, we aim to offer a novel framework that incorporates posture features to achieve accurate and robust yoga pose detection. The significance of this work lies in its ability to link technology with yoga practice.
Related works
In the field of pose detection8,9 and computer vision, recent research efforts have contributed significantly to the advancement of methodologies and models10,11. Table 1 provides a summary of reviewed works relevant to the current study. Poselet-conditioned pictorial structures have demonstrated enhanced precision in human pose estimation12, while the integration of YOLO V4 has proven effective for object detection, particularly in discerning yoga postures with complex spatial relationships13. Multi-person yoga pose estimation has seen notable progress14,15,16,17 with the utilisation of part affinity fields18. The work in19 introduces a novel approach to pose estimation from image sensor data, showcasing advancements in real-time human pose recognition. Furthermore, robust 3D pose estimation techniques have been presented20, highlighting the potential for accurate spatial understanding. Pose recognition from depth images has been addressed in21, showcasing the feasibility of real-time recognition in dynamic yoga environments. The exploration of articulated pose estimation models and convolutional pose networks22,23 has contributed to the evolution of 3D human pose estimation methods. Temporal convolutional networks24 have been pivotal in advancing real-time yoga recognition by capturing spatial-temporal features. The integration of improved YOLO-V3 models has demonstrated success in object detection applications within agriculture and surveillance contexts25.
Research gap & motivation
Therapeutic technologies improve mobility by reducing impairments at the body structure/function level, aiding the body in repairing or compensating for structural impairments, and supporting rehabilitation of impaired body functions. In contrast, assistive technologies are intended for use in the home and community to facilitate the execution of functional tasks, and they are operated by the user rather than a clinician26. Accordingly, the emphasis of this study is the prediction of Shankha Prakshalana yoga poses to assist individuals in resolving bowel movement issues. Shankha Prakshalana is a yogic practice renowned for its numerous health advantages. However, achieving mastery and proficiency in executing the poses with precision necessitates appropriate guidance and adherence to proper form. An automated system for recognising yoga poses can serve as a valuable tool for delivering real-time feedback and ensuring the safety of practitioners.
Novelty and scope
The objective of this study is to create a comprehensive framework for pose estimation utilising artificial intelligence methods to assist individuals in executing the Shankha Prakshalana kriya with accurate posture. The main goal is to mitigate injuries and enhance the quality of human exercise using a computer and camera system. The proposed system employs an innovative methodology for posture detection and correction utilising convolutional neural network models. The system’s performance is assessed using both supervised and unsupervised learning techniques to determine the optimal model for the Shankha Prakshalana yogic kriya.
Significant contribution and outline
The noteworthy contributions of the research presented in this work are enumerated as follows.
- A comprehensive dataset on the Shankha Prakshalana kriya, specifically designed for yoga pose recognition, is proposed. The dataset comprises a total of 77 RGB videos, categorised into 11 distinct groups.
- A customised pipeline based on a supervised ML methodology is proposed, which incorporates a Random Forest classifier. This integration aims to enhance the precision and resilience of yoga pose detection.
- The proposed pipeline is compared with various supervised learning algorithms, including k-NN (k-Nearest Neighbours) and SVM (Support Vector Machines), and with different clustering techniques.
- The proposed dataset is applied to both supervised and unsupervised ML algorithms to determine the best-performing model for the Shankha Prakshalana kriya.
Section 2 provides a comprehensive analysis of the proposed Yoga Pose Recognition System. Section 3 presents the results obtained from the conducted experiments. The findings are subjected to analysis in Section 4. Section 5 provides a comprehensive summary of the conclusion.
Proposed yoga pose recognition system
The acquisition of the yoga pose detection dataset and the extraction of pose features are described in this section. The pipeline used to generate a recognition system that detects Shankha Prakshalana kriya yoga poses is depicted in Fig. 1.
Yoga pose dataset collection and preprocessing
The proposed dataset for yoga pose detection was meticulously collected to ensure diversity and representativeness. Seven 15-second RGB videos were recorded for each of the 11 yoga poses relevant to the “Shankha Prakshalana” kriya, resulting in a total of 77 videos (77 .mov files). To maintain consistency and standardisation, all videos were recorded with a smartphone. The recorded videos were then processed with FFmpeg to convert them into uncompressed MKV format (77 .mkv files). Each video has a resolution of 1080×720 pixels. Samples of the different classes are presented in Fig. 2.
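For reproducibility, the conversion step can be scripted. The following is a minimal sketch of batch-converting the recorded clips with FFmpeg from Python; the directory names, file names, and codec flags are illustrative assumptions rather than the exact command used in this study.

```python
# Sketch: batch-convert recorded .mov clips to uncompressed MKV with FFmpeg.
# Paths and codec settings are hypothetical; adjust to the actual recordings.
import subprocess
from pathlib import Path

RAW_DIR, OUT_DIR = Path("raw_videos"), Path("mkv_videos")
OUT_DIR.mkdir(exist_ok=True)

for clip in sorted(RAW_DIR.glob("*.mov")):
    out = OUT_DIR / (clip.stem + ".mkv")
    # -c:v rawvideo stores uncompressed frames; -an drops the audio track.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip), "-c:v", "rawvideo", "-an", str(out)],
        check=True,
    )
```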
Data augmentation and feature extraction
The videos first undergo an augmentation process using the vidaug library, a Python package designed to augment videos for deep learning systems. This process converts the recorded videos into a substantially larger collection of slightly modified videos. Preprocessing techniques such as rotation, variable tilting, and frame blurring are employed, two of which are especially relevant here: (a) Gaussian blur: a widely used technique to mitigate visual noise and a notable improvement that could benefit gesture recognition; it is an image-blurring filter that uses a Gaussian function to determine the change applied to each pixel in the image. (b) Local contrast normalisation: a standardisation technique that implements local competition between neighbouring features in a feature map and between features at the same spatial location across multiple feature maps.
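A minimal augmentation sketch using the vidaug API is shown below. The operator names follow the public vidaug package, but the chosen operators, parameter values, and file name are assumptions for illustration, not the exact configuration used in this study.

```python
# Sketch: video-level augmentation with vidaug on a list of RGB frames.
# Operator choices, parameters, and the file name are illustrative assumptions.
import cv2
import vidaug.augmentors as va

sometimes = lambda aug: va.Sometimes(0.5, aug)  # apply each operator to ~half the clips
augment = va.Sequential([
    sometimes(va.RandomRotate(degrees=10)),  # small in-plane rotation/tilt
    sometimes(va.GaussianBlur(sigma=1.0)),   # blur to mimic sensor noise
])

def load_frames(path):
    """Read a video into a list of RGB NumPy frames, the format vidaug expects."""
    cap, frames = cv2.VideoCapture(path), []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ok, frame = cap.read()
    cap.release()
    return frames

augmented_frames = augment(load_frames("pose_01.mkv"))  # hypothetical file name
```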
RGB video data is used as input for extracting skeletal features. MediaPipe Hands, developed by27, uses skeletal characteristics to estimate hand postures. MediaPipe is a freely available framework for building pipelines that process sequential data, specifically video and audio. High-fidelity hand tracking is accomplished by using ML algorithms to predict 21 3D keypoints of a hand from a single image. The keypoints are stored in NumPy format for convenient input into the model. The pseudocode for pose extraction and augmentation, shown in Algorithm 1, provides a concise explanation of this process.
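The sketch below illustrates this extraction step with the public MediaPipe Python API. The use of the Pose solution and the 99-value frame vector (33 landmarks × 3 coordinates) are assumptions inferred from the feature shapes reported later in the Results; the study’s exact MediaPipe configuration may differ.

```python
# Sketch: per-frame landmark extraction with MediaPipe, saved as a NumPy array.
# The Pose solution and 33 x 3 = 99 features per frame are assumptions inferred
# from the reported feature shapes; the file names are hypothetical.
import cv2
import numpy as np
import mediapipe as mp

def extract_keypoints(video_path, out_path):
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        ok, frame = cap.read()
        while ok:
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                coords = [[lm.x, lm.y, lm.z] for lm in result.pose_landmarks.landmark]
                frames.append(np.asarray(coords).flatten())  # 33 landmarks -> 99 values
            ok, frame = cap.read()
    cap.release()
    np.save(out_path, np.stack(frames))  # shape: (num_detected_frames, 99)

extract_keypoints("pose_01.mkv", "pose_01.npy")  # hypothetical file names
```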
Proposed methodology
In this section, we compare different ML techniques on the proposed dataset for both supervised and unsupervised learning and evaluate their performance with various evaluation metrics. The supervised ML techniques are k-NN (k-Nearest Neighbours), SVM (Support Vector Machines), and RF (Random Forest). Deep learning techniques, specifically neural-network-based algorithms, have gained significant popularity in recent years; however, they were observed to overfit when applied to the Shankha Prakshalana dataset. While accuracy is widely recognised as the primary performance metric in activity recognition, we also incorporated supplementary metrics such as precision, recall, and F1 score28.
K-nearest neighbour
The k-nearest neighbours (k-NN) algorithm classifies a given data sample x using a pre-existing training set X. It calculates the distance between each training point and the query point whose label is to be predicted, and it assigns the label with the highest weight among the k closest training samples. A distance-based weighting scheme can be applied to amplify the influence of nearer neighbours on the final label prediction29.
Support vector machines
This algorithm generates one or more decision boundaries in the n-dimensional input feature space such that the distance to the closest samples of each label is maximal. This requires the data to be linearly separable. If the dataset is not linearly separable, the training data can be projected into a higher-dimensional space with N dimensions (N > n) and an optimal hyperplane identified within that space. Nevertheless, this projection can incur significant computational costs. The SVM algorithm employs the kernel trick to mitigate this issue: instead of explicitly projecting the data points into the higher-dimensional space, a kernel function, which characterises the dot product of the data points in the N-dimensional space, is used to find an optimal decision boundary.
Random forest
The random forest (RF) algorithm is an ensemble learning method: multiple ML models, specifically decision trees, are used to predict the labels of new input data. The final RF prediction is the majority label among the predictions of these weak classifiers. In addition, random feature selection, or subsampling, is performed during training, so each decision tree is trained on a subset of the input features. This reduces the correlation between decision trees and enhances their ability to generalise. Moreover, the performance of each weak classifier can be improved by training it on its own subset of randomly selected samples, a methodology commonly referred to as bootstrapping.
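As a concrete illustration of this pipeline, the sketch below trains the three classifiers on flattened keypoint sequences with scikit-learn. The synthetic array shapes mirror those reported in the Results, and the hyper-parameters (k, kernel, number of trees) and the flattening step are assumptions, not the tuned values from this study.

```python
# Sketch: k-NN, SVM, and RF baselines on flattened keypoint sequences.
# Synthetic arrays stand in for the real features; hyper-parameters are assumed.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(3548, 15, 99)), rng.integers(0, 11, 3548)
X_test, y_test = rng.normal(size=(887, 15, 99)), rng.integers(0, 11, 887)

def flatten(X):
    """Collapse (samples, frames, features) into (samples, frames * features)."""
    return X.reshape(len(X), -1)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5, weights="distance"),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(flatten(X_train), y_train)
    acc = accuracy_score(y_test, model.predict(flatten(X_test)))
    print(f"{name}: accuracy = {acc:.4f}")
```

If the labels are stored one-hot (e.g., a (3548, 12) matrix), they would first be converted to integer class indices with `np.argmax(labels, axis=1)` before fitting.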
Results
In this section, we describe the training of the proposed models and evaluate their performance.
Experimental setup
The experimental setup was performed on a local machine called DESKTOP-OHD4ICG, which is equipped with an Intel(R) Core(TM) i7-9750H CPU running at a frequency of 2.60GHz (2.59 GHz) and 16.0 GB of installed RAM (15.9 GB usable). The system functions on a 64-bit operating system that utilises a processor based on the x64 architecture.
In order to conduct our experimentation, we have devised a ML model with the objective of detecting and classifying yoga poses. The system utilised a combination of supervised and unsupervised learning techniques. Furthermore, we implemented data augmentation methodologies to enhance the quality and depth of our dataset. Once the dataset was divided into training and testing subsets, the dimensions obtained were (3548, 15, 99) for the input data and (3548, 12) for the corresponding labels in the training set. In the testing set, the dimensions were (887, 15, 99) for the input data and (887, 12) for the corresponding labels. The dataset consisted of 11 unique categories of yoga poses.
During the course of the experimentation, our primary objective was to attain a high level of reliability in the task of pose identification. This study employed a combination of supervised and unsupervised learning methods, as well as data augmentation, to improve the model’s capacity to generalise across different yoga poses. The experimental framework was developed to enable comprehensive assessment and verification of the model’s effectiveness in accurately identifying yoga poses.
Performance analysis for supervised learning
The comparison between different supervised learning algorithms on the proposed dataset is depicted in Table 2. This study aimed to assess the effectiveness of three classification algorithms, namely k-NN, SVM, and RF, on the video data. The video data consisted of different sequence lengths, specifically 5, 10, 15, 20, 25, and 30 frames. The algorithms were chosen for their differing mechanisms for handling classification tasks: k-NN is a straightforward, instance-based learner, SVM is known for its ability to handle high-dimensional spaces, and RF is recognised for its ensemble method, which provides excellent precision and versatility. We used various sequence lengths to ascertain the most suitable temporal resolution for each algorithm, thereby improving our understanding of how the level of temporal detail in the video data affects classification performance.
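The sweep can be organised as sketched below: each video’s keypoint stream is sliced into fixed-length windows and the classifier is re-trained and scored per window length. The non-overlapping windowing scheme, the synthetic stand-in data, and the RF settings are illustrative assumptions rather than the study’s exact protocol.

```python
# Sketch: accuracy sweep over sequence lengths of 5-30 frames using RF.
# Synthetic keypoint streams stand in for the real per-video .npy files.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# 11 classes x 7 videos, each ~450 frames (15 s at 30 fps) of 99 features.
videos = [(rng.normal(size=(450, 99)), cls) for cls in range(11) for _ in range(7)]

def make_windows(keypoints, label, seq_len):
    """Slice a (frames, 99) stream into non-overlapping windows of seq_len frames."""
    windows = [keypoints[i:i + seq_len]
               for i in range(0, len(keypoints) - seq_len + 1, seq_len)]
    return np.stack(windows), np.full(len(windows), label)

for seq_len in (5, 10, 15, 20, 25, 30):
    parts = [make_windows(kp, lbl, seq_len) for kp, lbl in videos]
    X = np.concatenate([p[0] for p in parts]).reshape(-1, seq_len * 99)
    y = np.concatenate([p[1] for p in parts])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(f"seq_len={seq_len}: accuracy={accuracy_score(y_te, clf.predict(X_te)):.4f}")
```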
The findings demonstrate that RF consistently outperformed k-NN and SVM across all sequence lengths, reaching an accuracy of 99.66% at sequence lengths of 15 and 20 frames. For sequence length 5, the accuracies achieved by k-NN, SVM, and RF were 98.6%, 98.9%, and 99.2%, respectively. At sequence length 10, the accuracies were 98.4%, 98.3%, and 98.6%. Both k-NN and SVM achieved 99.2% at a sequence length of 15, whereas RF reached its highest performance of 99.6%. At a sequence length of 20, k-NN and SVM achieved 98.9% and 98.6%, respectively. RF achieved accuracies of 99.4% and 99.3% for sequence lengths of 25 and 30, while k-NN and SVM exhibited slightly lower performance.
The performance of RF, shown in Fig. 3, is attributable to its ensemble learning methodology, which improves the accuracy and resilience of predictions, its capacity to efficiently process a large number of input features, and its robustness to noise and variability in video data. On the other hand, the instance-based approach of k-NN and the extensive parameter tuning required by SVM are likely factors behind their relatively lower performance. The results of this study indicate that RF is the most effective algorithm for this video classification task, exhibiting strong performance and accuracy across different temporal resolutions.
The confusion matrix presented in Fig. 4 showcases the performance of the multi-class classification model across the 11 classes. Each row corresponds to the true class and each column to the predicted class. The diagonal entries give the count of correct predictions for each class; the concentration of predictions on the diagonal indicates that the model attains a high level of accuracy. The model correctly classified 81 instances of class 0, 82 of class 1, 91 of class 2, 88 of class 3, 73 of class 4, 80 of class 5, 77 of class 6, 76 of class 7, 70 of class 8, 75 of class 9, and 91 of class 10. Misclassifications are negligible: one instance of class 4 is misclassified as class 3, one instance of class 6 as class 5, and one instance of class 7 as class 6. This yields an overall accuracy of 99.66% (884 of 887 test instances), demonstrating the model’s resilience and dependability in categorising the instances. This classification performance showcases the model’s efficacy and its potential suitability in real-world situations where high accuracy is of utmost importance, and it demonstrates the model’s ability to handle intricate multi-class classification tasks while minimising errors, making it a valuable asset for future research and practical implementation.
Fig. 5 shows the evaluation metrics for the Random Forest classifier. The x-axis denotes the class labels of the classification task, with values ranging from 1 to 11. Three bars are plotted for each class: precision, recall, and the F1-score. Each metric ranges from 0 to 1, and the height of a bar corresponds to its value. Classes whose bars are all close to 1 demonstrate excellent performance. Slight differences among the bars indicate uneven behaviour for a class; for example, a lower precision for class 5 relative to its recall and F1-score suggests occasional false positive predictions. Overall, the graph offers a concise visual summary of the model’s performance across classes, facilitating the identification of both its strengths and the areas requiring improvement.
Performance analysis for unsupervised learning
In this section, we chose to utilise three different clustering algorithms in order to analyse video data as depicted in Table 3. These algorithms include Agglomerative Clustering, Gaussian Mixture Model (GMM), and K-Means clustering. The justification for this decision is based on the variety of clustering approaches employed by these algorithms, each providing distinct viewpoints and methodologies for grouping data points that are similar. Agglomerative Clustering is a method that iteratively combines the closest data points to create clusters. On the other hand, GMM models the data as a combination of Gaussian distributions. Lastly, K-Means partitions the data into k clusters based on the proximity of centroid points. Through the utilisation of these three algorithms, our objective was to thoroughly investigate the clustering structure of the video data and evaluate their efficacy in revealing significant patterns.
Additionally, we manipulated the duration of the video sequences, specifically 5, 10, 15, 20, 25, and 30 frames, throughout our analysis. This intentional variation was implemented to examine the impact of video data temporal granularity on the performance of each clustering algorithm. Diverse sequence lengths are employed to capture different degrees of temporal information. Through the analysis of clustering outcomes across these lengths, our objective was to determine the most suitable temporal resolution for each algorithm.
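A minimal sketch of this comparison with scikit-learn is given below; the flattened feature matrix is a synthetic stand-in for the real sequences, and the choice of 11 clusters simply mirrors the number of pose classes, which is an assumption rather than a tuned setting.

```python
# Sketch: clustering flattened keypoint windows and scoring each algorithm with
# the silhouette coefficient. Synthetic data stands in for the real features.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(887, 15 * 99))  # stand-in for flattened 15-frame sequences

algorithms = {
    "Agglomerative": AgglomerativeClustering(n_clusters=11),
    "GMM": GaussianMixture(n_components=11, random_state=0),
    "K-Means": KMeans(n_clusters=11, n_init=10, random_state=0),
}
for name, algo in algorithms.items():
    labels = algo.fit_predict(X)  # cluster assignments for each sequence
    print(f"{name}: silhouette = {silhouette_score(X, labels):.4f}")
```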
The analysis yielded significant findings regarding the clustering performance of the three algorithms across various sequence lengths, as represented in Fig. 6. The K-Means clustering algorithm consistently outperformed Agglomerative Clustering and GMM, achieving higher silhouette scores, a quality measure for clustering, at every sequence length. This consistency highlights the resilience and efficiency of K-Means clustering in partitioning the video data into cohesive and clearly defined clusters. Notably, K-Means achieved its maximum silhouette score of 0.2888 when the sequence length was set to 5 frames, suggesting that shorter sequences may yield better clustering outcomes for this algorithm.
To the best of our knowledge, no prior effort has been made to classify the poses of Shankha Prakshalana yoga. We were therefore unable to locate a work identical to ours, but we compared it with similar works. Based on the data presented in Table 4, the proposed work achieves the highest accuracy among all compared state-of-the-art works.
Discussions
Supervised learning
The supervised learning models consistently achieve high accuracy scores, ranging from 0.9842 to 0.9966, indicating stable performance across the various sequence lengths. Random Forest (RF) consistently outperforms k-Nearest Neighbours (k-NN) and Support Vector Machines (SVM), achieving the highest accuracy at every sequence length. RF’s advantage in accurately classifying the video data persists even as the sequence length is varied, demonstrating its robustness and efficacy.
RF performs best among the evaluated models because it classifies pre-extracted features, typically structured keypoint data (e.g., the X and Y coordinates of joints such as the shoulders, elbows, and knees) produced by a pre-trained Human Pose Estimation (HPE) model. The complex vision task is effectively offloaded to the HPE model, which supplies the RF with a simplified, low-dimensional, and geometrically rich feature vector that is highly relevant to pose classification. These findings highlight the value of supervised learning algorithms, particularly RF, for tasks where labelled data is available and high accuracy is required.
However, it is crucial to acknowledge the limitations stemming from the current dataset: at only 77 RGB videos, it is sufficient for initial validation but relatively small, and it lacks high variability in backgrounds and subject demographics.
Unsupervised learning
On the other hand, the silhouette scores of the unsupervised learning algorithms, Agglomerative Clustering, GMM, and K-Means clustering, vary with the number of clusters and the sequence length. In the majority of configurations, K-Means outperforms Agglomerative Clustering and GMM, as evidenced by its higher silhouette scores, implying that K-Means partitions the data into more distinct clusters irrespective of the number of clusters or the sequence length. Compared with the supervised results, the silhouette scores remain low, suggesting that supervised learning algorithms classify this video data more accurately than unsupervised approaches.

The high accuracy of this yoga pose detection system makes it useful in three key practical fields. Tele-yoga and remote instruction allow the delivery of real-time, automatic feedback on student alignment, ensuring the quality and safety of virtual classes. For physical rehabilitation, the model acts as an objective monitoring tool to verify that patients maintain proper posture throughout therapeutic exercises, reducing injury risk and increasing treatment success. Finally, advanced fitness monitoring extends beyond basic activity tracking to deliver quantitative, granular feedback on form and technique, raising the bar for personalised digital fitness coaching. Collectively, these applications demonstrate the system’s potential to evolve from a scientific model into an adaptable health and wellness tool.
Real-time feasibility is an important aspect of deploying a yoga pose identification system in the real world. Our chosen pipeline, which classifies pre-extracted, low-dimensional keypoint data with a Random Forest (RF) model, is inherently well suited for this purpose. Unlike complex deep learning architectures (such as 3D CNNs or large Transformers), which require substantial computing resources, the RF classifier offers very fast inference and a small memory footprint, making it well suited for deployment on edge devices.
Conclusion and future work
The present study presents a new pipeline for the recognition of the Shankha Prakshalana kriya. Based on the results and analysis, the comparison demonstrates that supervised learning outperforms unsupervised learning in the domain of video data classification. Supervised learning algorithms, such as Random Forest, consistently handle labelled video data effectively and produce accurate classifications, as evidenced by their higher accuracy scores, whereas unsupervised algorithms such as K-Means clustering show limited effectiveness in clustering video data, as evidenced by their silhouette scores. The results highlight the significance of combining pose-extracted data with supervised learning techniques for precise yoga pose classification, particularly in fields like video analysis where high accuracy is crucial. An exploratory possibility lies in the development of user-friendly applications that use the model for personalised yoga training and can adapt to individual skill levels and progress. Future research will focus on extending the robustness and generalisability of the proposed pipeline. The current high performance, while promising, is measured against a single-subject or low-diversity dataset; to reduce the risk of overfitting and poor generalisation in real-world settings, a main goal will be to collect and integrate a larger, more diverse multi-person yoga dataset. The existing reliance on 2D keypoints is intrinsically constrained by perspective, which can make geometrically distinct stances appear identical. Incorporating 3D skeleton features is therefore an important area for expansion: more advanced Human Pose Estimation (HPE) models or depth-sensing cameras will be used to record the true three-dimensional coordinates of the joints.
Data availability
The data supporting the findings of this study are not publicly accessible due to sensitivity concerns. However, interested individuals can obtain the data from the corresponding author by making a reasonable request.
Abbreviations
- ML: Machine Learning
- SP: Shankha Prakshalana
- HCI: Human Computer Interaction
- RF: Random Forest
References
Rajendran, A. K. & Sethuraman, S. C. A survey on yogic posture recognition. IEEE Access 11, 11183–11223 (2023).
Upadhyay, A., Basha, N. K. & Ananthakrishnan, B. Deep learning-based yoga posture recognition using the y_pn-mssd model for yoga practitioners. Healthcare 11, 609 (2023).
Mahmud, H., Morshed, M. M. & Hasan, M. K. Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition. Vis. Comput. 40(1), 11–25 (2024).
Safa aldin, S., Aldin, N. B. & Aykac, M. Enhanced image classification using edge cnn (e-cnn). Vis. Comput. 40(1), 319–332 (2024).
Dias, P.A., Malafronte, D., Medeiros, H. & Odone, F. Gaze estimation for assisted living environments. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 290–299 (2020).
Narayanan, S.S., Misra, D.K., Arora, K. & Rai, H. Yoga pose detection using deep learning techniques. In: Proceedings of the International Conference on Innovative Computing & Communication (ICICC) (2021).
Kumar, D. & Sinha, A. Yoga Pose Detection and Classification Using Deep Learning. LAP LAMBERT Academic Publishing (2020).
Kamel, A., Liu, B., Li, P. & Sheng, B. An investigation of 3d human pose estimation for learning tai chi: A human factor perspective. Int. J. Hum.-Comput. Interact. 35(4–5), 427–439 (2019).
Hu, X., Bao, X., Wei, G. & Li, Z. Human-pose estimation based on weak supervision. Virtual Reality & Intelligent Hardware 5(4), 366–377 (2023).
Ali, A., Shahbaz, H. & Damaševičius, R. xcvit: Improved vision transformer network with fusion of cnn and xception for skin disease recognition with explainable AI. Comput. Mater. Contin. 83(1) (2025).
Toor, M. S. et al. An optimized weighted-voting-based ensemble learning approach for fake news classification. Mathematics 13(3), 449 (2025).
Bharadwaj, B. Role of yoga and mindfulness in severe mental illnesses: A narrative. Int. J. Yoga 12(1) (2019)
Gao, Z., Zhang, H., Liu, A. A., Xu, G. & Xue, Y. Human action recognition on depth dataset. Neural Comput. Appl. 27, 2047–2054 (2016).
Sharma, A., Sharma, P., Pincha, D. & Jain, P. Surya namaskar: real-time advanced yoga pose recognition and correction for smart healthcare. arXiv preprint arXiv:2209.02492 (2022).
Sharma, A., Agrawal, Y., Shah, Y. & Jain, P. iyogacare: real-time yoga recognition and self-correction for smart healthcare. IEEE Consumer Electronics Magazine (2022).
Agrawal, Y., Shah, Y. & Sharma, A. Implementation of machine learning technique for identification of yoga poses. In: 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT), pp. 40–43 (2020). IEEE.
Sharma, A., Shah, Y., Agrawal, Y. & Jain, P. Real-time recognition of yoga poses using computer vision for smart health care. arXiv preprint arXiv:2201.07594 (2022).
Connaghan, D., Kelly, P., O’Connor, N.E., Gaffney, M., Walsh, M. & O’Mathuna, C. Multi-sensor classification of tennis strokes. In: SENSORS, 2011 IEEE, pp. 1437–1440 (2011). IEEE
Pai, P.-F., ChangLiao, L.-H. & Lin, K.-P. Analyzing basketball games by a support vector machines with decision tree model. Neural Comput. Appl. 28, 4159–4167 (2017).
Bai, L., Efstratiou, C. & Ang, C. S. wesport: Utilising wrist-band sensing to detect player activities in basketball games. In: 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), pp. 1–6 (2016). IEEE
Shan, C.Z., Ming, E.S.L., Rahman, H.A. & Fai, Y.C. Investigation of upper limb movement during badminton smash. In: 2015 10th Asian Control Conference (ASCC), pp. 1–6 (2015). IEEE
Wang, C., Wang, Y., Lin, Z., Yuille, A.L. & Gao, W. Robust estimation of 3d human poses from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2361–2368 (2014).
Cao, Z., Simon, T., Wei, S.-E. & Sheikh, Y. Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017).
Yadav, S. K., Singh, A., Gupta, A. & Raheja, J. L. Real-time yoga recognition using deep learning. Neural Comput. Appl. 31, 9349–9361 (2019).
Tian, Y. et al. Apple detection during different growth stages in orchards using the improved yolo-v3 model. Comput. Electron. Agric. 157, 417–426 (2019).
Cowan, R. E. et al. Recent trends in assistive technology for mobility. J. Neuroeng. Rehabil. 9(1), 1–8 (2012).
Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.-L. & Grundmann, M. Mediapipe hands: On-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020).
Ambati, L. S. & El-Gayar, O. Human activity recognition: a comparison of machine learning approaches. J. Midwest Association Inf. Syst. (JMWAIS) 2021(1), 4 (2021).
Logacjov, A., Bach, K., Kongsvold, A., Bårdstu, H. B. & Mork, P. J. Harth: A human activity recognition dataset for machine learning. Sensors 21(23), 7853 (2021).
Ashraf, F. B., Islam, M. U., Kabir, M. R. & Uddin, J. Yonet: A neural network for yoga pose classification. SN Comput. Sci. 4(2), 198 (2023).
Swain, D. et al. Deep learning models for yoga pose monitoring. Algorithms 15(11), 403 (2022).
Acknowledgements
Not Applicable
Funding
Open access funding provided by Manipal University Jaipur.
Author information
Contributions
All authors contributed equally to designing, editing and working on the paper and the related research work. AS: conceptualization. VS and PS contributed to the structuring and visualization of the research process, data interpretation, data visualization and manuscript refinement. RL and YL collected and analyzed the data. AS and SB supervised the project. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Consent to participate
Informed consent was obtained to publish the images in an online open-access publication from all individual participants included in the study.
Consent for publication
I hereby provide consent for the publication of the manuscript detailed above.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sharma, A., Sharma, V., Sharma, P. et al. Automated identification of Shankha Prakshalana yoga poses with machine learning techniques. Sci Rep 16, 430 (2026). https://doi.org/10.1038/s41598-025-29984-2