Introduction

Drowning is the third leading cause of unintentional injury deaths worldwide and one of the top ten causes of death among children and young adults1. One of the primary factors contributing to drowning incidents is the absence of lifeguards, or insufficient lifeguard supervision, in most open-water areas. However, because of the vast expanse of water bodies globally, it is not feasible to assign lifeguards to supervise them all. Therefore, the development of an automated monitoring system for drowning prevention is urgently required, and surface and underwater human pose recognition is key to developing such a system.

Researchers have conducted a series of studies on the recognition of surface and underwater human poses, which fall into two main categories. The first category monitors drowning incidents through wearable sensors, which require individuals to wear sensing devices. Signals such as pressure2, heart rate3, acceleration4, inertia5, and position6 can indicate a drowning situation. However, this approach is costly, and the required devices can restrict swimmers' movements, potentially causing the very drowning incidents the method is intended to prevent. Considering these issues, the second category, based on image or video recognition, has been proposed, including background subtraction7, hue saturation value (HSV)8, the k-means clustering algorithm9, and deep learning10,11,12. While recognizing postures through videos or images resolves the issues associated with wearable sensors, airborne optical cameras face challenges in imaging the water surface and the underwater scene simultaneously because of the influence of visible light wavelengths. Furthermore, the lack of spatial geometric information in images can lead to the misjudgment and misdetection of poses.

LiDAR technology has demonstrated advantages in addressing the challenges of image-based recognition of surface and underwater human poses. LiDAR possesses strong penetration capabilities: laser beams can penetrate the water surface and generate dense 3D point clouds, providing detailed 3D information about surface and underwater human poses. Unlike RGB images, 3D point clouds contain both geometric and spatial information. To fully utilize this information, Li et al.13 first conducted research on human pose recognition using depth image sequences. This approach samples points from the depth image and encodes the resulting point-cloud model to extract spatial information about the human pose. However, computing normal vectors and constructing feature descriptors in the spatiotemporal domain relies on predefined shape descriptors or manually extracted features and often fails to fully express the complexity and richness of point-cloud data.

Compared with traditional machine learning methods, deep learning approaches are better suited to handling 3D point-cloud data and possess stronger feature learning and expression capabilities. Among deep-learning-based methods for 3D point clouds, researchers have utilized the point-cloud data generated by millimeter-wave radar, along with information such as distance, Doppler, and micro-Doppler, to extract features related to human poses. They employ deep neural networks such as convolutional neural networks (CNN)14, long short-term memory (LSTM)15, and 3D residual networks (Res3D)16, and further apply voxelization to address the sparsity and nonuniformity of point-cloud data. However, because LiDAR point clouds are sparse, with far fewer data points than voxels, voxelizing LiDAR point-cloud data significantly increases memory and computational requirements and results in a large number of ineffective convolution calculations.

With the introduction of the PointNet and PointNet++ 3D point-cloud deep learning models by Qi et al.17,18 in 2016 and 2017, respectively, an effective new approach emerged to address the issues of voxelization-based deep learning networks and point-cloud disorder in 3D point-cloud classification. Researchers19,20 have used the PointNet++ 3D point-cloud deep learning method to improve the accuracy of human pose recognition through model enhancement, achieving higher recognition accuracy and validating the advantages of the PointNet++ network for this task. However, studies on PointNet++-based human pose recognition have focused on optimizing the network model to improve accuracy while neglecting the processing and optimization of the dataset. Most studies employ a single filtering method and rely on PointNet++'s default farthest point sampling, sampling only once with a sample size of 1024 or 2048. For surface and underwater human-pose 3D point clouds, the choice of filtering method, the necessity of secondary sampling, and the optimal sample size for the highest recognition accuracy all remain to be investigated.

To address these problems and limitations, this study investigated surface and underwater human pose recognition using the PointNet++ spatiotemporal 3D point-cloud deep learning method. An experimental simulation platform is established in Section 2. In Section 3, 3D point clouds of human body model postures are collected, preprocessed with different methods, temporalized, and used for model training. Section 4 analyzes and compares the accuracy of the various methods. A spatiotemporal point-cloud dataset was constructed for surface and underwater human pose recognition, and a series of comparative experiments was designed based on this dataset. The study compared two filtering methods to determine the optimal one, analyzed the necessity of incorporating secondary sampling, and investigated the influence of sample size on classification accuracy. Finally, from the combinations of the two filtering methods, two secondary sampling methods, and five sample sizes, the combination with the highest accuracy was identified as the optimal method for surface and underwater human pose recognition. The highest classification accuracy achieved using PointNet++ was 0.975012. These solutions provide important guidance for applying the PointNet++ spatiotemporal 3D point-cloud deep learning method.

Experimental platform setup

Experimental equipment

The experimental setup, as shown in Fig. 1, included a point cloud acquisition device, an aluminum frame support, a water tank, and a computer. The point cloud acquisition device uses a structured-light binocular acquisition system with an effective acquisition distance of approximately 1–1.5 m and a field of view of approximately 0.8 m × 0.8 m. The aluminum frame support was constructed from 4040 aluminum profiles. The water tank measured 0.51 m × 0.38 m × 0.29 m. A computer with an i9-11900K @ 3.50 GHz processor and an RTX 3060 graphics card was used.

Figure 1. Experimental setup.

Experimental objects

The experimental equipment was used to scan two human models, one male and one female, at approximately 1:12 scale. Surface and underwater postures of the two models were collected, with 2400 samples acquired for each model. The male body model was approximately 150 mm tall and the female body model approximately 130 mm tall, as shown in Fig. 2.

Figure 2. Human models.

Research proposal

To maximally utilize the information within the dataset while minimizing the impact of noise, this study proposes the workflow shown in Fig. 3, which includes point cloud acquisition, point cloud filtering, point cloud secondary sampling, point cloud input, and model training. Point cloud preprocessing comprises point cloud filtering, point cloud secondary sampling, and sample representation. Point cloud filtering removes the noise and outliers introduced during data collection. Secondary sampling with an appropriate sample size further reduces the impact of noise and outliers. Finally, PointNet++ was used to recognize poses from the temporalized samples, exploiting the continuity of the swimming actions.

Figure 3. Flowchart.

Point cloud acquisition

To simulate human poses while swimming, a point-cloud dataset was acquired using human models. This dataset includes four common swimming strokes, breaststroke, butterfly, backstroke, and freestyle, as well as treading water and drowning postures. Considering the continuity of motion in these six poses, the complete process of each pose was divided evenly into eight segments over time. The middle-frame pose of each segment is shown in Fig. 4.

Figure 4. Human poses.

Through LiDAR scanning of the surface and underwater human models, 400 point-cloud samples were obtained for each of the four swimming strokes, treading water, and drowning, yielding a well-balanced training dataset. To better capture a swimmer's continuous movement over time, each pose was divided evenly into eight actions, and each action was sampled 50 times to simulate various swimming poses. The corresponding information is summarized in Table 1. Typical samples for each middle-frame pose are shown in Fig. 5.

Table 1 Sample data quantification.
Figure 5. Original point clouds of human poses.

As shown in Fig. 6, the original point clouds contain noisy points that do not belong to the human model, such as those from the water surface and the water tank. In addition, the raw point cloud has an uneven density and unsmooth edges (such as the part within the rectangular dashed box). These drawbacks affect subsequent human pose recognition.

Figure 6. Characteristics of original point cloud data.

Point cloud filtering

The original point clouds are shown in Fig. 7a. To alleviate the impact of noise and outliers (indicated within the elliptical dashed box in Fig. 6), this study first applied a pass-through filtering method21 that uses a z-coordinate threshold to roughly isolate the data points representing the human model. The z-coordinate threshold was set to the range 1150–1200 mm, as shown in Fig. 7b, which filters out most of the irrelevant noise; however, noise remains near the human model. Accordingly, radius outlier removal (ROR)22 and statistical outlier removal (SOR)23 were selected to remove this residual noise. The filtering results are shown in Fig. 7c,d, using a 3 mm radius and a minimum-neighbor threshold of 6 for ROR, and a neighborhood size of 50 points and a standard deviation multiplier of 1.0 for SOR. Comparing these results, ROR retains more detail, whereas SOR tends to filter out more data points, not all of which are noise. The point counts before and after preprocessing are listed in Table 2.
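For reference, this filtering pipeline can be sketched with the open-source Open3D library. This is a minimal sketch, assuming a recent Open3D version; the parameter values mirror those reported above, while the input file name is hypothetical.

```python
# Minimal sketch of the three-stage filtering pipeline (pass-through,
# ROR, SOR) using Open3D; "pose_sample.ply" is a hypothetical file name.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("pose_sample.ply")

# Pass-through filter: keep points whose z-coordinate lies in 1150-1200 mm.
pts = np.asarray(pcd.points)
mask = (pts[:, 2] >= 1150.0) & (pts[:, 2] <= 1200.0)
pcd = pcd.select_by_index(np.where(mask)[0])

# Radius outlier removal (ROR): drop points with fewer than 6 neighbors
# within a 3 mm radius.
pcd_ror, _ = pcd.remove_radius_outlier(nb_points=6, radius=3.0)

# Statistical outlier removal (SOR): 50-point neighborhood and a
# standard deviation multiplier of 1.0.
pcd_sor, _ = pcd.remove_statistical_outlier(nb_neighbors=50, std_ratio=1.0)
```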

Figure 7. Point cloud preprocessing.

Table 2 Point count before and after preprocessing.

Point cloud secondary sampling

In this study, PointNet++18 was employed to recognize human poses because of its exceptional performance in point-cloud processing. Before feeding the samples into PointNet++, the farthest point sampling (FPS)24 operator was employed to sample points that effectively represent human poses and to ensure a uniform input data size. However, the filtered point cloud still exhibits residual noise that may adversely affect the extraction of geometric structure during FPS, particularly along the edges of the human models. Therefore, we incorporated a secondary point-cloud sampling step to address this issue. The effects of spatial sampling (SS)25 and octree sampling (OS)26, applied to the data after pass-through and SOR filtering, are shown in Fig. 8. The SS method used a minimum point-spacing threshold of 0.5 mm, and the OS method used a maximum recursion depth of 8. Comparing Fig. 8a,b, the point-cloud density is more uniform with OS, with SS producing slightly less uniform results. As the optimal sampling method cannot be determined directly, the effects of both methods were verified through combined experiments.
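To make the two sampling steps concrete, the sketch below shows a plain NumPy implementation of FPS and a greedy minimum-spacing sampler in the spirit of SS. The function names and the 0.5 mm spacing default follow the text; the implementations themselves are illustrative, not the exact algorithms used in the cited works.

```python
# Illustrative sketch: farthest point sampling (FPS) and a greedy
# spatial sampler that enforces a minimum point spacing, as in SS.
import numpy as np
from scipy.spatial import cKDTree

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Return indices of n_samples points chosen by iterative FPS."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)          # arbitrary seed point
    for i in range(1, n_samples):
        # Update each point's distance to the nearest already-chosen point.
        d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))      # farthest remaining point
    return chosen

def spatial_sampling(points: np.ndarray, d_min: float = 0.5) -> np.ndarray:
    """Greedy spatial sampling: keep a point unless it lies within
    d_min (in mm) of an already-kept point."""
    tree = cKDTree(points)
    keep = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if not keep[i]:
            continue
        # Suppress all later points within d_min of the kept point i.
        for j in tree.query_ball_point(points[i], d_min):
            if j > i:
                keep[j] = False
    return np.where(keep)[0]
```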

Figure 8. Point cloud secondary sampling.

Sample representation

The sample size fed into PointNet++ also has a significant impact on the learning ability of the network: the larger the sample size, the more information it contains, but the greater the computational load. To explore the optimal number of input points for surface and underwater human pose point-cloud classification, we chose five sample sizes: 512, 1024, 2048, 3072, and 4096. As shown in Fig. 9, to fully exploit the continuity of motions representing the same pose, a variable representing the time series was added to the vector of each point. Thus, each point carries seven values: the x, y, and z coordinates; the x, y, and z normals; and the timestamp.
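A minimal sketch of how such a 7-channel sample might be assembled is shown below, assuming normals are estimated with Open3D and the timestamp is a normalized frame index; the function name, search radius, and time encoding are our own illustration rather than the paper's exact procedure.

```python
# Build an (N, 7) sample: x, y, z, nx, ny, nz, t.
import numpy as np
import open3d as o3d

def build_sample(points: np.ndarray, frame_idx: int, n_frames: int) -> np.ndarray:
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    # Estimate per-point normals from a local neighborhood (radius in mm).
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=5.0, max_nn=30))
    normals = np.asarray(pcd.normals)
    # Timestamp channel: frame index normalized to [0, 1].
    t = np.full((points.shape[0], 1), frame_idx / max(n_frames - 1, 1))
    return np.hstack([points, normals, t])
```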

Figure 9. Sample representation.

Model training

To address the unordered nature of point clouds, Qi et al.18 proposed the PointNet++ 3D point-cloud deep learning model, which exploits the local features of point clouds to significantly enhance recognition accuracy. Before training the PointNet++ model, the input data were standardized to a uniform size using the farthest point sampling method, which also helps avoid overfitting.

Unlike two-dimensional image deep learning models, the PointNet++ model solves two key issues in 3D point-cloud deep learning. The first is the unordered nature of point clouds: achieving permutation invariance for point clouds with inconsistent point orders. The second is the large volume of point-cloud data: integrating local features to improve recognition accuracy.

To address the issue of unordered point clouds, PointNet++18 first uses the function h to lift each input point to a higher-dimensional feature space, then applies the symmetric function g (max pooling), and finally applies the function γ (an MLP). A set of points is mapped to a vector as follows:

$$ f\left( x_{1}, x_{2}, \ldots, x_{n} \right) = \gamma \left( g\left( h\left( x_{1} \right), h\left( x_{2} \right), \ldots, h\left( x_{n} \right) \right) \right) $$
(1)

where x1, x2, …, xn are the points of the point cloud; f(x1, x2, …, xn) is the function realized by PointNet++; γ is the MLP applied to the output of g; g is the max-pooling function applied to the outputs of h; and h is the per-point dimensionality-lifting function applied to x1, x2, …, xn.
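The permutation-invariance idea in Eq. (1) can be illustrated with a short PyTorch sketch: a shared per-point MLP plays the role of h, a max pooling over points plays g, and a classifier MLP plays γ. This is a toy instance of the symmetric-function construction, not the full PointNet++ architecture; layer widths and class count (six poses, 7-channel points) are our own choices.

```python
# Toy instance of Eq. (1): logits = gamma(g(h(x_1), ..., h(x_n))).
import torch
import torch.nn as nn

class PointSetClassifier(nn.Module):
    def __init__(self, in_dim: int = 7, n_classes: int = 6):
        super().__init__()
        # h: shared MLP applied independently to every point
        self.h = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU())
        # gamma: MLP applied to the pooled global feature
        self.gamma = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_points, in_dim); point order does not matter.
        per_point = self.h(x)                       # h(x_i) for every point
        global_feat = per_point.max(dim=1).values   # g: symmetric max pooling
        return self.gamma(global_feat)              # gamma: class logits

logits = PointSetClassifier()(torch.randn(4, 3072, 7))  # shape (4, 6)
```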

To integrate local features, PointNet++ selects local regions based on a specified radius and performs feature extraction for each region. The local features are then combined to obtain the global features, as shown in Fig. 10.

Figure 10. Fusion of local features.

A point-cloud dataset containing different combinations of time sequences was used to train the PointNet++ model. Table 3 presents the parameters used in model training. We randomly divided the dataset into training, validation, and testing sets in a ratio of 6:2:2 for PointNet++ network training.
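A minimal sketch of the 6:2:2 random split, assuming a PyTorch workflow; `pose_dataset` is a hypothetical `Dataset` of temporalized point-cloud samples, and the seed is our own choice for reproducibility.

```python
# Randomly split the dataset into training, validation, and test sets (6:2:2).
import torch
from torch.utils.data import random_split

n = len(pose_dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    pose_dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))  # reproducible split
```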

Table 3 Model training parameters.

Results

To address the challenges of sample temporalization and the manual selection of filters, samplers, and sample sizes, 80 experiments were designed to cover all relevant combinations, as shown in Tables 4 and 5, where A1 and A2 denote the filtering algorithms; B1 and B2 denote the secondary sampling algorithms; B3 denotes no secondary sampling; C1–C5 denote the sample sizes; D1 denotes no temporalization; D2 denotes temporalization; E1 denotes the male body model; and E2 denotes the female body model. Among them, 20 experiments (two filtering algorithms × five sample sizes × two body models, without secondary sampling or temporalization) represent the non-temporalized traditional recognition method based on PointNet++. Another 20 experiments (two filtering algorithms × five sample sizes × two body models, with temporalization but without secondary sampling) introduce temporalization into the traditional method, representing the temporalized traditional recognition method. The remaining 40 experiments (two filtering algorithms × two secondary sampling algorithms × five sample sizes × two body models, with temporalization) incorporate both secondary sampling and temporalization, representing the temporalized secondary sampling recognition method. The effectiveness of sample temporalization and secondary sampling for point-cloud human pose recognition was validated by comparing the results of these 80 experiments. The study also identified the filtering algorithm with the higher classification accuracy of the two and explored the optimal combination of filtering method, secondary sampling method, and sample size for surface and underwater human pose recognition based on temporal 3D point-cloud deep learning.
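The experimental grid can be enumerated mechanically, as in the sketch below; the label strings follow the coding above (A: filter, B: sampler, C: sample size, D: temporalization, E: body model), and the constraint reflects that secondary sampling was only paired with temporalized samples.

```python
# Enumerate the 80 experimental combinations described above.
from itertools import product

filters = ["A1", "A2"]                  # ROR, SOR
samplers = ["B1", "B2", "B3"]           # SS, OS, no secondary sampling
sizes = ["C1", "C2", "C3", "C4", "C5"]  # 512, 1024, 2048, 3072, 4096
temporal = ["D1", "D2"]                 # without / with temporalization
models = ["E1", "E2"]                   # male / female body model

experiments = [
    combo for combo in product(filters, samplers, sizes, temporal, models)
    # Secondary sampling (B1/B2) was run only with temporalization (D2);
    # B3 appears both without (D1) and with (D2) temporalization.
    if not (combo[1] in ("B1", "B2") and combo[3] == "D1")
]
assert len(experiments) == 80  # 20 + 20 + 40
```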

Table 4 Experimental variables.
Table 5 Experimental combination.

Impact of sample temporalization on model classification accuracy

The effect of sample temporalization on classification accuracy was verified through 20 sets of comparative experiments based on PointNet++: the 20 non-temporalized traditional experiments (two filtering algorithms × five sample sizes × two body models, without secondary sampling or temporalization) and the 20 temporalized traditional experiments (the same combinations, with temporalization). A comparison of the model classification accuracies before and after sample temporalization is shown in Fig. 11. In all 20 comparative sets, the models with sample temporalization achieved higher classification accuracy than those without, with the largest difference being 0.034331. The average classification accuracy without sample temporalization was 0.924003, whereas that with sample temporalization was 0.952465, an average improvement of 0.028462. Thus, sample temporalization improves the accuracy of surface and underwater human pose recognition. This is because human poses consist of continuous actions, and non-temporalized samples can lead to the misclassification of similar actions.

Figure 11. Comparison of model classification accuracy before and after sample temporalization.

Impact of point cloud secondary sampling on model classification accuracy

The reasonableness of the proposed point-cloud secondary sampling step for drowning recognition based on 3D point-cloud deep learning was validated through 20 sets of comparative experiments: the 20 temporalized traditional experiments (two filtering algorithms × five sample sizes × two body models, with temporalization but without secondary sampling) and the 40 temporalized secondary sampling experiments (two filtering algorithms × two secondary sampling algorithms × five sample sizes × two body models, with temporalization). A comparison of the model classification accuracy before and after point-cloud secondary sampling is shown in Fig. 12. In all 20 comparative sets, the models incorporating secondary sampling achieved higher classification accuracy than those without, with the largest difference being 0.016437. The average classification accuracy without secondary sampling was 0.952465, whereas it was 0.964782 with SS and 0.958761 with OS as the secondary sampling method. The average accuracy with SS was 0.012317 higher than without secondary sampling and 0.006022 higher than with OS. These results show that incorporating secondary sampling improves model accuracy, because it mitigates uneven sample density, and that SS has the better effect in this regard.

Figure 12. Comparison of model classification accuracy before and after point cloud secondary sampling.

Influence of different filtering methods on model classification accuracy

To determine the best filtering method for surface and underwater human pose recognition based on 3D point-cloud deep learning, we conducted 20 sets of comparative experiments covering the 40 temporalized secondary sampling experiments (two filtering algorithms × two secondary sampling algorithms × five sample sizes × two body models, with temporalization), as shown in Fig. 13. In all 40 experiments, the classification accuracy exceeded 0.940000, and the highest accuracy of 0.975012 was achieved using SOR. In every one of the 20 comparative sets, the SOR experiments achieved higher classification accuracy than the corresponding ROR experiments. Overall, SOR performs better than ROR because it preserves the statistical distribution of the point cloud, whereas ROR can only eliminate visually noticeable isolated noise.

Figure 13. Comparison of model classification accuracy for two filtering methods.

Impact of different method combinations on model classification accuracy

To explore the optimal combination of filtering method, secondary sampling method, and sample size for surface and underwater human pose recognition based on 3D point-cloud deep learning, we conducted 10 experiments based on SOR-SS (one filtering algorithm × one secondary sampling algorithm × five sample sizes × two body models, with temporalization). A comparison of the model classification accuracies for the different method combinations is shown in Fig. 14. The highest classification accuracy was achieved with SOR, SS, and 3072 sample points: 0.975012 and 0.973571 for the male and female body models, respectively. The lowest classification accuracy was obtained with SOR, OS, and 512 sample points: 0.961652 and 0.961548 for the male and female body models, respectively. Therefore, using SOR, SS, and a sample size of 3072 improves model accuracy. The sample size determines how much information a sample contains; however, because manually collected point clouds are imperfect, excessively large sample sizes can introduce irrelevant information and thereby reduce model accuracy.

Figure 14. Comparison of model classification accuracy for different method combinations.

Comparison of classification accuracy with other point cloud recognition networks

Several popular point-cloud recognition networks were trained on our dataset in the same experimental environment, using a computer with an i9-11900K @ 3.50 GHz processor and an RTX 3060 graphics card. The experimental results are listed in Table 6. The PointNet++ network used in this study achieved higher classification accuracy than the other networks.

Table 6 Comparison of classification accuracy.

Conclusion

The results show that the proposed LiDAR method can be effectively employed for surface and underwater human pose recognition using temporal 3D point cloud deep learning.

(1) The impact of sample temporalization on model classification accuracy was investigated, and the average accuracy increased by 0.028462 after sample temporalization. This is advantageous in improving the recognition ability of surface and underwater human pose recognition models.

(2) The necessity of incorporating point-cloud secondary sampling was discussed. The classification accuracy of the surface and underwater human pose recognition models significantly improved after adding point-cloud secondary sampling. Among the methods, SS had the best effect, with an average increase of 0.012317 in accuracy.

(3) The optimal point-cloud filtering algorithm for surface and underwater human pose recognition based on temporal 3D point-cloud deep learning was determined to be SOR. The experimental results showed that the classification accuracy using SOR was significantly higher than that using ROR, with an average accuracy increase of 0.009873.

(4) The best combination of methods for surface and underwater human pose recognition based on temporal 3D point-cloud deep learning was obtained. When using SOR, the SS algorithm, and a sample input size of 3072 points, the model achieved a significantly higher classification accuracy than the other combinations, reaching the highest classification accuracies of 0.975012 and 0.973571 for the male and female human models, respectively.

Although this study demonstrates the potential of surface and underwater human pose recognition based on temporal 3D point-cloud deep learning, some limitations remain. The method has so far been validated only at the experimental stage and should be verified in practical applications. It is necessary to collect surface and underwater human pose point-cloud data from real humans to train more reliable recognition models. Furthermore, in practical applications, the swimming speeds of different individuals should be considered when selecting point-cloud acquisition times for recognition.