Introduction

In the normal digestion process, food is propelled through the digestive tract by rhythmic muscle contractions known as peristalsis. The resulting movement of contents is referred to as gastrointestinal motility (throughout the digestive tract) or gastric motility (when limited to the stomach)1. Gastric motility disorders occur when normal peristalsis is disrupted, which may cause severe constipation, recurrent vomiting, bloating, diarrhea, nausea, and even death. Gastric motility may be assessed through various techniques. While direct measurements such as gastric emptying scintigraphy and wireless motility capsules (WMCs) quantify food movement, other modalities, including manometry, electrogastrography (EGG), and ultrasound, evaluate the factors affecting gastric motility. However, traditional methods of evaluating gastric motility have limitations. Manometry involves intranasal intubation, which may cause discomfort to patients and lead to the use of sedation2. Gastric emptying scintigraphy requires nuclear medicine, which exposes patients to radiation3. EGG recording systems vary widely; despite recent efforts to standardize EGG for body surface gastric mapping4, EGG remains susceptible to inter-individual physiological variability, such as differences in body mass index. Gastric ultrasound suffers from the trade-off between penetration depth and resolution5. WMCs measure physiological parameters related to gastric peristalsis, such as temperature, pH, and pressure, as they travel through the gastrointestinal tract6. However, because WMCs transit the gastrointestinal tract passively, they may fail to accurately assess motility at specific anatomical landmarks7. In contrast, the magnetically controlled capsule endoscope (MCCE) is an emerging tool for diagnosing gastric diseases that provides real-time, true-color visualization of the gastric environment. With active magnetic control, the MCCE enables precise localization and visualization of anatomical landmarks. Moreover, the MCCE provides direct, multi-angle visualization of contraction waves, which enables comprehensive and effective analysis of gastric motility. In addition, MCCE examinations are comfortable, safe, and require no anesthesia8.

However, evaluating gastric motility using the MCCE requires extensive manual labor from clinicians. For example, each MCCE frame needs to be inspected for the presence of peristalsis, and the period of peristalsis needs to be counted manually. Thus, there is a need to develop automatic algorithms for evaluating gastric motility using MCCE systems. Deep learning algorithms have been used in the field of medical imaging9,10,11 as well as to assist the diagnosis of MCCE systems12,13. Convolutional neural networks (CNNs) have been used to detect polyps14,15, ulcers16, tumors, and mucosa12. Moreover, deep reinforcement learning (DRL) approaches have been used for automated navigation of MCCE capsules within the human stomach. However, existing research focuses on detecting gastric lesions, anomaly detection, segmentation, and navigation17 based on single MCCE frames, rather than utilizing the temporal information in MCCE frame sequences.

In this paper, we develop a combination of algorithms for evaluating human gastric motility by detecting and measuring gastric peristalsis using the MCCE system. During the MCCE examination, an external magnetic head guides the capsule to move and capture images within the human stomach, which poses a challenge for action recognition algorithms18. To mitigate the effect of sudden capsule movements, we develop a camera motion detector (CMD) for processing MCCE frame sequences. We develop a framework for detecting gastric peristalsis, which is compatible with CNN + long short-term memory (LSTM)19,20 and transformer-based models21. Human gastric contraction waves present features in both the spatial and temporal domains: in the spatial domain, the waves have morphological shapes; in the temporal domain, the shapes of the waves change over time. The CNN model captures the spatial features, the LSTM model captures the temporal features, and the Video Swin Transformer21 analyzes patches across both the spatial and temporal dimensions. For detection and classification algorithms in most medical applications, reducing false negatives is more important than reducing false positives22. False negative results may lead to an omission in detecting gastric peristalsis, which would lead to an underestimation of human gastric motility. Class activation mapping (CAM)23 is capable of highlighting the regions within the stomach where peristalsis occurs. To make the detection results more reliable (i.e., to produce fewer false negatives in detecting peristalsis frames), we improve the detection sensitivity using the visual interpretations provided by CAM. Moreover, we develop a periodical feature detector for measuring the period of human gastric peristalsis based on the analysis of feature maps of MCCE frames.

We conducted extensive experiments on our MCCE dataset, which includes over 100,000 frames (specifically 100,055) from 30 subjects for the training and validation sets, and 24,183 frames from 11 subjects for the testing set. Our combination of algorithms evaluates gastric motility by detecting the presence of peristalsis as well as measuring the period of gastric peristalsis. The proposed algorithms have great potential to be integrated into clinical devices for assisting the evaluation of gastric motility.

Methods

MCCE dataset

The MCCE dataset was acquired by the Department of Research and Development at AnX Robotica. Using the NaviCam MCCE system, inspection videos of internal volunteers were collected. The MCCE system consists of four components: a swallowable, wireless, and magnetically controlled capsule endoscope (11.8 \(\times\) 27 mm), a guidance magnetic robot, a data recorder, and a computer workstation with corresponding software. An example of the components of the NaviCam MCCE system is shown in Fig. 1. The videos captured by the MCCE were recorded at 2 fps, with a size of 480 \(\times\) 480 pixels. The MCCE videos were treated as frame sequences. Our training and validation set contains more than 100,000 MCCE frames (specifically 100,055) from 30 subjects.

Fig. 1
figure 1

An illustration of the NaviCam MCCE system (https://www.anxrobotics.com/products/navicam-stomach-capsule-system/). (a) Controlled capsule endoscope. (b) Guidance magnetic robot. (c) Data recorder and computer workstation.

Design of camera motion detector

We design the CMD to filter out MCCE frames that are degraded by camera movement. The proposed CMD takes two consecutive frames, Frame N-1 and Frame N, as inputs. The CMD then determines the camera motion by analyzing histograms. The details of the CMD are described in Algorithm 1. A normalized Gaussian function with \(\mu\) at 128 and \(\sigma\) at 20 is adopted as the mask M; the choice of \(\mu\) and \(\sigma\) is based on empirical study. With a higher threshold T, the mean length of the resulting video sequences is longer, leading to higher computational cost; conversely, with a lower threshold T, the resulting video sequences contain less camera motion. We empirically set the threshold T to 200 to balance the trade-off between sequence length and camera motion within sequences.

Algorithm 1
figure a

The camera motion detector.
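The following is a minimal sketch of the CMD scoring step, assuming the residual is the absolute per-pixel difference between consecutive grayscale frames and the Gaussian mask is scaled to a unit peak; the exact residual definition and normalization used in Algorithm 1 may differ.

```python
import numpy as np

def cmd_score(frame_prev, frame_curr, mu=128.0, sigma=20.0):
    """CMD score S for two consecutive MCCE frames given as (H, W) uint8 grayscale arrays."""
    # Residual image between Frame N-1 and Frame N (assumed: absolute difference).
    residual = np.abs(frame_curr.astype(np.int16) - frame_prev.astype(np.int16))
    # Histogram H of the residual image over 256 intensity bins.
    hist, _ = np.histogram(residual, bins=256, range=(0, 256))
    # Gaussian mask M centered at mu = 128 with sigma = 20 (scaled to a unit peak);
    # it suppresses the low/high histogram bands caused by slight light-source motion.
    bins = np.arange(256)
    mask = np.exp(-0.5 * ((bins - mu) / sigma) ** 2)
    # Masked histogram H_M and its sum, the score S.
    return float(np.sum(hist * mask))

def is_camera_moving(frame_prev, frame_curr, threshold=200.0):
    """Frame pairs whose score exceeds the threshold T = 200 are marked as 'camera moving'."""
    return cmd_score(frame_prev, frame_curr) > threshold
```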

Workflow of detecting human gastric peristalsis

The workflow of detecting human gastric peristalsis is shown in Fig. 2a. For training, the MCCE dataset is processed by the CMD, which provides stable MCCE frames. For prediction (testing), the testing MCCE data is first processed by the CMD, which determines the quality of the MCCE frames. Frames with camera movement above the threshold are marked as 'camera moving'; the stable MCCE frames that pass the CMD are sent to the pre-trained deep-learning model for prediction, which outputs 'wave' or 'nowave'. The proposed framework is compatible with various deep-learning models. In Fig. 2b, we demonstrate the method of using CAM to improve the sensitivity of the framework. In Fig. 2c, we demonstrate the ensemble of the CNN and LSTM for detecting human gastric peristalsis.

Fig. 2
figure 2

(a) The workflow of detecting human gastric peristalsis using MCCE frames and deep learning algorithms. During the training and inference phases, the CMD is used to filter out unstable MCCE frames, and deep learning algorithms are used to detect gastric peristalsis based on both spatial and temporal information. (b) The protocol for improving detection sensitivity using CAM. Using the counted activated pixels, we further calibrate the prediction results from the CNN+LSTM model. (c) The ensemble of the CNN and bi-directional LSTM model for detecting gastric peristalsis. The CNN model extracts spatial features and the LSTM model extracts temporal features.
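As a concrete illustration of the ensemble in Fig. 2c, the following is a minimal PyTorch sketch of a CNN + bi-directional LSTM sequence classifier. A ResNet-18 backbone is used here for brevity; the exact backbone (EfficientNet_b7 gives the best results in Table 2), feature dimensions, and classification head are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTM(nn.Module):
    """Frame-sequence classifier: CNN for spatial features, bi-directional LSTM for temporal features."""
    def __init__(self, hidden_size=256, num_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # ImageNet pre-trained weights are used in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])       # 512-d feature per frame
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)                # 'wave' vs. 'nowave'

    def forward(self, x):                        # x: (batch, memory_length, 3, 480, 480)
        b, t, c, h, w = x.shape
        feats = self.cnn(x.view(b * t, c, h, w)).view(b, t, -1)          # per-frame spatial features
        out, _ = self.lstm(feats)                                         # temporal modeling
        return self.fc(out[:, -1])                                        # one label per sequence

# Example: one sequence with a memory length of 20 frames.
logits = CNNLSTM()(torch.randn(1, 20, 3, 480, 480))
```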

Improving sensitivity using CAM

We use the CAM to calibrate the detection results of the deep learning model. The CAM for a particular category indicates the discriminative image regions used by the CNN to identify that category. We calculate the CAM for each MCCE frame k. We then use a threshold \(\hbox {T}_{p}\) to select the activated pixels in the CAM and count the number of activated pixels. If the number of activated pixels in a frame is larger than the threshold \(\hbox {T}_{c}\), the frame is classified as 'wave' in the modified label list c. We then perform a calibration between the modified list c and the original prediction list p: if \(c_k\) or \(p_k\) is 'wave', the final calibrated prediction \(pr_k\) will be 'wave'. The algorithm is described in Algorithm 2. The parameter \(\hbox {T}_{p}\) determines the threshold for choosing positive CAM pixels, and \(\hbox {T}_{c}\) determines the number of positive CAM pixels required to consider a frame positive. The higher these two parameters, the stricter the CAM filter. For example, if \(\hbox {T}_{p}\) is set to 1 and \(\hbox {T}_{c}\) is set to 230,400 (the total number of pixels in an MCCE frame of size 480\(\times\)480), then no frame can pass the CAM filter and the sensitivity of the CNN+LSTM model will not be improved. If both \(\hbox {T}_{p}\) and \(\hbox {T}_{c}\) are set to 0, then every MCCE frame passes the CAM filter and the sensitivity becomes 1. Following existing research24,25, we set \(\hbox {T}_{p}\) = 0.8. The choice of \(\hbox {T}_{c}\) = 400 is based on empirical study.

Algorithm 2
figure b

Improving sensitivity using CAM.
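A minimal sketch of the calibration step in Algorithm 2 is given below, assuming each CAM is already normalized to [0, 1] and resized to the 480 \(\times\) 480 frame size; the function and variable names are illustrative.

```python
import numpy as np

def calibrate_with_cam(cams, predictions, t_p=0.8, t_c=400):
    """cams: list of (480, 480) float arrays in [0, 1]; predictions: list of 'wave'/'nowave'."""
    calibrated = []
    for cam, pred in zip(cams, predictions):
        activated = int(np.count_nonzero(cam > t_p))          # pixels above T_p
        cam_label = 'wave' if activated > t_c else 'nowave'   # modified label c_k
        # Keep 'wave' if either the model prediction p_k or the CAM label c_k is 'wave',
        # which reduces false negatives at the cost of possible extra false positives.
        calibrated.append('wave' if 'wave' in (pred, cam_label) else 'nowave')
    return calibrated
```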

Periodical feature detector for human gastric peristalsis

We design a periodical feature detector for measuring the period of human gastric peristalsis. The inputs of the periodical feature detector are a range of intervals, MCCE frames, and two thresholds \(T_{l}\) and \(T_{u}\). The periodical feature detector calculates the feature difference score S of feature maps across certain intervals i. For each interval i, a score \({\textbf {S}}_{i}^{mean}\) is calculated over the MCCE frames. The period of the human gastric peristalsis is determined by the local minimum P between the thresholds \(T_{l}\) and \(T_{u}\). The details of the periodical detector are described in Algorithm 3. The intervals i were set from 5 s to 50 s, in increments of 0.5 s (one frame at 2 fps). \(T_l\) is set to 10 s and \(T_u\) is set to 40 s. The choice of \(T_l\) and \(T_u\) is determined by the minimum number of frames needed to detect human gastric peristalsis and the average period of normal human gastric peristalsis. We will show experimental details in the Results section to confirm that 10 s (20 MCCE frames) achieves optimal performance in detecting gastric peristalsis. Thus, we take 10 s as the lower bound for the periodical feature detector and set \(T_l\) to 10 s. We set \(T_u\) (40 s) to twice the average period of normal gastric peristalsis, which is around 20 s26. Note that the periodical feature detector will detect both the period and multiples of the period; setting \(T_u\) = 40 s can therefore remove the multiples of the period of normal human gastric peristalsis. We use EfficientNet_b727 to generate the feature maps of the MCCE frames.

Algorithm 3
figure c

The periodical feature detector.
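The following is a minimal sketch of the periodical feature detector, assuming the feature difference score is the mean Euclidean distance between pooled EfficientNet_b7 features of frames that lie one interval apart, and that the period is taken as the minimum score within [\(T_l\), \(T_u\)]; the exact distance measure and local-minimum search in Algorithm 3 may differ.

```python
import numpy as np
import torch
from torchvision import models

FPS = 2  # MCCE videos are recorded at 2 fps

def extract_features(frames):
    """frames: (N, 3, 480, 480) float tensor -> (N, D) pooled feature vectors."""
    net = models.efficientnet_b7(weights=None)   # ImageNet pre-trained weights are used in practice
    net.eval()
    with torch.no_grad():
        fmap = net.features(frames)                       # (N, C, h, w) feature maps
        feats = torch.flatten(net.avgpool(fmap), 1)       # (N, C) pooled features
    return feats.numpy()

def detect_period(features, t_l=10.0, t_u=40.0, i_min=5.0, i_max=50.0):
    """Detected period P (seconds) for one sequence; the sequence must exceed the largest interval."""
    intervals = np.arange(int(i_min * FPS), int(i_max * FPS) + 1)        # intervals in frames
    scores = np.array([np.linalg.norm(features[i:] - features[:-i], axis=1).mean()
                       for i in intervals])                              # S_i^mean per interval
    seconds = intervals / FPS
    valid = (seconds >= t_l) & (seconds <= t_u)                          # restrict to [T_l, T_u]
    return seconds[np.argmin(np.where(valid, scores, np.inf))]
```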

Evaluation setup

Using the CMD, we acquired 32,431 stable MCCE frames (wave: 9501, nowave: 22,930) from the training set. The 32,431 stable MCCE frames were divided into 1028 MCCE frame sequences (wave: 336, nowave: 692), each consisting of more than 20 frames. Each MCCE frame sequence corresponds to a single label of either 'wave' or 'nowave'. For training and cross-validation, we used the 1028 stable MCCE frame sequences. For testing, we used 30 additional MCCE records from another 11 individuals (24,183 frames), which were acquired after the training data. The training and testing cohorts were divided according to the time of data acquisition. The training and testing data were acquired from the same center, using the same MCCE system.

Network training parameters

For training the CNN+LSTM and Video Swin Transformer models, we set the batch size to eight frame sequences. We used weights pre-trained on ImageNet28 for all the models in this project: for the CNN+LSTM models, we acquired the pre-trained weights from the Torchvision29 package; for the Video Swin Transformer model, we acquired the pre-trained weights from the mmaction30 package. We trained all models for 200 epochs, by which point the loss had plateaued for all models. We used the first five epochs for warm-up, during which only the CNN model was trained. The learning rate was initialized to \(10^{-4}\) and halved every 10 epochs. The experiments were carried out on a single RTX 3080 GPU.
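A minimal sketch of this schedule is given below. The Adam optimizer is an assumption (the optimizer is not specified above), and the stand-in model only mirrors the CNN/temporal split used to freeze the non-CNN layers during warm-up.

```python
import torch
import torch.nn as nn

# Stand-in model with a CNN part and a temporal part, mirroring the CNN+LSTM design.
model = nn.ModuleDict({
    "cnn": nn.Conv2d(3, 8, kernel_size=3),
    "lstm": nn.LSTM(input_size=8, hidden_size=8, batch_first=True),
    "fc": nn.Linear(16, 2),
})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)      # initial learning rate 1e-4
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # halve every 10 epochs

for epoch in range(200):
    warmup = epoch < 5                                          # first five epochs: train the CNN only
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("cnn") or not warmup
    # ... iterate over batches of 8 frame sequences, compute the loss, and update the model ...
    scheduler.step()
```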

Evaluation metrics

We used accuracy, the F1 score, and the area under the curve (AUC) to evaluate classification performance. The accuracy score evaluates how well the model correctly predicts (true positives and true negatives) the wave/nowave video frames; the F1 score evaluates model performance while accounting for falsely predicted cases (false negatives and false positives). In this medical-related study, we need to evaluate model performance for both correctly and falsely predicted cases. The AUC score evaluates how well the trained model can distinguish between the wave and nowave video sequences.

$$\begin{aligned} Accuracy = \frac{TP+TN}{TP+TN+FP+FN}, \end{aligned}$$
(1)

where TP stands for true positive; TN stands for true negative; FP stands for false positive; FN stands for false negative.

$$\begin{aligned} F1 = 2 \times \frac{Precision \times Sensitivity}{Precision+Sensitivity}, \end{aligned}$$
(2)

where precision is defined by \(\frac{TP}{TP+FP}\); sensitivity is defined by \(\frac{TP}{TP+FN}\).

The AUC score indicates how well the classification model can distinguish between classes and is calculated from the receiver operating characteristic (ROC) curve. The AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0; one whose predictions are 100% correct has an AUC of 1.

We define the error rate to quantify the performance of the periodical feature detector.

$$\begin{aligned} \text {Error rate} = \frac{|\text {Detected period} - \text {Counted period}|}{\text {Counted period}} \times 100\%. \end{aligned}$$
(3)
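As an illustration, the metrics above can be computed with scikit-learn as follows; the labels and probabilities below are dummy values, and treating 'wave' as the positive class (1) is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth wave (1) / nowave (0)
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.8, 0.6])    # predicted probability of 'wave'
y_pred = (y_prob > 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)                       # Eq. (1)
f1 = f1_score(y_true, y_pred)                                   # Eq. (2)
auc = roc_auc_score(y_true, y_prob)                             # AUC from the ROC curve

def error_rate(detected_period, counted_period):
    """Eq. (3): relative error of the detected peristalsis period, in percent."""
    return abs(detected_period - counted_period) / counted_period * 100.0

print(accuracy, f1, auc, error_rate(17.5, 19.2))                # e.g. case2: ~8.85% error
```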

Ethics declarations

The MCCE data adopted in this research were collected from internal healthy volunteers at Ankon, using the NaviCam system. The NaviCam system is a medical device registered with the National Medical Products Administration (NMPA).

The data collection protocol was designed and performed by the Medical Department at Ankon, in accordance with the Ankon Internal Volunteer Protocol and the Declaration of Helsinki.

Informed consent for the data collection protocol and for reuse of the data for research purposes was acquired from all the internal healthy volunteers. The internal healthy volunteers received monetary compensation.

Fig. 3
figure 3

An example of applying the camera motion detector to stable (upper row) and unstable (lower row) MCCE frame sequences. The MCCE frames are inputs of the proposed CMD. The histogram (H), mask (M), and masked histogram (\(\hbox {H}_M\)) are interim results. The output of the CMD is a score that evaluates the movement between two consecutive MCCE frames.

Results

Effects of CMD

We show the effect of using the CMD on stable and unstable MCCE frame sequences. An example of applying the CMD to stable (upper row) and unstable (lower row) MCCE images is demonstrated in Fig. 3. In the upper row, we apply the CMD to two consecutive MCCE frames captured while the capsule is stable. In this case, the main body of the histogram \({\textbf {H}}\) of the residual image between the two frames is close to zero, whereas the high band (right) and low band (left) of the histogram have high values. This is caused by the small motion of the capsule, which is equipped with a light source: slight changes in the light source shift the positions of bright/dark regions, which are captured by the high/low bands of the histogram of the residual image. Using the mask \({\textbf {M}}\), we filter out the high/low bands of \({\textbf {H}}\), which results in \({\textbf {H}}_M\). In the lower row, we apply the CMD to two consecutive MCCE frames captured while the capsule is unstable, which leads to a high CMD score S (2616).

Detecting human gastric peristalsis

Table 1 Performance of the deep learning model using different memory lengths31.

We train the CNN+LSTM models using different memory lengths (1, 5, 10, and 20 video frames). The performance for different memory lengths is reported in Table 1. We observe that model performance increases with memory length, and the CNN+LSTM model with a memory length of 20 shows the best performance.

Table 2 Performance comparison of different deep learning models.

We implemented different types of CNN models combined with an LSTM of memory length 20, including ResNet18, ResNet50, ResNet10132, ShuffleNet_v233, EfficientNet_b0, and EfficientNet_b727. We also implemented the Video Swin Transformer21 for detecting human gastric peristalsis. The results are shown in Table 2. The CNN+LSTM models with a memory length of 20 show the best performance, whereas the Video Swin Transformer model performs worse than the CNN+LSTM models. Although the Video Swin Transformer has the potential to demonstrate superior performance in classifying natural images with multiple classes21, CNN+LSTM models perform better for detecting gastric peristalsis on our dataset. EfficientNet_b7 performs better than the other CNN models27; the multi-objective neural architecture search used to design EfficientNet_b7 may account for its superior performance.

Fig. 4
figure 4

Still images from inference results of four representative cases in the testing set. The EfficientNet_b7+LSTM with a memory length of 20 is used for inference. (a) Case14 (Supplementary Video case14.mp4). (b) Case19 (Supplementary Video case19.mp4). (c) Case20 (Supplementary Video case20.mp4). (d) Case24 (Supplementary Video case24.mp4).

Improving the detection sensitivity using CAM

In Fig. 4, we show still images from the inference results of four representative cases in the testing set. We follow the inference protocol in Fig. 2a. The inference results are shown in black bold font. We also calculate the CAM for the MCCE frames during inference. The CAM is projected to a heatmap, where red corresponds to high intensity (1) and blue corresponds to low intensity (0). In the inference videos, human gastric peristalsis is highlighted by the red regions of the CAM. The CAM provides visual explanations of the CNN+LSTM model for detecting gastric peristalsis.
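A minimal sketch of computing and projecting a CAM heatmap for a single frame is given below, following the standard CAM formulation (a class-weighted sum of the last convolutional feature maps); the use of OpenCV's JET colormap for the red-to-blue projection is an assumption about the visualization.

```python
import cv2
import numpy as np
import torch

def cam_heatmap(feature_maps, fc_weights, class_idx, size=(480, 480)):
    """feature_maps: (C, h, w) tensor from the last conv layer; fc_weights: (num_classes, C)."""
    cam = torch.einsum('c,chw->hw', fc_weights[class_idx], feature_maps)   # weighted sum of maps
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)               # normalize to [0, 1]
    cam = cv2.resize(cam.detach().cpu().numpy(), size)                     # upsample to frame size
    heatmap = cv2.applyColorMap((cam * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cam, heatmap   # cam feeds the Algorithm 2 calibration; heatmap is overlaid for visualization
```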

By analyzing the activated regions in the CAM (described in Algorithm 2) and calibrating the original prediction results, we improve the sensitivity of the CNN+LSTM model in detecting gastric peristalsis. The results are reported in Fig. 5. With the explainable information provided by the CAM, we reduce the false negative results in detecting gastric peristalsis compared to the vanilla CNN+LSTM model.

Fig. 5
figure 5

Sensitivity scores before and after using the CAM. Using the explainable information provided by CAM, the sensitivity score of the CNN+LSTM models is improved.

Fig. 6
figure 6

An example of applying the periodical feature detector to an MCCE frame sequence (case2 in the testing set). For each interval value I from 10 to 50 s, the periodical detector generates a corresponding feature difference score. The I (between the predefined \(T_l\) and \(T_u\)) with the local minimum feature difference score is identified as the detected period P. In this case, the detected period of human gastric peristalsis is 17.5 s (denoted by the red line), which is close to the counted period (19.2 s).

Measuring period of human gastric peristalsis

To capture the period of human gastric peristalsis, we develop the periodical feature detector in Algorithm 3. The proposed detector extracts the periodical information by analyzing the differences between MCCE feature maps at different intervals. An example of applying the periodical feature detector to an MCCE frame sequence is shown in Fig. 6. In this case (case2 in the testing set), the detected period is 17.5 s (an error rate of 8.85% compared to the counted period of 19.2 s).

Fig. 7
figure 7

(a) The counted period of human gastric peristalsis for the testing set of 30 MCCE video sequences. (b) The error rate of the detected period of human gastric peristalsis for the testing set.

We apply the periodical detector to the testing set of 30 MCCE frame sequences. The counted periods and the error rates of the detected periods are reported in Fig. 7. The proposed periodical feature detector achieves a mean error rate of 8.36% with a standard deviation of 12.84%.

Discussion

In this paper, we explored deep learning and image processing algorithms for detecting and measuring the period of human gastric peristalsis. We developed a generic framework for detecting human gastric peristalsis using deep learning and explored multiple CNN+LSTM models and the Video Swin Transformer within this framework. We also developed a CMD for filtering out MCCE frames that are degraded by camera movement. The current design of the CMD is based on processing MCCE frames; in the future, we may incorporate additional information, such as magnetic positioning data from the NaviCam, and adapt the design of the CMD to various devices. On our MCCE dataset with more than 100,000 MCCE frames (100,055 specifically) from 30 subjects, we achieved 0.8882 accuracy, 0.8192 F1, and 0.9400 AUC scores for detecting gastric peristalsis. In the future, we will train and test the proposed algorithms on different MCCE systems from different medical centers to evaluate their generalization ability. Moreover, we improved the sensitivity of detecting gastric peristalsis using the visual interpretation provided by the CAM. To measure the period of gastric peristalsis, we designed a periodical feature detector, which achieves a mean error rate of 8.36% on our dataset and outperforms the existing method in our previous research31. We notice that the periodical feature detector has the highest error rate (67.68%) on case26, which we investigate in Fig. 8. In case26, the MCCE frame sequence captures a substantial amount of mucus. The mucus has shape features and motions different from those of gastric peristalsis, which may degrade the performance of the periodical feature detector. To mitigate this performance drop, image-denoising algorithms may be adopted to preprocess the data and reduce the level of noise. The performance of peristalsis detection can also be improved by involving more diverse training data, such as datasets with more presence of debris and mucus; with a more diverse dataset, the deep learning model can learn to ignore the noise (e.g., debris and mucus) and focus on detecting gastric peristalsis.

Fig. 8
figure 8

Still image of the MCCE frame sequence case26 (Supplementary Video case26.mp4) in the testing set. The MCCE frames are de-identified. The MCCE frame sequence captures the presence of mucus.

The proposed algorithms have great potential to be integrated into clinical workflows. For example, the algorithms can be integrated into the data recorders and computer workstations of MCCE systems (shown in Fig. 1c). During clinical diagnosis, the collected data can be analyzed to provide real-time gastric motility evaluation. Moreover, the proposed algorithms can run offline and retrospectively analyze MCCE data collected from previous clinical diagnoses. However, more diverse training data is needed before adopting the proposed algorithms in real clinical scenarios. In particular, the training data should cover a wide range of stomach environments, e.g., different genders, age ranges, ethnic groups, and previous medical conditions.

Conclusion

As an exploratory study on the automatic detection and measurement of human gastric peristalsis, the algorithms developed in this research have great potential to help both clinicians and patients. Using the proposed algorithms, the extensive manual labor in evaluating gastric peristalsis, such as inspecting each MCCE frame and counting the period of peristalsis, can be reduced for clinicians, and patients can benefit from reliable examination results. The proposed algorithms contribute to the efficient and reliable workflow that we envision for MCCE systems. However, the algorithms, especially the periodical feature detector, were developed based on a clean gastric environment without the presence of debris and mucus. Although we improved the sensitivity of the CNN+LSTM model using the explainable visual interpretations provided by CAM, the model performance may deteriorate in the presence of gastric debris and mucus. In the future, we will improve the algorithm design with a focus on the presence of debris and mucus by acquiring more data; with a more diverse dataset, the deep learning model can learn to be robust to the noise present in the images, including debris and mucus. We may also add image-denoising algorithms to pre-process the MCCE frames. We optimized the parameters (\(\mu\), \(\sigma\), and T in Algorithm 1, \(\hbox {T}_p\) and \(\hbox {T}_c\) in Algorithm 2) for the NaviCam MCCE system; we will keep optimizing these parameters using more data and for MCCE systems from other manufacturers. Moreover, our current dataset was collected from healthy volunteers. We will extend our dataset to include patients with gastric diseases, aiming to further improve the robustness of the proposed algorithms with the extended dataset. In addition, we will enable the proposed algorithms to detect and classify human gastric diseases based on peristalsis.