Enhancing cardiotocography classification via ensemble learning and threshold optimization

Kong, Lingping; Snášel, Václav; Bai, Zhonghai; Vilimek, Dominik; Mirjalili, Seyedali; Pan, Jeng-Shyang; Horakova, Jitka; Martinek, Radek; Vilimkova Kahankova, Radana

doi:10.1038/s41598-025-18990-z

Download PDF

Article
Open access
Published: 04 November 2025

Enhancing cardiotocography classification via ensemble learning and threshold optimization

Lingping Kong¹,
Václav Snášel^1,7,
Zhonghai Bai¹,
Dominik Vilimek²,
Seyedali Mirjalili^3,6,
Jeng-Shyang Pan⁴,
Jitka Horakova⁵,
Radek Martinek² &
…
Radana Vilimkova Kahankova²

Scientific Reports volume 15, Article number: 38528 (2025) Cite this article

1112 Accesses
Metrics details

Subjects

Abstract

Machine learning classifiers trained on imbalanced healthcare datasets often exhibit bias, leading to poor performance on critical cases. The cardiotocography (CTG) dataset exemplifies this issue, where misclassification of pathological cases arises due to both class imbalance and non-optimal probability thresholds. Statistical analysis suggests refining classification thresholds, but this approach has been largely overlooked in CTG data research. To address these challenges, we propose a multifusion method integrating undersampling, threshold-moving optimization, and ensemble classifiers to enhance classification precision while maintaining computational efficiency. Applied to a CTG dataset of 502 cases from Czech Technical University and University Hospital Brno, our method showed significant improvements in identifying pathological cases. While baseline models correctly classified only about 2 out of 11 cases per test, our approach achieved 76.92, 75, and 41.67% precision, accurately identifying 9, 9, and 3 cases out of 12, respectively.

Introduction

Fetal surveillance¹ during labor serves the critical purpose of quickly detecting potentially acidotic fetuses² while minimizing unnecessary interventions. In the course of labor, fetuses can experience recurrent episodes of decreased oxygen supply, a natural occurrence, but those with compromised defense mechanisms are susceptible to developing metabolic acidosis³. This condition can result in long-term consequences such as neurodevelopmental disabilities, cerebral palsy, or, in severe cases, fetal death. Therefore, continuous fetal monitoring during labor is indispensable, typically performed using a cardiotocograph (CTG)⁴, which simultaneously records fetal heart rate (FHR) and uterine contraction (UC) signals. However, the current CTG assessment is based heavily on visual analysis of various FHR signal patterns, following medical guidelines. Regrettably, this method exhibits significant discrepancies among observers⁵, lacks objectivity, and demonstrates poor interpretive consistency.

To address these challenges, numerous digital assistance systems have been developed to help clinicians interpret CTG results⁶. However, there is insufficient evidence to suggest that these systems effectively reduce the incidence of newborn acidosis without increasing obstetric interventions compared to traditional CTG analysis. Consequently, numerous signal-processing techniques have been explored to uncover concealed FHR characteristics that could distinguish between normal fetal status and acidosis. However, the outcomes obtained so far have not been satisfactory for integration into clinical practice.

Recent clinical research emphasizes the importance of accurately identifying hypoxemia⁷, which requires a thorough understanding of the fetal compensatory mechanisms. These mechanisms, influenced by the autonomic nervous system, adapt the fetus to increased activity levels during perceived oxygen deficiency. The dynamic nature of fetal heart rate (FHR), reflecting the modulation of the autonomic nervous system⁸, contains crucial information on fetal well-being. Conventional signal processing methods may not be suitable for precise CTG assessment as they fail to integrate these physiological characteristics, particularly the nonlinear and nonstationary aspects of FHR resulting from autonomic nervous system modulation.

Moreover, CTG databases, such as the CTU-UHB data derived from the Czech Technical University (CTU) in Prague and the University Hospital in Brno (UHB), often present unbalanced data, where the data set contains significantly fewer samples of ‘Pathological’ cases compared to the ‘Normal’ class⁹. Learning from unbalanced data¹⁰ poses a challenge to classification algorithms, and this leads to models with inadequate predictive accuracy, particularly in regard to the minority class. This poses a significant issue as the minority class, e.g., the ‘pathological class, usually holds more significance, making the problem more prone to classification errors in the minority class compared to the majority class.

Several studies¹¹ have explored classification tasks using the CTU-UHB dataset, each employing different methods, split criteria, and feature sets, as notable from summary in Table 1. The works differ notably in the stage of labor considered (Stage I, Stage II, or unspecified), the selection and size of the data subsets (often using a binary classification approach), and the thresholds used for dividing normal and pathological cases, primarily based on pH values (ranging from 7.05 to 7.20) or additional clinical indicators (e.g., base deficit, Apgar scores, birth weight). Various signal processing and machine learning methods were applied, including spectral and time-frequency analysis (FFT, Wavelet Transform), feature-based approaches (e.g., EMD, recurrence plots, common spatial patterns), and different classifiers (SVM, kNN, ANN, CNN, and fuzzy logic-based models). Some studies relied solely on feature extraction from the signals, while others transformed signals into two-dimensional representations for deep learning. The number of features used for classification also varied considerably, ranging from direct signal transformations without explicit feature extraction to structured sets of selected features based on clinical relevance or optimization algorithms.

Table 1 Summary of recent related studies using pH for label formation of classification task on CTU-UHB dataset, where stage corresponds to the part of the labor used, the size (P-S-N) indicates the number of instances for pathological, suspicious, and normal. FFT, fast fourier transform; SVM, support vector machine; RF, random forests; FC$\epsilon$H; DT, decision tree; PRF, probabilistic random forest; CWT, continuous wavelet transform; kNN, k-nearest neighbor.

Subjects

Abstract

Introduction

Results

Ablation comparison between with and without threshold tuning

Comparison between the proposed model with the baseline

Sampling method selection and baseline classifier

Selection on sampling method

Selection on classifier

Feature importance and explainability

Result comparison with other classifier with hyperparameter tuning

Discussion

Methods

Data preparation

Stage I: Pre-processing

Stage II: Feature extraction

Stage III: Data split

Stage IV: Multi-model fitting

Stage V: Auto-threshold optimization

Classification strategy based on \(\lambda\)

Time and space complexity

Feature name and abbreviations

Conclusion

Data Availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links