Introduction

Cardiovascular diseases (CVDs) are a major global health concern: approximately 17.9 million deaths in 2019 were attributed to CVDs1. Coronary artery disease (CAD) is one of the major CVDs; in this pathological condition, the coronary arteries, which supply oxygenated blood to the heart, become occluded by plaque deposits on their inner walls2. This may lead to symptoms such as chest pain and shortness of breath and, in the advanced stage of the disease, to heart attack. The risk factors include high blood pressure, high cholesterol, diabetes, smoking, obesity and a family history of heart disease3. Treatment options range from lifestyle changes and medication to surgery. Once the disease has become severe, it requires clinical intervention; otherwise, it may prove fatal. While coronary catheterization is the gold standard for diagnosing CAD2, it is an invasive and expensive procedure that must be performed by skilled cardiologists. The approach proposed here for detecting CAD is intended to be easily accessible, low-cost, non-invasive, and suited for domestic use. Some studies have shown that machine learning techniques can have a remarkable impact on heart disease detection4.

Medical assessment of CAD typically includes electrocardiography (ECG), the treadmill test (TMT), and cardiac catheterization3. CAD may not produce any symptoms or changes in the ECG signal until stenosis has become severe5. Moreover, cardiac catheterization is invasive, as mentioned above. The sensitivity of TMT is 25–71% for subjects with a blockage in one artery and improves to 81% when two or three arteries are blocked6. Therefore, early and precise diagnosis of CAD is crucial for reducing mortality rates and avoiding increased cardiovascular risk. The overall contribution of this research is a non-invasive and efficient method for screening CAD subjects using arterial blood pressure (ABP) and photoplethysmogram (PPG) measurements. In this context, selected state-of-the-art works on detection of CAD are reviewed below.

The ECG signal has been widely used for detection of cardiac conditions, as reported in the literature. Al. Hosani et al. used a linear feature extraction approach with K-nearest neighbours (KNN), decision tree (DT), and support vector machine (SVM) classifiers to detect CAD with a maximum accuracy of 90%7. U. R. Acharya et al. utilized an entropy-based feature extraction technique with the same set of classifiers8. V. Jahmunah et al. utilized convolutional neural networks (CNN) for CAD detection using ECG signals9. Advanced machine learning approaches, such as deep neural networks and reinforcement learning, have demonstrated significant potential in analyzing and predicting physiological signals like ECG for cardiac abnormalities10. Such techniques can address challenges like missing data segments, providing robust signal predictions and enabling accurate feature extraction for clinical applications. In a similar ECG-based study, the authors analyzed 303 records using 26 features selected with input from clinical experts, and tested several classification algorithms: Multilayer Perceptron (MLP), SVM, Logistic Regression (LR), J48, Random Forest (RF), KNN, and Naive Bayes (NB)11. The accuracy of ECG-based models has been quite good, but they are poorly suited to preventive healthcare because, as discussed earlier, ECG changes may not appear until stenosis is severe. Khan et al.12 used the phonocardiogram (PCG) signal for CAD detection and reported an accuracy of 89%, but PCG signals are easily corrupted by noise artefacts, which complicates data collection.

PPG-based models have been proposed to overcome the limitations of ECG- and PCG-based models for CAD detection7,13. Banerjee R et al. extracted time-domain and frequency-domain features from the PPG signal and detected CAD with a sensitivity of 73% and a specificity of 87% using an SVM classifier14. Paradkar et al. used the temporal positions of the systolic and diastolic phases and of characteristic points, reporting a sensitivity of 85% and a specificity of 78% with an SVM classifier15. Ihsan et al. used three feature extraction methods, namely respiratory rate (RR) interval, heart rate variability (HRV) features, and time-domain features; each technique was applied separately, which calls the robustness of the method into question16. Chakraborty et al. combined statistical, frequency-domain, and time-domain features with inter-beat variability analysis and SVM, KNN, DT and LR classifiers for CAD detection, reporting an accuracy of 93%17; however, the model relied on photoplethysmographic data alone, which raises questions about its robustness.

In some recent works, two or more cardiovascular signals have been used synchronously to identify markers of CAD. Advances in signal processing for real-time applications have enabled efficient handling of ECG and PPG data through innovative compression techniques. For instance, Banerjee et al. proposed a real-time lossless compression algorithm for ECG and PPG signals using second-order delta encoding and run-length encoding (RLE), achieving high compression ratios while preserving critical signal features18. Banerjee et al. developed a model in which time-domain and frequency-domain features are extracted from PPG and PCG signals together; the reported accuracy is 79%19. Zhao et al. demonstrated the effectiveness of VCG and CDG features with machine learning for detecting coronary microvascular dysfunction20. CAD progression involves hemodynamic changes, particularly in multi-vessel disease, that affect myocardial perfusion. Harmouche et al. emphasized the importance of collateral circulation in preserving heart function21. Our ABP and PPG-based framework builds on these insights to enable non-invasive early CAD detection. To address the aforementioned challenges, we propose a novel Integrated Fusion Module (IFM) that combines the features extracted from both signals to improve classification results and the clinical applicability of the proposed framework for CAD detection. In the proposed work, ABP and PPG signals are used together with state-of-the-art classifiers to identify markers of CAD. The key contributions of this study are as follows:

  • A set of nineteen clinically significant features is extracted from each of the ABP and PPG signals.

  • An Integrated Fusion Module (IFM) is designed for accurate classification. It combines the features extracted from the ABP and PPG signals and feeds them to the classification module.

  • Quantitative and qualitative analysis of the predictive probability of the state-of-the-art classification networks for detection of CVDs.

  • The proposed model demonstrates reliable and robust performance, validated by a large-scale dataset from the MIMIC-II database.

Methodology

The proposed machine learning framework for integrated fusion-based analysis of ABP and PPG signals, aimed at early detection of CAD, is depicted in Fig. 1. The framework consists of three primary stages: pre-processing, the Integrated Fusion Module (IFM), and the classification module. Each stage contributes to optimizing the diagnostic potential of the fused signal data.

Fig. 1

An overview of the Machine Learning Analysis of Integrated ABP and PPG Signals. This framework includes the Integrated Fusion Module (IFM) which combines the features from both PPG and ABP signals for better predictive analysis of CAD.

In the pre-processing stage, the raw ABP and PPG signals are subjected to noise removal and filtering to eliminate unwanted artifacts and enhance signal clarity. This ensures that only the most reliable and relevant signal components are used for further processing. Once cleaned, these signals are forwarded to the IFM. In the Integrated Fusion Module, feature extraction is performed separately on the ABP and PPG signals, yielding 19 features from each signal. These features capture key physiological characteristics, such as pulse waveforms, amplitudes, and variability, which are crucial for identifying CAD-related abnormalities. The extracted features from both signals are then fused into an integrated feature set, leveraging the complementary nature of ABP and PPG to provide a comprehensive representation of the cardiovascular state.

Finally, the classification module utilizes the integrated feature set to train and test machine learning classifiers, including KNN, SVM, and neural networks. The classifiers are trained using a portion of the dataset and then tested to predict whether a subject falls into the CAD or non-CAD category. This stage evaluates the diagnostic performance of the fusion-based approach, assessing metrics like accuracy, sensitivity, specificity, and positive predictive value. By combining ABP and PPG signals, the framework enhances diagnostic accuracy and reliability, offering a non-invasive and efficient tool for early CAD detection. This integration emphasizes the potential of multi-signal fusion to improve the identification of critical cardiovascular conditions.

Dataset

ABP and PPG signals from 137 subjects have been analysed, of whom 73 suffer from CAD and 64 do not suffer from any cardiac condition (labeled as non-CAD). Each of the analysed signals has a sampling rate of 125 Hz. The data come from the matched subset of the MIMIC-II22 waveform database and were obtained from PhysioBank ATM23. Figure 2 shows a sample of the data with some of the features marked.

Fig. 2

Basic ABP Signal with some features (Tc = Total Time for one Cycle, Ts = Systolic Time, Td = Diastolic Time).

Pre-processing

The data from the MIMIC-II waveform database require pre-processing to remove unwanted noise and motion artifacts. A body at rest has a low respiratory rate, whereas a person sprinting may have a very high respiratory rate24. This rise and fall in respiratory rate can cause low-frequency disturbances in the recorded signal. Even the smallest muscle movement can introduce low-frequency noise25,26. It is therefore essential to keep the subject motionless during recording. The raw signal must also be cleaned because it can pick up low-frequency noise during long recordings. Segments with motion artifacts and distorted signals were subsequently eliminated. For ABP signals, a fourth-order Butterworth band-pass filter with a pass band of 1.5–16 Hz is used; for PPG signals, a sixth-order Butterworth band-pass filter with a pass band of 2.5–16 Hz is used. Figure 3 shows a raw signal sample before and after filtering; the boxes indicate the affected area.
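The band-pass filtering described above can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, zero-phase (forward-backward) filtering is assumed, and the stated filter orders are passed directly to `butter`, although SciPy doubles the order for band-pass designs and the text does not say which convention was used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 125.0  # sampling rate in Hz, as stated for the analysed records


def bandpass(signal, low_hz, high_hz, order, fs=FS):
    """Apply a zero-phase Butterworth band-pass filter to a 1-D signal.

    `order` is handed to scipy.signal.butter unchanged; for a band-pass
    design the resulting filter has order 2 * order (assumed convention).
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering avoids phase distortion (an assumption here).
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))


def filter_abp(abp):
    """Fourth-order band-pass, 1.5-16 Hz, as quoted for ABP signals."""
    return bandpass(abp, 1.5, 16.0, order=4)


def filter_ppg(ppg):
    """Sixth-order band-pass, 2.5-16 Hz, as quoted for PPG signals."""
    return bandpass(ppg, 2.5, 16.0, order=6)
```

Second-order sections (`output="sos"`) are used because they are numerically more stable than transfer-function coefficients for narrow low-frequency pass bands.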

Feature extraction

In total, 19 features are extracted from each of the ABP and PPG signals (the equation for each feature is presented in Supplementary Table S1 and Figure S1). Table 1 shows the average values of the features extracted from ABP and PPG for CAD as well as non-CAD subjects. First, the heart rate variability (HRV) of the signal is used to extract relevant time-domain features. HRV reflects neuro-cardiac function, arising from the interaction of brain and heart signals, and captures the variations in the heartbeat27. Further, HRV is split into its VLF, LF, and HF components, rhythms that operate within distinct frequency ranges; these are listed as frequency-domain features27. The power spectrum of the HRV signal is divided into three bands: Very Low Frequency (VLF, 0–0.04 Hz), Low Frequency (LF, 0.04–0.15 Hz), and High Frequency (HF, 0.15–0.4 Hz), which are also listed in Table 1.
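As an illustration of the frequency-domain features, the sketch below estimates the power in the VLF, LF and HF bands from a series of inter-beat intervals. It is a simplified example and not the feature definitions of Supplementary Table S1: the function name, the 4 Hz resampling rate, the linear interpolation and the plain periodogram are all assumptions, and the inter-beat intervals are taken as already detected from the pulse waveform.

```python
import numpy as np

# Band edges in Hz: VLF 0-0.04, LF 0.04-0.15, HF 0.15-0.4.
BANDS = {"VLF": (0.0, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}


def hrv_band_powers(ibi_s, resample_hz=4.0):
    """Return absolute power per HRV band from inter-beat intervals (seconds).

    ibi_s       : 1-D sequence of successive beat-to-beat intervals
    resample_hz : rate of the evenly spaced tachogram (assumed value)
    """
    ibi = np.asarray(ibi_s, dtype=float)
    beat_times = np.cumsum(ibi)  # time stamp of each beat
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / resample_hz)
    tachogram = np.interp(grid, beat_times, ibi)  # evenly sampled IBI series
    tachogram = tachogram - tachogram.mean()      # remove the DC component

    # One-sided periodogram scaled as a power spectral density (s^2 / Hz).
    spectrum = np.fft.rfft(tachogram)
    freqs = np.fft.rfftfreq(tachogram.size, d=1.0 / resample_hz)
    psd = (np.abs(spectrum) ** 2) / (resample_hz * tachogram.size)
    if tachogram.size % 2 == 0:
        psd[1:-1] *= 2.0  # double all bins except DC and Nyquist
    else:
        psd[1:] *= 2.0    # no Nyquist bin for odd lengths

    df = freqs[1] - freqs[0]
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = float(psd[mask].sum() * df)  # integrate PSD over the band
    return powers
```

A Welch estimate with windowing would reduce spectral leakage on short recordings; the plain periodogram is used here only to keep the example short.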

Fig. 3

Filtered and unfiltered signals (red box indicates the area where filtering has been applied).

Table 1 Extracted features and their range (for ABP and PPG signals for both CAD and Non-CAD).

Integrated fusion module (IFM)

ABP measurements, which include systolic and diastolic pressures, are useful for detecting hypertension, a major risk factor for CAD. Additionally, the analysis of ABP waveform characteristics provides insights into arterial compliance and resistance, which are critical for diagnosing CAD. Meanwhile, PPG signals are employed to calculate pulse wave velocity (PWV), an indicator of arterial stiffness and a predictor of CAD. In the IFM, the features extracted from the two signals are combined. Nineteen features, which are important biomarkers for detecting CAD, are extracted from each of the PPG and ABP signals. To enhance classification accuracy, the fused ABP and PPG features are used for training and testing.
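One plausible reading of the fusion step is feature-level concatenation, sketched below. The text does not spell out the fusion rule, so concatenation, the z-score standardisation and both function names are assumptions made for illustration only.

```python
import numpy as np

N_FEATURES = 19  # features per signal, as stated in the text


def fuse_features(abp_features, ppg_features):
    """Concatenate per-subject ABP and PPG feature vectors column-wise.

    Both inputs have shape (n_subjects, 19); the result has shape
    (n_subjects, 38). Concatenation is an assumed fusion rule.
    """
    abp = np.asarray(abp_features, dtype=float)
    ppg = np.asarray(ppg_features, dtype=float)
    if abp.ndim != 2 or abp.shape != ppg.shape or abp.shape[1] != N_FEATURES:
        raise ValueError("expected two (n_subjects, 19) feature matrices")
    return np.hstack([abp, ppg])


def standardise(train, other):
    """Z-score both sets using statistics from the training set only."""
    train = np.asarray(train, dtype=float)
    other = np.asarray(other, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (train - mean) / std, (other - mean) / std
```

Standardising with training-set statistics only keeps information from the held-out subjects out of the model, which matters for distance-based classifiers such as KNN.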

Classification

Three state-of-the-art classifiers, namely K-nearest neighbours (KNN), Support Vector Machines (SVM) and Neural Networks (NN), have been used for CAD/non-CAD classification. Ten-fold cross-validation has been used to validate the models. This kind of validation protects against over-fitting by partitioning the data set into folds and estimating accuracy on each fold. In total, 70% of the data were used for training, 15% for validation and 15% for testing. For KNN, weighted KNN with 10 neighbours and the Euclidean distance metric has been used. For SVM, a Fine Gaussian SVM has been used. The neural network was trained for 100 iterations.
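The KNN configuration and the ten-fold validation can be illustrated with a short NumPy sketch. It is not the authors' code: the function names are hypothetical, inverse-distance weighting is an assumed interpretation of "weighted KNN", and only the ten-fold procedure is shown, not the 70/15/15 split.

```python
import numpy as np


def weighted_knn_predict(X_train, y_train, X_test, k=10):
    """Distance-weighted k-NN with Euclidean distance for labels 0/1."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_test = np.asarray(X_test, dtype=float)
    preds = np.empty(len(X_test), dtype=int)
    for i, x in enumerate(X_test):
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distance
        nearest = np.argsort(dist)[:k]                    # indices of k neighbours
        weights = 1.0 / (dist[nearest] + 1e-12)           # inverse-distance (assumed)
        vote_cad = weights[y_train[nearest] == 1].sum()
        vote_non = weights[y_train[nearest] == 0].sum()
        preds[i] = 1 if vote_cad > vote_non else 0
    return preds


def k_fold_accuracy(X, y, n_folds=10, k=10, seed=0):
    """Mean accuracy (%) over n_folds shuffled, non-overlapping folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(order, n_folds)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(order, held_out)  # every index not in this fold
        preds = weighted_knn_predict(X[train], y[train], X[held_out], k=k)
        scores.append((preds == y[held_out]).mean())
    return float(np.mean(scores)) * 100.0
```

Calling `k_fold_accuracy(X, y)` on a fused feature matrix `X` with labels `y` (1 for CAD, 0 for non-CAD) returns the mean fold accuracy in percent. With 137 subjects each fold holds 13 or 14 subjects, so per-fold accuracy is coarse and the mean across folds is the more meaningful figure.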

Table 2 Performance comparison of different classifiers using PPG, ABP, and the fused model.
Fig. 4

Confusion matrix and ROC curve of the best model from ABP (Model 1), PPG (Model 2) and the fused model (Model 3), where “0” represents non-CAD and “1” represents CAD patients.

Results and discussion

Experimental results

The classification results are tabulated in Table 2, which summarizes accuracy, sensitivity, specificity and PPV for each classifier and each model. The confusion matrix and ROC (Receiver Operating Characteristic) curve for each of the models are presented in Fig. 4. The formulae for Accuracy (Acc, Eq. 1), Sensitivity (Se, Eq. 2), Specificity (Sp, Eq. 3) and Positive Predictive Value (PPV, Eq. 4) are detailed below28,29,30.

  • Accuracy (Acc): The proportion of correctly classified cases (both CAD and non-CAD) out of the total number of cases. It indicates the overall effectiveness of the classification model.

    $$\begin{aligned} Acc= \frac{TP+TN}{TP+TN+FP+FN}\times 100 \end{aligned}$$
    (1)

    where:

    • TP (True Positives): CAD cases correctly identified as CAD.

    • TN (True Negatives): Non-CAD cases correctly identified as non-CAD.

    • FP (False Positives): Non-CAD cases incorrectly identified as CAD.

    • FN (False Negatives): CAD cases incorrectly identified as non-CAD.

  • Sensitivity (Se): The ability of the model to correctly identify CAD cases. It measures how well the model identifies actual CAD cases, minimizing missed diagnoses.

    $$\begin{aligned} Se= \frac{TP}{TP+FN}\times 100 \end{aligned}$$
    (2)
  • Specificity (Sp): The ability of the model to correctly identify non-CAD cases. It measures the model’s effectiveness in avoiding false alarms by correctly classifying non-CAD cases.

    $$\begin{aligned} Sp= \frac{TN}{TN+FP}\times 100 \end{aligned}$$
    (3)
  • Positive predictive value (PPV): The proportion of positive predictions (CAD cases) that are correct. It reflects the reliability of positive predictions: a high PPV means few false positives among the predicted CAD cases.

    $$\begin{aligned} PPV= \frac{TP}{TP+FP}\times 100 \end{aligned}$$
    (4)
  • Receiver operating characteristic (ROC) curve: A graphical plot, shown in Fig. 4, that illustrates the trade-off between the True Positive Rate (Sensitivity) and the False Positive Rate (1 - Specificity) across different decision thresholds. The ROC curve evaluates the classifier’s performance irrespective of the decision threshold; a higher area under the curve (AUC) indicates better model performance.

  • Confusion matrix: The confusion matrix in Fig. 4 summarizes the model’s predictions, showing the distribution of TP, TN, FP, and FN, where “0” represents non-CAD patients and “1” represents CAD patients. It provides a comprehensive view of the classification performance, helping identify where the model excels or struggles.
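The performance measures listed above can be computed from predicted and true labels as in the following plain-Python sketch. It uses the standard definitions Se = TP/(TP+FN) and Sp = TN/(TN+FP); the function names are illustrative, and the AUC is obtained by pairwise comparison of scores, which is one common way to compute it and not necessarily the procedure used for Fig. 4.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = CAD, 0 = non-CAD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and PPV, all in percent."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of scores.

    Equals the fraction of (CAD, non-CAD) pairs in which the CAD subject
    receives the higher score; ties count as half. Both classes must be
    present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of subjects, which is negligible for a cohort of this size.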

Table 3 Comparison of different state-of-the-art models with the proposed model.

Discussion

In this study, several machine learning models were trained and evaluated to classify Coronary Artery Disease (CAD) and non-CAD cases using arterial blood pressure (ABP) and photoplethysmogram (PPG) signals. The results demonstrate that the Neural Network (NN) classifier, utilizing features from both ABP and PPG signals through a fused dataset, outperformed other models in terms of accuracy, sensitivity, and specificity. The NN model achieved the highest classification accuracy of 90.2%, surpassing K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), which also showed competitive but lower performance metrics. Among the single-signal models, ABP-based models performed better than PPG-based models, yet both were significantly improved by the integrated fusion approach.

The Receiver Operating Characteristic (ROC) curves provided further validation of the model’s performance, clearly demonstrating the superior ability of the fused model to distinguish between CAD and non-CAD cases. The confusion matrix for the NN classifier revealed a balanced trade-off between true positive and true negative rates, further highlighting the robustness of the fused dataset in addressing classification challenges.

In contrast, models trained on single-signal datasets lacked the discriminative power provided by the fused approach. The results underscore the importance of combining complementary physiological signals for robust and reliable classification. This study provides strong evidence that the proposed fusion-based methodology, validated on a large-scale dataset, holds significant potential for early, non-invasive CAD detection and demonstrates practical applicability for real-world clinical use.

Clinical relevance

The clinical relevance of this work is significant, as it aims to detect CAD at an early stage using radial ABP and PPG signals. Early identification of CAD is critical for initiating timely clinical interventions that can prevent the disease from progressing to more severe stages. By utilizing these non-invasive and readily accessible monitoring techniques, the approach can facilitate regular screening and monitoring, making early detection more feasible and widespread. This can lead to prompt medical responses, such as lifestyle modifications, medications, and other therapeutic measures, which are crucial for improving patient outcomes and reducing the incidence of heart attacks. Additionally, early detection through ABP and PPG signals can improve patient compliance and follow-up, as these methods are less intimidating and more comfortable for patients than traditional techniques. Consequently, integrating ABP and PPG signals for early CAD detection significantly improves the classification results, which in turn holds promise for preventive healthcare and for reducing the overall risk of cardiovascular diseases.

Fig. 5

Statistical comparison of different models with our proposed method.

Conclusion and future work

Coronary artery disease is one of the most prevalent diseases and kills millions of people annually. To treat this condition and thereby reduce mortality rates, it is crucial to detect it early. To this end, arterial blood pressure and photoplethysmogram data for 137 subjects were taken from the MIMIC-II waveform database. The Neural Network gave the best results, with 90% prediction accuracy, 93.3% for patients with CAD, and 89.3% for patients without CAD. The comparison shown in Fig. 5 and Table 3 demonstrates the effectiveness of the proposed method relative to other techniques. Despite these contributions, our study has several limitations. Firstly, validation using external datasets from different institutions or equipment was not conducted. Secondly, the proposed method could not be compared with previous methods within the same analytical environment. Additionally, the features in this study were extracted manually; deep neural networks could be used for CAD classification instead. Future work can focus on improving generalization across datasets and incorporating deep learning methodologies. A more comprehensive analysis of other potential features could also be explored to further refine the model’s performance.