Introduction

Blood Pressure (BP) levels that are abnormal can have lethal consequences, increase the risk of Cardiovascular Disease (CVD), and damage organs. Heart attacks, strokes, renal failure, and other conditions can all be brought on by chronic hypertension1. Hypertension constitutes a major risk factor for CVD, which continues to be the primary cause of mortality globally2. In 2019, according to the World Health Organization (WHO), CVD accounted for 17.9 million deaths, representing 31.4% of total fatalities worldwide; in countries such as the United Kingdom and Northern Ireland, CVD was registered as the second leading cause of death, accounting for 28.7% and 23.4% of total deaths, respectively3. CVD mortality has risen consistently, from 14.2 million in 2000 to 15.9 million in 2010, reaching 17.9 million in 2019 (Fig. 1a). The WHO also recorded that across most European countries, the Russian Federation, India, China, and the United States, CVDs have consistently remained the leading cause of death, accounting for 28.1% of total deaths (Fig. 1b).

Fig. 1
figure 1

(a) Global causes of death in 2000, 2010, and 2019—WHO Global Observatory3 (b) Causes of death in Europe, China, India, the United Kingdom, the Russian Federation, and the United States of America, 2016—WHO Global Observatory3.

With this rising trend, it is essential to continuously monitor BP to detect, control, and treat hemodynamic abnormalities and CVD in their early stages4. This motivates the development of a reliable, non-invasive deep learning (DL) based framework for BP estimation using PPG signals, enabling timely monitoring, early detection of cardiovascular abnormalities, and enhanced patient care. Blood pressure can be monitored intermittently or continuously. Sphygmomanometry and oscillometry are two examples of intermittent approaches; the former uses Korotkoff sounds, while the latter provides automated, hands-free measurements. However, intermittent approaches have drawbacks such as the need for training, quiet environments, and the time lag between readings5,6.

Continuous monitoring techniques include the volume clamp method, invasive arterial BP estimation, arterial tonometry, electrocardiogram (ECG), and photoplethysmography (PPG). Invasive methods are commonly used in healthcare but are limited to in-hospital patients7. ECG is commonly recorded using electrodes, as explained by Satter et al.8; it is non-invasive and reliable for cardiovascular diagnosis but not feasible for continuous monitoring of non-patients. PPG uses optical approaches to measure variations in blood volume without the need for intrusive procedures9. PPG can be captured in transmissive or reflective modes and has applications in clinical and mobile settings10,11. Smartwatches and wearable technologies can benefit from its real-time monitoring capabilities, which are especially helpful for athletes, exercisers, and the elderly. This study aims to use advanced DL techniques by employing a transfer learning framework with scalogram-based BP estimation.

The rest of the paper is organized as follows: the second section reviews the literature of recent years, highlighting the evolution from traditional approaches to advanced Machine Learning (ML) and DL methods. Subsequently, the methodology and materials adopted for BP estimation are explained, including the dataset, windowing and scalogram generation, the continuous wavelet transform (CWT), and pre-trained CNNs. The subsequent sections present the hardware and software specifications, discuss the paper’s outcomes, and compare performance across various pre-trained CNN models.

Related work

From statistical methods to deep learning techniques, PPG signal analysis has evolved over time. For blood pressure estimation, a variety of parameters have been used, including Pulse Transit Time (PTT)12 and Pulse Wave Velocity (PWV)13,14. PWV is a measure of arterial stiffness, while PTT measures the time an arterial pulse wave takes to travel between two arterial sites. These statistical methods require multiple sensors and are affected by variables such as age, weight, etc.15,16. Pulse Wave Analysis (PWA) uses linear regression on morphological features of the PPG waveform to predict blood pressure14,17. Because of their potential, PTT and PWV approaches are still being investigated; nevertheless, one of their limitations is that they require two sensors that can precisely measure the distance between arterial sites.

ML approaches, such as Random Forests (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), have been used to estimate BP from hand-crafted features generated from PPG data18,19,20,21,22,23,24. Feature engineering can be time-consuming and may not guarantee accuracy in PPG signal analysis18,25,26,27,28,29,30. The optimum features are typically selected by incorporating various correlation techniques. Alternatively, deep neural networks can learn features directly from PPG signals, eliminating the need for manually crafted features31,32,33. Deep learning approaches have received attention due to their ability to automatically extract high-dimensional features from PPG signals. Researchers have used auto-encoders, multi-layered neural networks, and deep neural network models to estimate BP, achieving improved results compared to traditional methods34,35,36,37. The U-Net architecture has been applied to image segmentation38,39, leading to its utilization for PPG signal mining. However, existing models still have limitations, such as information redundancy and gradient vanishing. Researchers have made improvements by incorporating attention modules, residual modules, and enhanced U-Net models40,41. These optimizations have demonstrated success in image segmentation but are still in the early stages of application to continuous blood pressure monitoring using physiological signals42,43,44,45,46. Although several deep learning algorithms have been investigated, they frequently require significant computational resources and memory.

Recent research in transfer learning and domain adaptation highlights the growing focus on improving model efficiency and adaptability across various tasks and domains. A comprehensive overview of transfer learning in deep reinforcement learning is given by Zhuang et al.47, while Wang et al.48 investigate source-free unsupervised domain adaptation, which enables models to adapt without access to the source data. Zhang et al.49 focus on learning from multiple sources for domain adaptation, and Luo et al.50 address the challenge of handling inaccurate label spaces in multi-source domain adaptation. These techniques could be used for BP estimation from PPG signals, as they help models adapt to various populations and sensor types, reduce reliance on large labelled datasets, and maintain performance despite label noise. Deep neural networks have high computational and memory demands, but pretrained CNNs enhanced through transfer learning reduce these requirements and improve efficiency. This approach decreases the need for extensive labelled data and enables adaptation to new tasks with lower resource consumption.

Pretrained CNNs have been used in clinical areas such as brain tumour detection from MRI scans, histopathology image classification, and brain image analysis51,52,53. These successes, together with prior studies that effectively used pretrained CNNs for BP estimation, suggest that pretrained CNNs could be valuable for this task, allowing us to take advantage of their established strengths and improving the efficiency, accuracy, and resource requirements of BP estimation. The following key research gaps were identified after a rigorous literature survey:

  • Dependency on sensor placement: Methods such as PTT and PWV require two or more sensors, complicating the measurement process.

  • Hand-crafted feature engineering: Identifying hand-crafted features through exhaustive search and optimizing them requires time and considerable effort.

  • Computational complexity: Deep learning models extract features automatically but demand high computational and memory resources.

These gaps underline the need for approaches such as transfer learning that overcome these limitations by reducing memory and computational requirements.

In order to apply transfer learning to time-series inputs, PPG signals can be transformed into images using visibility graphs54, but preserving the relevant temporal and frequency information remains challenging. Scalograms, which represent time–frequency information at different scales, offer a simpler and more straightforward way to preserve relevant time and frequency information, unlike visibility graphs, which involve identifying points, checking visibility, and creating edges and nodes. The use of scalograms for epilepsy diagnosis from EEG signals55 has prompted their application to estimating BP values from PPG signals. Scalograms retain both time and frequency details of PPG signals, making them valuable for BP regression tasks. Previous studies have demonstrated the effectiveness of scalograms for BP classification using pretrained CNN models56,57,58 and deep CNN architectures59,60, evaluating performance with accuracy metrics. These studies have primarily focused on blood pressure classification based on hypertension stages, laying a foundation that can be further extended to BP regression. The present work builds upon these approaches to estimate continuous BP values and evaluate them against clinical standards such as the Association for the Advancement of Medical Instrumentation (AAMI) and the British Hypertension Society (BHS) guidelines.

In one of the research works21, time and frequency features were extracted separately for BP estimation, with frequency features obtained via the Fast Fourier Transform (FFT). Time-varying autoregressive (TV-AR) methods have been used for estimating heart rate variability61. Unlike scalograms, which provide a direct time–frequency representation, TV-AR captures spectral changes over time by varying model parameters, making it tedious and complex to implement effectively and limiting resolution. The Short-Time Fourier Transform (STFT) has been used in PPG-based blood pressure estimation62, where accuracy was reported for classifying signals as normal or hypertensive; as it also provides time–frequency information, it could be explored in future studies for BP regression from PPG signals. In this study, however, the CWT has been used: its ability to adapt to different frequency components at varying time points offers a promising alternative for capturing the nuanced dynamics of PPG signals.

Based on the existing literature, the utilization of scalograms for blood pressure estimation has been explored in a few studies, where deep learning models such as CNN-SVR, compound multichannel CNN, and image encoding and fusion BP techniques have been applied to perform BP regression. Maharajan et al.63 assessed performance using RMSE, while Lu et al.64 employed mean error and standard deviation. Liu et al.65 utilized custom image fusion methods instead of scalograms and evaluated MAE; their study incorporated BHS-based evaluation and achieved Grade A. Further assessment of the standard deviation of errors would strengthen alignment with the AAMI clinical standard, presenting an opportunity for advancement.

To address the challenge of blood pressure estimation, this algorithm offers a data-driven, end-to-end solution using PPG signals. It employs pretrained models, which keep the computational cost low. Additionally, to meet clinical standards, the approach emphasizes scalogram-based preprocessing, yielding deep features and eliminating the need for manual feature engineering. The primary contributions of the current study are as follows:

  • The application of the continuous wavelet transform with Morlet wavelets to generate scalograms effectively captures both time and frequency information from PPG signals for accurate BP estimation. The Morlet wavelet is well suited to analysing oscillatory signals such as PPG, as its kernel consists of a complex exponential wrapped in a Gaussian envelope.

  • Instead of training models from scratch, pretrained CNN models are employed to extract deep features from the generated scalograms, minimizing the computational cost.

  • A random forest was used to estimate systolic and diastolic BP values in compliance with clinical standards, owing to its robustness and ability to handle complex data patterns compared with other machine learning techniques.

  • Unlike previous studies, the BP estimation was validated against both the AAMI and BHS standards, making it reliable for healthcare applications.

The next section gives a detailed description of the methodology and materials used in the paper, covering the dataset, PPG segmentation, scalogram generation, pretrained models for deep feature extraction, and the random forest used to estimate BP values.

Methodology and materials description

The proposed model for continuous and non-invasive estimation of BP using PPG signals is depicted in Fig. 2.

Fig. 2
figure 2

Architecture of the pretrained convolutional neural networks and random forest.

Figure 2 outlines the workflow in detail. The data are collected from the “Medical Information Mart for Intensive Care” (MIMIC II) database, a valuable source of medical information. The collected PPG data are pre-processed, and a CWT is then applied to the preprocessed data, generating two-dimensional representations known as scalograms. Furthermore, VGG16, ResNet50, InceptionV3, NASNetLarge, InceptionResNetV2, and ConvNeXtTiny are used to capture deep features, i.e., characteristics, from the scalograms. Finally, the Random Forest model utilizes the extracted features for BP estimation.

The significance of this research lies in the use of scalograms for blood pressure estimation evaluated against clinical standards. This approach sets itself apart from existing methods by transforming PPG signals into scalograms, focusing on BP regression, and evaluating performance against clinical standards. The proposed model achieves MAE and SD below 5 mmHg and 8 mmHg, respectively, while also attaining Grade A classification under the BHS standard.

The RF model was chosen for its effectiveness in managing complex feature sets derived from scalograms. A prior study66 compared machine learning algorithms for BP estimation from hand-crafted features (HCFs) and showed that Random Forest outperformed other methods with an MAE of 4.45 mmHg while adhering to the AAMI standard. Its ensemble learning technique aggregates predictions from several decision trees, reducing variance and preventing overfitting, making it a good choice for handling the varied feature outputs from the different pretrained CNNs. It also provides feature importance rankings, helping to identify the most relevant features from the scalograms.

A key aspect of this work is the integration of scalograms with pretrained CNN models, combined with a Random Forest (RF) regressor. The RF model is tuned with the number of estimators set to 100. This combination uses the strengths of pretrained models, which capture intricate patterns in PPG signals, while incorporating the robustness of RF for BP regression. The proposed model aims to enhance the generalizability and reliability of BP estimation.

These pretrained models were selected to cover a range of architectural designs, from the classic VGG16 to more recent designs like ConvNeXtTiny. Feeding all of them the same scalogram representation enables a direct assessment of each model’s ability to handle the time–frequency information of the PPG data. The RF model further refines the extracted features, enhancing the stability of BP predictions.

This structured methodology not only improves performance metrics but also reinforces the clinical relevance and real-world applicability of the proposed model, making it well-suited for practical deployment in wearable and healthcare applications.

Dataset

The dataset used was obtained from the University of California Irvine (UCI) Machine Learning Repository67 and is a subset of the MIMIC-II waveform database68, which is made available online by the PhysioNet organization. The UCI dataset, derived from the PhysioNet database, consists of 12,000 records containing synchronized measurements of PPG, arterial blood pressure (ABP), and ECG, all sampled at a frequency (fs) of 125 Hz. The ABP measurements were acquired invasively, which is widely acknowledged as the gold standard for BP assessment69. Hence, the ABP waveforms available in the MIMIC II database serve as the reference values for blood pressure in this research. MIMIC II offers a well-established collection of waveform data that has been widely used in research on healthcare analytics and monitoring. In this work, the ABP and PPG waveforms were utilized for non-invasive blood pressure estimation, and their details are listed in Table 1.

Table 1 Descriptive statistics of the dataset.

The PPG and ECG signals were loaded and extracted from the MIMIC II dataset, as shown in Algorithm 1.

Algorithm 1
figure a

Data loading and signal extraction
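As a concrete illustration of Algorithm 1, the following Python sketch loads one part of the UCI cuff-less BP dataset and separates the signal rows of each record. The file name `Part_1.mat`, the MATLAB variable name `p`, and the PPG/ABP/ECG row ordering are assumptions about the UCI packaging and may need adjusting to the actual distribution.

```python
# Sketch of Algorithm 1 (data loading and signal extraction).
# Assumption: each record is a 3-row matrix ordered PPG, ABP, ECG inside a cell array 'p'.
import numpy as np
from scipy.io import loadmat

FS = 125  # sampling frequency in Hz

def load_records(mat_path):
    """Load one .mat part file and return a list of (ppg, abp, ecg) arrays."""
    mat = loadmat(mat_path)
    records = mat["p"][0]          # assumed cell-array layout
    signals = []
    for rec in records:
        ppg, abp, ecg = rec[0, :], rec[1, :], rec[2, :]
        signals.append((np.asarray(ppg).ravel(),
                        np.asarray(abp).ravel(),
                        np.asarray(ecg).ravel()))
    return signals

signals = load_records("Part_1.mat")   # hypothetical file name
print(f"Loaded {len(signals)} records at {FS} Hz")
```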

Following this exploration of the dataset, the next stage of the methodology is to extract the temporal and spectral dynamics contained in the PPG signals through windowing and scalogram generation.

Windowing and scalogram generation

The PPG and ABP data were windowed and scalograms were generated, as summarized in Table 2. The records are segmented into non-overlapping windows, and the Morlet-based CWT is applied to each PPG segment, converting it into a 2D array referred to as a scalogram.

Table 2 Data windowing and scalogram generation.

The scalograms effectively represent the signals by retaining temporal and spectral characteristics. The following steps demonstrate the windowing of signals and generation of scalograms:

  1. The PPG and ABP signals were segmented into non-overlapping segments of 1000 samples (8 s), as described in Algorithm 2. The duration of each segment is calculated using Eq. (1).

    $$Duration = \frac{No. \;of\;Samples}{{Sampling\;Frequency\;\left( {f_{s} } \right)}} = \frac{1000}{{125}} = 8\;{\text{seconds}}\;\;({\text{Since}}\;f_{s} = 125\;{\text{Hz}})$$
    (1)
  2. The PPG data are filtered from 0.1 Hz to 8 Hz (Algorithm 3), as this range contains the significant information related to the DC component (baseline) and the AC component (blood volume changes due to the heart's pumping action). This range was determined by analyzing the Fourier Transform (FT) of PPG signals, detailed in Section Continuous wavelet transform and illustrated in Fig. 3, and is consistent with prior studies analyzing PPG signals for blood pressure estimation32,35,36,46. The diastolic and systolic BP reference values were obtained from the segments of the ABP waveforms corresponding to each PPG segment (Algorithm 4).

  3. A Morlet-based continuous wavelet transform with 128 scales was applied to the PPG segments (Algorithm 5).

  4. Scalograms were generated, each with a size of 128 by 1000.

  5. The scalograms were resized to dimensions suitable for the pre-trained CNN models, ensuring compatibility with their input requirements.

Fig. 3
figure 3

Frequency spectrum of a PPG segment—(a) a plot showing DC component (b) a plot showing frequency range from 4 to 12 Hz.

Algorithm 2
figure b

Windowing of PPG and ABP signals

Algorithm 3
figure c

Filtering PPG signal

Algorithm 4
figure d

Extracting SBP and DBP from ABP

Algorithm 5
figure e

Scalogram generation of PPG signals

Thus, this process prepares the scalograms (two-dimensional data) that are applied as input for further analysis using the pre-trained CNN models, ensuring that the original time and frequency components of the PPG signals are preserved and utilized in the subsequent analysis.
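The following Python sketch illustrates the pipeline of Algorithms 2–5 under a few assumptions: a Butterworth band-pass filter is used for the 0.1–8 Hz range, PyWavelets' complex Morlet (`cmor`) serves as the CWT kernel, and the SBP/DBP reference values are taken as the extremes of each ABP segment. The paper's exact filter design, wavelet parameterization, and beat-wise BP extraction may differ.

```python
# Sketch of windowing, filtering, and scalogram generation (Algorithms 2-5),
# assuming a Butterworth band-pass filter and PyWavelets' complex Morlet wavelet.
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

FS = 125                      # sampling frequency (Hz)
WIN = 1000                    # samples per non-overlapping window (8 s at 125 Hz)
SCALES = np.arange(1, 129)    # 128 scales

def bandpass(x, low=0.1, high=8.0, fs=FS, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def segment_and_transform(ppg, abp, win=WIN):
    """Yield (scalogram, SBP, DBP) for each non-overlapping 8-s window."""
    n_windows = min(len(ppg), len(abp)) // win
    for k in range(n_windows):
        seg_ppg = bandpass(ppg[k * win:(k + 1) * win])
        seg_abp = abp[k * win:(k + 1) * win]
        # Simplified reference extraction (Algorithm 4): window-wise ABP extremes.
        sbp, dbp = float(np.max(seg_abp)), float(np.min(seg_abp))
        coeffs, _ = pywt.cwt(seg_ppg, SCALES, "cmor1.5-1.0", sampling_period=1 / FS)
        yield np.abs(coeffs), sbp, dbp    # 128 x 1000 magnitude scalogram
```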

Building on the use of the Morlet wavelet to obtain scalograms, the next section discusses the continuous wavelet transform, its common wavelet families, and its mathematical formulation, clarifying how this transform preserves and exploits crucial signal characteristics.

Continuous wavelet transform

Similar to the FT, the CWT calculates the similarity between an input signal and an analyzing function using inner products.

The analyzing functions used by the CWT are wavelets, whereas the FT uses complex exponentials. The process involves comparing the input signal with shifted, compressed, or stretched versions of the wavelet at various scales and positions to construct a two-variable function. This representation expands a one-dimensional signal into a two-dimensional function capturing location and scale information, offering a rich and detailed characterization. Depending upon the nature of the wavelet, the CWT can be complex-valued or real-valued. Equation (2) gives the CWT formula, which depends on the wavelet choice as well as the scale (a) and position (b) parameters.

$$C\left(a,b;f\left(t\right),\psi \left(t\right)\right)={\int }_{-\infty }^{\infty }f(t)\frac{1}{a}{\psi }^{*}\left(\frac{t-b}{a}\right)dt$$
(2)

where ψ(t) is the wavelet basis, * denotes the complex conjugate, and f(t) denotes the signal. The wavelet’s time point, scale, and shift are represented by “t,” “a,” and “b,” respectively. The choice of wavelet affects the continuous wavelet transform coefficients, in addition to the scale and position values.

Typical wavelets employed in the continuous wavelet transform are the Poisson, Morlet, and Mexican Hat70. These wavelets have distinct qualities and are appropriate for various signal processing applications. Mathematically, the Poisson wavelet is defined as:

$${\psi }_{Poi}\left(t\right)=c*{e}^{-\lambda t}{t}^{n}H\left(t\right)$$
(3)

where H(t) is the Heaviside step function, which takes the value 0 for t < 0 and 1 for t ≥ 0; c is a normalization constant; λ is the decay parameter; t is the time variable; and n is a positive integer determining the degree of the polynomial term.

The Poisson wavelet is a popular tool for transient signal analysis, and it is especially good at identifying and describing abrupt changes or discontinuities in time-series data.

The Ricker wavelet, often called the Mexican Hat wavelet, is another kind of wavelet used in signal analysis. It is the second derivative of the Gaussian function, mathematically defined as follows:

$${\psi }_{Mex}\left(t\right)=A\left(1-\frac{{t}^{2}}{{\sigma }^{2}}\right){e}^{\frac{{-t}^{2}}{2{\sigma }^{2}}}$$
(4)

where A is a normalization constant; t is the time variable; σ is a positive parameter determining the width of the wavelet.

It is widely applied in many different fields, including pattern recognition, image processing, and seismic research. It is particularly useful for identifying patterns like peaks and troughs in data. A complex sinusoid and a Gaussian window are combined in the Morlet wavelet, which makes it an excellent tool for oscillatory signal analysis. Based on the specific needs and attributes of the signal under analysis, one can choose from several wavelets, each with unique characteristics.

When the signal is convolved with Morlet wavelets centred at different frequencies, amplitude information is obtained at those frequencies. This makes it easier to thoroughly explore the temporal and spectral content of the signal and provides insightful information about its time–frequency properties. Consequently, one advantage of the Morlet wavelet as an analyzing function is its capacity to analyze oscillatory signals and efficiently capture time and frequency information60. Furthermore, convolution preserves temporal resolution, its Gaussian frequency profile reduces misleading ripple effects, and the convolution can be computed efficiently via the fast Fourier transform. These features are particularly beneficial for the analysis of physiological signals such as electrocardiograms and photoplethysmograms.

The Morlet wavelet is mathematically defined as:

$${\psi }_{Mor}\left(t\right)={e}^{j2\pi ft}{e}^{\frac{{-t}^{2}}{2{\sigma }^{2}}}$$
(5)

where j is the imaginary unit, f denotes frequency in Hz, and t indicates time in seconds.

The width of the Gaussian, denoted as σ, is defined as

$$\sigma =\frac{n}{2\pi f}$$
(6)

The parameter ‘n’ establishes a trade-off between time and frequency precision and is often termed the “number of cycles”. For neurophysiological data such as EEG and MEG, ‘n’ usually varies between 2 and 15 for frequencies ranging from 2 to 80 Hz71. Considering that PPG signals, like EEG, are physiological data with frequencies ranging from 0.5 Hz to 10 Hz, this mapping suggests that ‘n’ can be selected as 3.

The Fourier transform of a PPG segment, shown in Fig. 3, clearly indicates this frequency range: the band from 0.1 Hz to 8 Hz captures the most significant information in PPG signals. The DC component reflects the baseline level, while the AC component, lying primarily within this range, corresponds to the volumetric changes of blood in the tissue due to cardiac activity, making it critical for accurate blood pressure estimation.

Further, the real and imaginary parts of the Morlet wavelet for different numbers of cycles and scale factors are shown in Fig. 4.

Fig. 4
figure 4

Morlet Wavelet for different values of no. of cycles and scales.
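For illustration, the sketch below constructs the complex Morlet kernel directly from Eqs. (5) and (6), using n = 3 cycles as discussed above; the time support and sampling grid are arbitrary choices, and wavelet normalization is omitted for brevity.

```python
# Illustrative construction of the complex Morlet kernel of Eqs. (5) and (6).
import numpy as np

def morlet_kernel(f, n=3, fs=125, duration=2.0):
    """Complex Morlet wavelet centred at frequency f (Hz) with n cycles."""
    t = np.arange(-duration / 2, duration / 2, 1 / fs)
    sigma = n / (2 * np.pi * f)                                         # Eq. (6)
    return np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))  # Eq. (5)

w = morlet_kernel(f=2.0)                 # roughly the heart-rate band of a PPG signal
real_part, imag_part = w.real, w.imag    # the components plotted in Fig. 4
```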

With the positive aspects of the Morlet transform established, the utilization of a pre-trained CNN model has been explored as the next phase of this research. The next section focuses on the incorporation of the pre-trained CNN model to further enhance the analysis and estimation of blood pressure.

Convolutional neural networks: an overview

CNNs are a kind of DL model designed especially for processing grid-like input. They represent a powerful paradigm in signal processing that uses hierarchical feature extraction to enhance model accuracy. In the proposed framework, CNNs are employed to learn and identify features within the input data on their own, facilitating the understanding of complex signals. The key elements of a CNN are discussed below.

Key elements of a CNN

  1. Convolutional Layers: These layers are the heart of a CNN. They consist of kernels, also known as filters, that systematically slide with stride s over the input array to extract relevant features. The convolution operation is mathematically defined as:

    $$I\left(x,y\right)*f\left(x,y\right)=\sum_{i}\sum_{j}I\left(i,j\right)f\left(x-i,y-j\right)$$
    (6)

    where I(x, y) is an input array corresponding to an individual colour channel and f(x, y) is the kernel.

    The output dimension is given by

    $$\left[\frac{m+2p-f}{s}+1\right]\times \left[\frac{m+2p-f}{s}+1\right]$$
    (7)

    where m is the dimension of the input image, p is the number of zero-padding pixels around the border of the image, f is the kernel size, and s is the stride value. A small numerical check of this expression is given after this overview.

  2. Activation Functions: Commonly used activation functions include the Rectified Linear Unit (ReLU) and the sigmoid. They introduce non-linearity, enabling the network to learn complex relationships in the data.

    $$ReLU: \text{a}\left(\text{t}\right)=\text{max}\left(0,t\right)$$
    (8)
    $$sigmoid: \sigma \left(t\right)=\frac{1}{1+{e}^{-t}}$$
    (9)
  3. Pooling Layers: These layers downsample the spatial dimensions of the feature maps, reducing computational complexity. Max pooling takes the maximum value from a local region of the feature map. If the max pooling size is 2 by 2 and applied with stride 2, the output dimensions are reduced by half.

  4. Fully Connected Layers: These layers connect every neuron in one layer to every neuron in the next layer. They are typically found towards the end of a CNN and are responsible for high-level reasoning.

CNNs are trained through backpropagation, where the weights of the network are updated to minimize the difference between predicted and actual outputs. The ability of CNNs to automatically learn hierarchical features from data makes them an indispensable tool in biomedical signal processing applications.
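As a quick numerical check of the output-dimension expression above, the short sketch below computes the spatial output size for a few representative layer configurations (the specific values are illustrative).

```python
# Output spatial size for an m x m input, f x f kernel, padding p, and stride s.
def conv_output_size(m, f, p=0, s=1):
    return (m + 2 * p - f) // s + 1

print(conv_output_size(224, 3, p=1, s=1))  # 224: 3 x 3 kernel with 'same' padding keeps the size
print(conv_output_size(224, 2, p=0, s=2))  # 112: a 2 x 2 max-pool with stride 2 halves it
```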

Pretrained CNN models

For photoplethysmography (PPG) based blood pressure estimation, pretrained models were used, as they can extract rich deep features thanks to extensive training on large datasets. In our framework, pretrained models serve as efficient feature extractors that process the two-dimensional arrays derived from PPG segments using the continuous wavelet transform.

This research seeks to analyze and compare the accuracy of different pretrained models, namely VGG16, ResNet50, InceptionV3, NASNetLarge, InceptionResNetV2, and ConvNeXtTiny, in capturing relevant features from the scalograms and their effectiveness in accurately estimating systolic and diastolic BP values.

VGG16

The VGG16 pretrained model, introduced by Simonyan and Zisserman in 2014 as a foundational CNN, was incorporated into the blood pressure estimation framework. VGG16 is recognized for its architectural simplicity and efficacy, maintaining uniformity with a consistent kernel size of 3 × 3, a stride of 1, and MAXPOOL layers of size 2 × 2 with a stride of 2. There are 16 layers, and the number of kernels gradually rises by factors of two up to 512. An RGB image with dimensions of 224 × 224 is required as input for VGG16, which applies input normalization by subtracting the mean pixel values72. Upon applying the Morlet wavelet to a PPG segment, a scalogram with an initial size of 128 by 1000 is generated. A further transformation is required for compatibility with the VGG16 design, which demands a 224 by 224 by 3 input; the 128 by 1000 grayscale array is therefore converted to RGB format and resized, yielding a modified array of dimensions 224 by 224 by 3 that integrates seamlessly with the input requirements of the VGG16 pretrained model. Figure 5 illustrates the adapted architecture of VGG16 utilized in the blood pressure estimation framework, with output dimension 7 × 7 × 512.

Fig. 5
figure 5

VGG16 architecture.

ResNet50

The deep residual network ResNet50, first introduced by He et al.73, is a key part of this approach for blood pressure estimation. With 50 layers, ResNet50 makes it easier to train very deep CNNs and is now commonly used for a variety of computer vision tasks. Like VGG16, ResNet50 applies input normalization by subtracting the mean pixel values from a 224 × 224 RGB input image73. ResNet50’s design is characterized by a decrease in feature map dimension as depth increases, as seen in Fig. 6. The final output dimension of this model is 7 × 7 × 2048, yielding 2048 feature maps of size 7 × 7.

Fig. 6
figure 6

Resnet50 architecture.

The obtained scalogram of dimensions 128 by 1000 is resized to meet the input requirements of ResNet50. The adapted ResNet50 architecture, showcased in Fig. 6, serves as a feature extractor.

Inceptionv3

The pretrained Convolutional Neural Network (CNN) model InceptionV3, created by Google74, is an essential part of the system for estimating blood pressure. To collect characteristics across different scales, InceptionV3 uses multiple concurrent convolutional paths with varied kernel sizes.

InceptionV3 expects input dimensions of 224 by 224 by 3. The scalogram, originally 128 by 1000 in size, is transformed to fit the input specifications of InceptionV3; as part of this, the grayscale array is formatted into RGB to ensure compatibility with the model. The architecture of InceptionV3 comprises convolutional, batch normalization, and pooling layers, resulting in the creation of 2048 feature maps of dimension 5 by 5. InceptionV3 is likewise used as a feature extractor in the blood pressure prediction pipeline.

NASNetLarge

NASNetLarge, introduced by Zoph et al.75, is a neural architecture search-based network that achieves cutting-edge performance on various image recognition tasks. It also utilizes input normalization by subtracting the mean pixel values. It requires an input dimension of 331 by 331 by 3, so the scalogram of size 128 by 1000 is transformed to align with the model’s input specifications. In the proposed approach, the pretrained CNN models (VGG16, ResNet50, InceptionV3, and NASNetLarge) are used as feature extractors with the top layers removed. The layers of the models are frozen except for the fully connected dense layers. The outputs obtained from these pretrained models, referred to as deep features, capture high-level representations automatically extracted from the input images.

InceptionResNetV2

InceptionResNetV2, created by Szegedy et al. in 201776, combines the Inception architecture with residual connections from ResNet. The combination produces a deep CNN that is highly effective in several computer vision tasks. InceptionResNetV2, like NASNetLarge, applies input normalization through the subtraction of mean pixel values. It requires inputs of dimensions 299 by 299 by 3, so the input data, i.e., the scalograms, are preprocessed to ensure compatibility with the model. Together with the other pretrained CNN models, InceptionResNetV2 is used as a feature extractor in the suggested methodology: all layers are frozen except the fully connected dense layers, and the resulting deep features are used in BP estimation.

ConvNeXtTiny

ConvNeXtTiny is a compact CNN architecture presented by Liu Z et al.77, intended to deliver high performance in image recognition applications through efficient computing. ConvNeXtTiny’s architecture is optimized for resource-constrained situations, which sets it apart from the other five models and makes it appropriate for applications with limited computational resources. Input normalization is a common preprocessing step that ensures consistency among input data. The network expects input images of dimensions 224 by 224 by 3. In this work, ConvNeXtTiny’s layers are frozen and the fully connected dense layers are fine-tuned. With this method, deep features can be effectively extracted from the input, creating useful representations for the downstream task, i.e., estimating blood pressure.

The process of extracting and flattening deep features from scalograms using the pretrained models mentioned above is shown in Algorithm 6.

Algorithm 6
figure f

Feature extraction using pretrained models

The deep features extracted from the pretrained models are flattened to create a one-dimensional feature vector. This vector serves as the input to a random forest regressor described in the subsequent subsection, which predicts the blood pressure values.
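As an example of Algorithm 6, the sketch below uses Keras' VGG16 with the top removed as a frozen feature extractor; the grayscale-to-RGB conversion, the 0–255 rescaling before `preprocess_input`, and the per-image processing are assumptions consistent with the text, and the other backbones can be swapped in analogously.

```python
# Sketch of Algorithm 6: frozen VGG16 backbone turning a resized RGB scalogram
# into a flattened deep-feature vector (7 * 7 * 512 = 25088 values).
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

backbone = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False                     # freeze all convolutional layers

def scalogram_to_features(scalogram):
    """Map a 128 x 1000 scalogram (2D array) to a flattened VGG16 feature vector."""
    img = tf.convert_to_tensor(scalogram[..., np.newaxis], dtype=tf.float32)
    img = tf.image.resize(img, (224, 224))                    # -> 224 x 224 x 1
    img = tf.image.grayscale_to_rgb(img)                      # -> 224 x 224 x 3
    img = img / (tf.reduce_max(img) + 1e-8) * 255.0           # assumed 0-255 rescaling
    img = preprocess_input(img)                               # VGG16 mean subtraction
    feats = backbone(tf.expand_dims(img, 0), training=False)  # 1 x 7 x 7 x 512
    return tf.reshape(feats, [-1]).numpy()
```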

Random forest

In the random forest regression model, multiple decision trees are built by training on different bootstrapped subsets of the flattened feature vectors obtained from the CNNs. The dataset D of m samples, \({\left\{\left({x}_{k}, {y}_{k}\right)\right\}}_{k=1}^{m}\), has p features per sample, where \({x}_{k}\) is the feature vector and \({y}_{k}\) is the output value. The next step is to create B bootstrapped datasets, where B is the number of estimators (trees), denoted as \({D}_{b}={\left\{\left({{x}_{k}}^{(b)}, {{y}_{k}}^{(b)}\right)\right\}}_{k=1}^{m}\) for b = 1, 2, …, B.

Once the bootstrapped datasets are created, training begins, which involves repeatedly splitting the sample space into disjoint regions \({R}_{j}\) based on the feature values. At each node, q features are randomly selected from the p features, and the best feature and corresponding split point are chosen such that the MAE is minimized within each region.

In each region \({R}_{j}\), the prediction \({c}_{j}\) is made by averaging the target values \({y}_{k}\) of all the training samples that fall in that region, given by:

$${c}_{j}=\frac{1}{\left|{R}_{j}\right|}\sum_{{x}_{k}\in {R}_{j}}{y}_{k}$$
(10)

where \(\left|{R}_{j}\right|\) is the number of samples in region \({R}_{j}\).

For a new input \({x}_{k}\), the prediction from a single tree is determined by the region \({R}_{j}\) that \({x}_{k}\) falls into, and the corresponding \({c}_{j}\) is returned, which is mathematically given as

$${f}_{b}\left({x}_{k}\right)=\sum_{j=1}^{F}{c}_{j}I({x}_{k}\in {R}_{j})$$
(11)

where \({x}_{k}\) is the new input, F is the number of regions (splits) in the sample space, and \(I\left({x}_{k}\in {R}_{j}\right)\) is the indicator function given by

$$I\left({x}_{k}\in {R}_{j}\right)=\left\{\begin{array}{c}1, {x}_{k}\in {R}_{j}\\ 0, otherwise\end{array}\right.$$
(12)

The final prediction of the Random Forest model is obtained by averaging the predictions from all B decision trees. The overall prediction \(f\left({x}_{k}\right)\) is given by

$$f\left({x}_{k}\right)=\frac{1}{B}\sum_{b=1}^{B}{f}_{b}\left({x}_{k}\right)$$
(13)

where B is the number of bootstrapped subsets (trees).
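A toy numerical illustration of Eqs. (10)–(13) is given below: each tree predicts the mean target of the region its input falls into, and the forest prediction averages these values over the B trees (all numbers are hypothetical).

```python
# Toy illustration of Eqs. (10)-(13) with hypothetical region means c_j.
import numpy as np

tree_region_means = [np.array([110.0, 125.0, 140.0]),  # tree 1: c_j for its 3 regions
                     np.array([112.0, 130.0]),         # tree 2: c_j for its 2 regions
                     np.array([118.0, 128.0, 135.0])]  # tree 3: c_j for its 3 regions
regions_for_xk = [1, 0, 2]   # index j of the region R_j containing the new input x_k, per tree

per_tree_preds = [c[j] for c, j in zip(tree_region_means, regions_for_xk)]  # f_b(x_k), Eq. (11)
forest_pred = np.mean(per_tree_preds)                                       # f(x_k), Eq. (13)
print(per_tree_preds, forest_pred)   # [125.0, 112.0, 135.0] -> 124.0
```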

Algorithm 7 gives the detailed steps of the Random Forest implementation and the predictions made by multiple trees. Averaging the predictions from multiple trees reduces variance and thereby overfitting, making the model more robust and generalizable.

K-fold cross-validation is also employed (Algorithm 7) to increase the robustness of the random forest. In k-fold cross-validation, the dataset is split into ‘k’ folds, and the model is trained on ‘k−1’ folds and tested on the remaining fold. The procedure is repeated for each fold, and the averages of the performance metrics are calculated. The performance metrics used are the MAE and the Standard Deviation (SD), which are necessary for compliance with clinical standards. Their mathematical formulas are given by

$$MAE=\frac{1}{n}\sum_{k=1}^{n}\left|{y}_{k}-{\widehat{ y}}_{k}\right|$$
(14)

where n = number of observations; \({y}_{k}\) = target value; \({\widehat{y}}_{k}\) = predicted value.

$$SD=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({x}_{i}-\mu )}^{2}}$$
(15)

where n is the number of observations, \({x}_{i}\) is the i-th value, and μ is the mean of the values.

Algorithm 7
figure g

Random forest regression for BP prediction with k-fold cross-validation

The number of estimators is set to 100. As discussed, this ensemble of decision trees reduces overfitting and improves prediction accuracy, generating an effective model that captures a variety of data properties by combining the outputs of several trees. The k value is set to 10, dividing the dataset into ten subsets, with nine folds used for training and the remaining one for testing as part of a tenfold cross-validation scheme. The model, thoroughly evaluated against the AAMI and BHS standards (Algorithm 8) using a Random Forest with 100 estimators and tenfold validation, is therefore more robust and dependable.
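A minimal sketch of Algorithm 7 is given below, assuming the flattened deep features are available as an array `X` and the corresponding SBP (or DBP) targets as `y`; it mirrors the configuration described above (100 estimators, 10 folds) and reports the fold-averaged MAE and error standard deviation.

```python
# Sketch of Algorithm 7: Random Forest regression with 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

def evaluate_rf(X, y, n_estimators=100, n_splits=10, seed=42):
    maes, sds = [], []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        pred = rf.predict(X[test_idx])
        maes.append(mean_absolute_error(y[test_idx], pred))
        sds.append(np.std(y[test_idx] - pred))   # SD of the estimation errors
    return float(np.mean(maes)), float(np.mean(sds))
```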

Algorithm 8
figure h

Evaluation metrics for BP prediction (AAMI and BHS Standards)

Hardware and software specifications

The hardware and software specifications used in these experiments are displayed in Table 3.

Table 3 Hardware, software, and model training configuration.

Results and discussion

The proposed models, utilizing random forest with 100 estimators, are employed to estimate blood pressure at the systolic and diastolic levels, and their results are compared. The study’s findings are presented in Table 4, which offers a thorough summary of the accuracy obtained by each transfer learning model and demonstrates how successful they are as feature extractors for precise systolic blood pressure (SBP) and diastolic blood pressure (DBP) measurement.

Table 4 Performance of transfer learning models with random forest in estimating SBP and DBP.

The results must satisfy the criteria set by the AAMI78 and BHS79 clinical standards. As per the AAMI standard, the mean difference between a device and the mercury standard should lie within 5 mmHg, with a SD not exceeding 8 mmHg. The BHS’s objectives align with those of the AAMI standard. Table 5 lists the BHS classifications.

Table 5 British hypertension society classifications79.

The BHS clinical standard grades the validity of blood pressure measurement devices or algorithms based on the percentage of BP readings that fall within predefined error thresholds when compared to a reference standard. The grading criteria are as follows:

  • Grade A: It requires ≥ 60% of BP samples to be within ± 5 mmHg, ≥ 85% within ± 10 mmHg, and ≥ 95% within ± 15 mmHg.

  • Grade B: It requires ≥ 50% of BP samples to be within ± 5 mmHg, ≥ 75% within ± 10 mmHg, and ≥ 90% within ± 15 mmHg.

  • Grade C: It requires ≥ 40% of BP samples to be within ± 5 mmHg, ≥ 65% within ± 10 mmHg, and ≥ 85% within ± 15 mmHg.

Grade A corresponds to the highest level of accuracy, followed by grade B and then grade C.
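The following sketch, corresponding to Algorithm 8, checks a set of predictions against the AAMI limits (mean error within 5 mmHg, SD within 8 mmHg) and assigns a BHS grade from the cumulative-error percentages listed above; predictions failing the grade C thresholds are reported as ungraded.

```python
# Sketch of Algorithm 8: AAMI compliance check and BHS grading.
import numpy as np

def aami_bhs_report(y_true, y_pred):
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    aami_ok = abs(err.mean()) <= 5.0 and err.std() <= 8.0
    within = {t: 100.0 * np.mean(np.abs(err) <= t) for t in (5, 10, 15)}
    if within[5] >= 60 and within[10] >= 85 and within[15] >= 95:
        grade = "A"
    elif within[5] >= 50 and within[10] >= 75 and within[15] >= 90:
        grade = "B"
    elif within[5] >= 40 and within[10] >= 65 and within[15] >= 85:
        grade = "C"
    else:
        grade = "ungraded"
    return {"AAMI compliant": aami_ok, "BHS grade": grade,
            "within 5/10/15 mmHg (%)": [within[5], within[10], within[15]]}
```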

The achieved results (Figs. 7 and 8) illustrate the efficacy of the various DL models in estimating BP using photoplethysmography (PPG) signals. Notably, the ConvNeXtTiny and VGG16 models exhibit the best performance, with MAEs of 2.95 mmHg and 3.94 mmHg, respectively, for SBP and 1.66 mmHg and 2.56 mmHg, respectively, for DBP. These values fall well within the clinically acceptable error. Additionally, both models maintain relatively low standard deviations of 4.11 mmHg and 5.19 mmHg for SBP and 2.6 mmHg and 3.8 mmHg for DBP, further confirming their reliability in terms of standard deviation. Although ConvNeXtTiny and VGG16 achieved the lowest errors, the results obtained with the other four models also indicate good performance and compliance with the AAMI standard for DBP.

Fig. 7
figure 7

Comparison of model’s mean absolute error (MAE).

Fig. 8
figure 8

Comparison of models’ standard deviation (SD).

Considering the BHS criteria for grade A, where the cumulative percentages should be at least 60%, 85%, and 95%, the models’ performance can be assessed as follows:

  • For Systolic Blood Pressure (SBP):

  • ConvNeXtTiny meets the grade A criteria (Fig. 9) for all error thresholds, showing robust performance with high accuracy percentages (81.33%, 97.33%, and 96.33% of the input data fall within errors of < 5 mmHg, < 10 mmHg, and < 15 mmHg, respectively). VGG16 also meets the grade A criteria, with 68%, 96.67%, and 98.67% of the input data falling within the specified errors.

  • Although the performance of ResNet50, InceptionV3, NASNetLarge, and InceptionResNetV2 aligns closely with the grade A criteria for all error thresholds, it is categorized as meeting the grade B criteria (Fig. 9) given its proximity to the established benchmarks.

  • For Diastolic Blood Pressure (DBP):

  • All models perform well within grade A criteria (Fig. 10) for all error thresholds, with most of their estimates falling within the defined ranges.

Fig. 9
figure 9

Comparative performance of the Models’ SBP error as per British Hypertension Society (BHS).

Fig. 10
figure 10

Comparative analysis of the Models’ DBP error as per British Hypertension Society (BHS).

Thus, the results show that the features extracted from PPG signals by the pretrained CNN models capture physiologically relevant information that correlates with blood pressure variations. The PPG signal reflects changes in blood volume as the heart pumps, which in turn are exhibited in its time–frequency representation. The scalograms obtained using the continuous wavelet transform capture the dynamic variations of the physiological data; for example, systolic blood pressure correlates with the rising edge and diastolic blood pressure with the dicrotic notch. The DL models are capable of identifying intricate patterns in these time–frequency representations that are challenging to capture manually. This is consistent with the clinical standards discussed earlier.

Another important statistical metric used to quantify the linear relationship between two variables is the Pearson correlation coefficient, denoted by ‘r’. Its value lies between −1 and 1: ‘−1’ indicates a perfect negative linear relationship, ‘+1’ indicates a perfect positive linear relationship, and 0 indicates no linear relationship. The correlation coefficient between two variables X and Y is calculated using the following formula:

$$r=\frac{\sum_{i} ({X}_{i}-\overline{X })({Y}_{i}-\overline{Y })}{\sqrt{\sum_{i}{\left({X}_{i}-\overline{X }\right)}^{2}\sum_{i}{\left({Y}_{i}-\overline{Y }\right)}^{2}}}$$

where \({X}_{i}\) and \({Y}_{i}\) are the individual records, \(\overline{X }\) and \(\overline{Y }\) are the mean of X and Y respectively.
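As a brief illustration, the coefficient can be computed directly from paired arrays of target and estimated BP values (hypothetical inputs), as sketched below.

```python
# Pearson correlation between target and estimated BP values.
import numpy as np
from scipy.stats import pearsonr

def bp_correlation(y_true, y_pred):
    r, _ = pearsonr(np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float))
    return r
```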

In the context of blood pressure estimation, the Pearson correlation coefficient can be used to determine how close the estimated SBP and DBP values are to the target SBP and DBP, providing a more complete picture of result reliability. Table 6 shows that ConvNeXtTiny and VGG16 have higher correlation coefficients, around 0.8 for both SBP and DBP, compared with the other four models.

Table 6 Pearson correlation coefficient.

In summary, the presented results indicate that the models, particularly ConvNeXtTiny and VGG16, largely satisfy the AAMI standard for BP estimation. Both achieved grade A for SBP and DBP estimation, thereby meeting the BHS standard with good correlation. Their superior performance is attributed in part to consistent architectures with uniform filter and max-pooling dimensions. The other four models met the BHS clinical standard for DBP estimation with grade A and approached grade A for SBP estimation, demonstrating their effectiveness and proximity to meeting the BHS standard.

These findings highlight the potential of using pretrained models, such as VGG16, ResNet50, InceptionV3, NASNetLarge, InceptionResNetV2 and ConvNeXtTiny, as reliable tools for accurate BP estimation, satisfying the standards established by both AAMI and BHS in clinical blood pressure monitoring.

Benchmarking against established algorithms

In this analysis, the proposed method is evaluated alongside several well-established approaches in the clinical field for BP estimation using physiological signals. The compared algorithms were evaluated on the MIMIC II and MIMIC III databases, employing various types of input features and machine learning models (Table 7).

Table 7 Performance benchmarking of blood pressure estimation models against established algorithms.

A comparison against the results reported in these research papers shows that the proposed approach is competitive. Specifically, the obtained results demonstrate a notable advancement in the estimation of blood pressure values as per clinical standards.

This method uses scalogram-based features and employs six pretrained models for prediction. The results demonstrate promising performance for ConvNeXtTiny and VGG16, with mean absolute errors (MAE) in compliance with the AAMI standard, underlining their potential for accurate blood pressure estimation. Furthermore, this approach meets the BHS criterion with grade A, attesting to its clinical validity. The remaining four models satisfied the AAMI standard and achieved BHS grade A for DBP and grade B for SBP. A comparative analysis was conducted exclusively with studies employing deep learning models, as this approach incorporates deep features in conjunction with deep learning architectures.

The primary novelty of the proposed model lies in the application of the Morlet wavelet transform, which yields the so-called scalograms; its distinctive features are highlighted below:

  (a) Multiresolution analysis:

    The model captures both localized and global fluctuations in the time–frequency domain thanks to the multiresolution analysis of the PPG signals provided by the Morlet wavelet transform. Compared with conventional time-domain techniques, this yields a more thorough description of the underlying physiological dynamics.

  (b) Frequency localization:

    The Morlet wavelet transform is very useful for evaluating non-stationary signals such as PPG because it offers superior frequency localization compared to Fourier-based techniques. Thus, the model filters out unnecessary information and noise while focusing on the specific frequency components that are relevant to blood pressure dynamics.

  (c) Adaptive representation:

    In situations where the signal contains complex patterns, as is the case for PPG signals, scalograms provide a more reliable depiction of the signal. The Morlet wavelet technique gives the PPG signals an adaptable representation, so the model can capture changes in frequency content over time. This adaptability is crucial for precisely characterizing the dynamic nature of blood pressure fluctuations, since blood pressure can vary significantly in response to physiological and environmental stimuli.

  (d) Integration of pre-trained models with scalograms:

    By using the Morlet wavelet transform in conjunction with pre-trained models for feature extraction, the model’s ability to extract discriminative representations from the scalograms is enhanced. This integration allows the model to perform better on blood pressure estimation tasks by utilizing the benefits of both deep learning expressiveness and wavelet-based signal processing.

Integrating Morlet wavelet pre-processing into blood pressure estimation models can significantly enhance their performance. The proposed blood pressure estimation method uses Morlet-wavelet-based scalograms of raw PPG signals and complies with clinical standards. Although prior works included PPG signals and their derivatives as inputs36,37,43, our focus on scalograms emphasizes the effectiveness of Morlet wavelet preprocessing in achieving competitive results through a simplified input representation. The Morlet wavelet transform enhances the modeling process by capturing the nonlinear interactions present in physiological data, thereby improving the accuracy and robustness of BP estimation.

Overall, the proposed method exhibits competitive performance in comparison to prior works, demonstrating its potential as a reliable tool for blood pressure estimation in clinical applications by achieving results comparable to those of the previous studies listed in the above table. This thorough assessment serves as a baseline for further studies in this area.

The outcomes of the aforementioned implementation are as follows:

  (a) Scalogram-based transfer learning: The proposed algorithm presents a novel method for accurate blood pressure measurement from PPG data by combining transfer learning with scalogram-based preprocessing.

  (b) Data-driven solution: This research provides a simplified, data-driven methodology for continuous blood pressure monitoring, increasing efficiency and reliability by eliminating manual feature engineering.

  (c) Model evaluation and selection: The study presents a methodical comparison of several deep learning models and shows that the most accurate blood pressure predictions are obtained when random forest regression is employed in conjunction with the ConvNeXtTiny and VGG16 models.

  (d) Standards compliance: The performance of the suggested approach is carefully assessed in accordance with accepted standards (AAMI and BHS), demonstrating its dependability for practical blood pressure monitoring applications.

  (e) Pearson correlation coefficient: For both ConvNeXtTiny and VGG16, a significant correlation was found between the estimated and true BP values.

Conclusions

In this study, a reliable and clinically validated approach for PPG-based non-invasive blood pressure estimation using transfer learning is proposed. The proposed method offers a unique pathway for blood pressure estimation, using a transfer learning framework that integrates deep features obtained from scalograms with Random Forest regression. The pretrained CNN models VGG16, ResNet50, InceptionV3, NASNetLarge, InceptionResNetV2, and ConvNeXtTiny are demonstrated to be effective feature extractors for estimating SBP and DBP from photoplethysmography (PPG) signals. The findings show that all six models correlated well with the reference values and met the AAMI accuracy standards for SBP and DBP estimation, demonstrating their potential for accurate blood pressure measurement in clinical settings.

Additionally, the models performed very well against the BHS clinical criteria; ConvNeXtTiny and VGG16, for example, achieved grade A results for both SBP and DBP estimation. The other models approached grade A for SBP estimation and successfully satisfied the BHS criteria for DBP estimation. These findings suggest that pretrained models have the potential to enhance blood pressure monitoring accuracy, contributing to improved healthcare decision-making.

For future scope, further investigations can focus on expanding the analysis to include a larger dataset encompassing diverse patient populations. However, handling larger datasets with the Morlet wavelet and pre-trained CNN models may require scalable computational resources, effective GPU acceleration, and optimised data preprocessing for computational efficiency, with any implementation issues taken into account. In addition, exploring different CNN architectures and evaluating their performance on specific subgroups or medical conditions could provide valuable insights.

One potential area could be the evaluation of the impact of model performance at different frequency ranges in PPG signals, which provides an opportunity to increase the depth of the analysis. Incorporating other physiological signals and exploring multimodal approaches may also enhance the accuracy of blood pressure estimation. Finally, conducting real-time experiments and deploying the developed models in clinical environments would offer practical validation and help bridge the gap between research and clinical application. However, the implementation of this future scope could present challenges, such as the need to optimise for real-time processing constraints, handle legal and ethical issues about patient privacy, and ensure robust adaptability to various clinical data. These challenges emphasise the necessity of implementing a comprehensive plan to successfully integrate trained models into changing healthcare environments.