Introduction

In recent years, wavelet analysis (WA) has been applied in diverse areas, including machinery fault detection1,2, speech processing3,4,5, object localization6, and biomedical signal processing7,8. WA decomposes a signal into individual frequency components and analyses each component with a resolution matched to its scale. Any function can thus be decomposed into basic elements obtained from a wavelet through scaling (dilation) and shifting (translation) operations. The discrete wavelet transform (DWT) splits a signal into two distinct parts: low-frequency approximations and high-frequency details9.

Recent applications of the DWT include the refinement of speech signals. Talbi Mourad4 introduced a speech enhancement method based on the wavelet transform and compared it against established frequency-domain techniques such as Wiener filtering and the maximum a posteriori estimator of the magnitude-squared spectrum (MSS-MAP). Sonia et al.10 proposed a method for smoothing voice signals before soft thresholding, thereby enhancing the signal-to-noise ratio (SNR) of speech signals. Sanam and Shahnaz11 proposed a statistically adaptive hard threshold function for wavelet-based speech enhancement and demonstrated its efficacy in reducing both white and coloured noise.

Mother wavelet selection plays a pivotal role in the wavelet transform (WT), because each type of mother wavelet offers distinctive features. A specific mother wavelet can be chosen by qualitative or quantitative approaches. Fu et al.12 employed a biorthogonal 6.8 mother wavelet for surface profile segregation based on its symmetry properties. A. Mojsilovic et al.13 considered regularity and vanishing moments, opting for biorthogonal wavelets in texture characterization. Safavian et al.14 found that b-spline, coiflet, and db4 wavelets were equally effective in identifying power system transients. Quantitative techniques have emerged in recent years to assess the similarity between mother wavelet functions and acquired signals. N. Saito15 introduced the minimum description length (MDL) method, which is based on the principle of the optimal model, to identify the most appropriate mother wavelet function. Khan et al.16 employed the MDL to select the mother wavelet function for analyzing a three-phase interior permanent magnet motor, favouring the db3 mother wavelet.

Jesmin Khan17 developed a modified MDL approach to denoise smart grid data, while Moradi18 examined different mother wavelet functions for ocean color time series and concluded that the Daubechies wavelet family performed best. The Maximum Cross-Correlation Coefficient (MCC) method determines the similarity between an acquired signal and a mother wavelet. B.N. Singh et al.19 used the MCC method to select the mother wavelet for ECG signal denoising and chose db8. Marxim and Mohanty6 selected the sym7 mother wavelet for underwater acoustic signals using the MCC method. Finally, Gwinn et al.20 evaluated various mother wavelets with the MCC method for power side-channel analysis and supported the use of the mexh mother wavelet for time series evaluation.

H. Nematallah and S. Rajan21 studied mother wavelet selection for improving human activity recognition from wearable sensor measurements. Their approach identifies activities from their characteristic signal patterns, with recognition accuracy as the main goal, and combines the wavelet packet transform with energy analysis and two classification techniques. Different wavelet families were tested on multiple datasets to establish which wavelet gives the best recognition results. Coiflet family wavelets gave the best identification of different activities from accelerometer data, whereas the Haar wavelet performed poorly in this scenario. S.R. Vippala et al.22 reported that acoustic emission techniques play a crucial role in diagnosing insulation problems through the examination of partial discharge signals. Because different wavelets suit reconstruction, denoising, and compression, proper wavelet selection is a fundamental requirement of such analysis. Thirty-six wavelets were evaluated on five test signals, and coif3, coif4, and coif5 were identified as the most suitable for the three tasks according to energy criteria.

Multi-criteria decision-making (MCDM) methods are widely used across many fields to solve decision problems that involve multiple competing objectives. Such problems consist of several alternatives that are evaluated against various criteria, and the selection is based on the preferences of the decision-makers. Petković et al.23 used the WASPAS and COPRAS MCDM methods to identify the most efficient processing methods for non-conventional ceramic material machining. WASPAS combines additive and multiplicative utility functions for decision making, while COPRAS ranks alternatives by assessing both criterion importance and utility values. A Taguchi-based COPRAS method has also been used to improve turning parameters for stainless steel 304, converting multiple optimization objectives into a structured single-objective evaluation.

Jahan et al.24 studied five normalization techniques to determine the best approach for COPRAS performance evaluation. COPRAS has been applied to the drilling of aluminum alloys with solid carbide drill bits and high-pressure coolant for efficient parameter selection25. An integrated fuzzy AHP and fuzzy COPRAS approach was developed for machine tool selection, eliminating consistency ratio requirements and using fuzzy linguistic terms to enhance decision quality26. A stochastic COPRAS model enabled cargo service provider selection by handling unpredictable performance metrics through probabilistic decision processes27. This flexibility allows practitioners to use COPRAS in both certain and uncertain decision-making contexts.

The highlights of this research work are listed below:

  • Utilization of Wavelet Analysis in Addressing Speech Communication Challenges: This study investigates the application of wavelet analysis to examine speech signals facing challenges, particularly in scenarios involving face shields and face masks during the COVID-19 pandemic.

  • Addressing Challenges in Selecting the Optimal Mother Wavelet: This work addresses the challenge of selecting the appropriate mother wavelet function for speech. It adopts the Maximum Cross-Correlation Coefficient (MCC) method along with the Maximum Energy to Shannon Entropy Ratio (MEER) criterion to reduce variations in results from different mother wavelets.

  • Practical Guidance for Mother Wavelet Selection in Speech Signal Processing: This study demonstrates how to select a proper mother wavelet for speech signal processing under different face mask and face shield conditions and offers practical guidelines for that selection. The recommendations are derived by applying both the MCC and MEER criteria. Furthermore, the COPRAS (COmplex PRoportional ASsessment) technique is employed to select the optimal mother wavelet for speech signals under various face mask and shield conditions.

The utilization of wavelet analysis in enhancing speech signals becomes particularly relevant in the context of face masks and face shields, crucial tools in mitigating the spread of the coronavirus pandemic. This research involves the acquisition of speech signals under diverse face mask and face shield conditions, detailed in Sect. “Method”. Section “Wavelet Transform” elaborates on WT, Sect. “Mother Wavelet Selection” outlines the mother wavelet selection via the maximum cross-correlation coefficient method, Sect. “Results and Discussions” presents results and discussions, and Sect. “Conclusion” concludes this research endeavor.

Method

In this work, four non-native English-speaking human subjects, for all of whom English was the medium of instruction, were selected to read out the vowels and a passage. Various conditions are applied to the subjects while they read the prescribed text; these conditions are given in Table 1. Figure 1 shows a subject with no mask, a surgical mask, a cloth mask, twin masks (a bottom surgical mask combined with a top cloth mask), and an N95 mask. Under each applied condition, the subjects are asked to read out the Grandfather Passage (GFP) and the vowels, and their speech is recorded separately for each condition. The 16-bit speech signals are recorded at a sampling frequency of 44 kHz on a laptop in WAV format. The experiments are repeated three times for both the vowels and the GFP, and all the recorded speech signals are used for analysis.

Table 1 Face mask conditions for experiments.
Fig. 1 Experiment cases.

Ethical approval and consent to participate

This study adhered to all relevant guidelines and regulations. Ethical approval was sought from the Ethics Committee of Aditya University (Approval ID: AUS/Ethics/15). The committee determined that formal ethical approval was not required, as the study did not involve experiments on humans or the use of human tissue samples. Written informed consent was obtained from all participants for their involvement in the study. Additionally, clear written consent was obtained from all participants for the publication of photographs containing facial identification in an online, open-access journal.

Wavelet transform

Wavelet analysis, a method of time-frequency analysis, is used to solve problems in engineering, physics, and mathematics. It analyses low-frequency signal components with a long-duration function and high-frequency signal components with a short-duration function. It breaks a signal down into multiple frequency (scale) components, which it examines by translating (positioning) them along the length of the signal and matching them against the original signal. Wavelet transforms (WT) can be classified into three types: continuous wavelet transforms (CWT), discrete wavelet transforms (DWT), and wavelet packet transforms (WPT). The generalized DWT can be expressed as:

$$X\left[a,b\right]= \sum_{n= -\infty }^{\infty }x\left[n\right]{\varphi }_{a,b}[n]$$
(1)

where:

\(x[n]\) is the input signal;

\(\varphi [n]\) is a finite-length window function;

\(a\) is the dilation (scale) parameter;

\(b\) is the translation (position) parameter.

$${\varphi }_{a,b}\left[n\right]= \frac{1}{\sqrt{a}}\varphi \left[\frac{n-b}{a}\right]$$
(2)

The function \(\varphi \left[n\right]\) must meet two criteria to qualify as a mother wavelet. Firstly, the total energy of \(\varphi [n]\) must be finite, as expressed in Eq. (3). Secondly, \(\varphi [n]\) must satisfy the admissibility condition in Eq. (4).

$$E= \sum_{n= -\infty }^{\infty }{\left|\varphi [n]\right|}^{2}<\infty$$
(3)
$$\gamma = \sum_{k= 0}^{\infty }\frac{{\left|\widetilde{\varphi }[k]\right|}^{2}}{k}<\infty$$
(4)

where,

\(\gamma\)—the admissibility constant.

\(\widetilde{\varphi }[k]\) - the discrete Fourier transform of \(\varphi [n]\) with N data points.

And,

$$\widetilde{\varphi }[k]= \sum_{n= -\infty }^{\infty }\varphi \left[n\right]{e}^{\frac{-2\pi kni}{N}}$$
(5)

Using two sets of functions, called scaling functions and wavelet functions, the WT decomposes the acquired signal at different scales or resolutions. The wavelet decomposition process is shown in Fig. 2. The wavelet function is associated with the high-pass filter, denoted by ‘H’ in Fig. 2, and the scaling function is associated with the low-pass filter, denoted by ‘L’. The acquired speech signal is used as the input discrete signal x[n] and is passed through the low-pass and high-pass filters to obtain the approximation coefficient A1 and the detail coefficient D1. At the next level, A1 is passed through the low-pass and high-pass filters, giving the approximation coefficient A2 and the detail coefficient D2. Similarly, A2 is then passed through the two filters, yielding A3 and D3. The process continues iteratively: at each level n, the approximation coefficient An is further decomposed into a new approximation coefficient An+1 and a detail coefficient Dn+1. This hierarchical decomposition extracts both low-frequency (approximation) and high-frequency (detail) components at progressively coarser scales. The number of decomposition levels depends on the signal length and on application-specific criteria such as energy retention, entropy threshold, or frequency resolution requirements. The final decomposition is a structured representation of the signal that supports efficient analysis in applications such as speech processing, feature extraction, and noise reduction.

Fig. 2 Decomposition of wavelet transform.
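The iterative splitting of A1 into A2 and D2, A2 into A3 and D3, and so on can be illustrated with a minimal Python sketch. It assumes the Haar wavelet (the simplest of the candidates studied here) and NumPy only; the function names `haar_dwt_step` and `haar_wavedec` and the sample values are hypothetical and are not taken from the study's MATLAB implementation.

```python
import numpy as np

def haar_dwt_step(x):
    """One decomposition level with Haar filters: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # odd length: repeat the last sample (illustrative choice)
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass branch 'L', downsampled by 2
    detail = (even - odd) / np.sqrt(2)   # high-pass branch 'H', downsampled by 2
    return approx, detail

def haar_wavedec(x, levels):
    """Return [A_levels, D_levels, ..., D_1] by repeatedly splitting the approximation."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt_step(approx)
        details.append(d)
    return [approx] + details[::-1]

# Hypothetical 8-sample signal standing in for a speech segment.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_wavedec(signal, levels=3)
for name, c in zip(["A3", "D3", "D2", "D1"], coeffs):
    print(name, np.round(c, 3))
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the even-length input, which is a convenient sanity check for any such decomposition.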

Mother wavelet selection

Maximum cross-correlation coefficient

As mentioned earlier, the speakers are asked to read the vowels and the Grandfather Passage (GFP), and the speech signal is acquired by a microphone for the various face mask and face shield conditions explained in Sect. “Method”. The speech signals are nonlinear and nonstationary. A small portion of the speech signal is taken as the input speech signal, \({x}_{1}[n]\). Several mother wavelet families have been proposed, including the Symlet, Daubechies, and Coiflet families, and these three families are investigated here. Figure 3 shows the mother wavelet functions considered in this research work. The Symlet family comprises Sym2, Sym3, Sym4, Sym5, Sym6, Sym7, and Sym8, while the Daubechies family includes Haar, Db4, Db6, Db8, Db10, Db12, Db14, and Db16. The Coiflet family includes Coif1, Coif2, and Coif3. A total of 18 mother wavelet functions are considered. Each mother wavelet function is assigned as \({x}_{2}[n]\). The generalized cross-correlation function between \({x}_{1}[n]\) and \({x}_{2}[n]\) is calculated in the frequency domain:

$${R}_{{x}_{1}{x}_{2} }\left(\tau \right)=\underset{-\infty }{\overset{\infty }{\int }}{\psi }_{\text{1,2}}(f){X}_{1}(f){X}_{2}^{*}(f){e}^{i2\pi f\tau }df$$
(6)

where the superscript ‘*’ denotes complex conjugation, \({X}_{1}\left(f\right)\) and \({X}_{2}(f)\) are the Fourier transforms of \({x}_{1}[n]\) and \({x}_{2}\left[n\right]\), respectively, and \({\psi }_{\text{1,2}}\left(f\right)\) is the weighting function, given by

$${\psi }_{\text{1,2}}\left(f\right)= \frac{1}{\left|{X}_{1}(f){X}_{2}^{*}(f)\right|}$$
(7)

Fig. 3 Mother wavelet functions.

Substituting Eq. (7) into Eq. (6), the cross-correlation function can be written as

$${R}_{{x}_{1}{x}_{2}}\left(\tau \right)=\underset{-\infty }{\overset{\infty }{\int }}\frac{{X}_{1}(f){X}_{2}^{*}(f)}{\left|{X}_{1}(f){X}_{2}^{*}(f)\right|}{e}^{i2\pi f\tau }df$$
(8)

where \(\tau\) is the time delay.
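A discrete counterpart of Eq. (8) can be sketched with the FFT. The sketch below is an illustration under stated assumptions, not the study's implementation: the function names, the small constant `eps` that guards against division by zero, the zero-padding length, and the use of the largest absolute value over all delays as the maximum cross-correlation coefficient are choices made here. In practice \(x_1\) would be the speech segment and \(x_2\) a sampled mother wavelet; synthetic arrays are used below.

```python
import numpy as np

def weighted_cross_correlation(x1, x2, eps=1e-12):
    """Discrete analogue of Eq. (8): inverse FFT of the normalised cross-spectrum.

    Index m of the result corresponds to delay m; negative delays wrap to the end.
    """
    n = len(x1) + len(x2) - 1              # zero-pad so circular correlation equals linear
    X1 = np.fft.fft(x1, n)
    X2 = np.fft.fft(x2, n)
    cross = X1 * np.conj(X2)               # X1(f) X2*(f)
    weighted = cross / (np.abs(cross) + eps)   # Eq. (7) weighting; eps is an added safeguard
    return np.real(np.fft.ifft(weighted))

def max_cross_correlation(x1, x2):
    """Largest absolute value of the weighted cross-correlation over all delays."""
    return float(np.max(np.abs(weighted_cross_correlation(x1, x2))))

# Synthetic check: x1 is x2 delayed by 5 samples, so the peak should sit at delay 5.
rng = np.random.default_rng(0)
s = rng.standard_normal(64)
x2 = np.concatenate([s, np.zeros(5)])
x1 = np.concatenate([np.zeros(5), s])
r = weighted_cross_correlation(x1, x2)
lag = int(np.argmax(r))
print(lag, round(float(r[lag]), 3))
```

Because the weighting in Eq. (7) removes the spectral magnitude, the result depends only on the phase relationship between the two signals, which is why a pure delay produces a single sharp peak.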

Maximum energy to Shannon entropy ratio criterion (MEER)

AP Rodrigues et al. used the MEER method for vibration signals in a machining process. Vibration signals were acquired during high-speed turning, and the mother wavelet for those signals was selected using the MEER method. The maximum energy to Shannon entropy ratio was measured for every candidate mother wavelet, and the one that produced the highest ratio was chosen.

In this research work, the mother wavelet for the speech signal is selected using the MEER method. In this method, a higher ratio of the maximum energy value of the acquired signal to the entropy value of the dominant wavelet coefficients indicates a more appropriate mother wavelet for the DWT. The energy ratio can be written mathematically as

$${E}_{S}= \frac{{E}_{e}(s)}{{E}_{entropy}(s)}$$
(9)

where \({E}_{e}(s)\) is the maximum energy value of the acquired speech signal and \({E}_{entropy}(s)\) is the entropy value of the dominant wavelet coefficient, given by

$${E}_{entropy}\left(s\right)= -\sum_{i=1}^{N}{p}_{i}\,{\log}_{2}{p}_{i}$$
(10)

where N is the number of wavelet coefficients and \({p}_{i}\) is the energy probability distribution of the wavelet coefficients, given by:

$${p}_{i}= \frac{{\left|wt(s,i)\right|}^{2}}{{E}_{e}(s)}$$
(11)

where \(wt(s,i)\) represents the wavelet coefficients, and the maximum energy value can be calculated by the equation

$${E}_{e}\left(s\right)= \sum_{i=1}^{N}{\left|wt(s,i)\right|}^{2}$$
(12)
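Equations (9) to (12) translate almost directly into code. The following Python sketch assumes the wavelet coefficients \(wt(s,i)\) are already available as a one-dimensional array; the function name `energy_entropy_ratio` and the handling of the degenerate cases (zero energy, zero entropy) are assumptions made for this illustration.

```python
import numpy as np

def energy_entropy_ratio(coeffs):
    """Energy to Shannon entropy ratio of a set of wavelet coefficients, Eqs. (9)-(12)."""
    c = np.asarray(coeffs, dtype=float).ravel()
    energy = np.sum(np.abs(c) ** 2)            # Eq. (12)
    if energy == 0:
        raise ValueError("all coefficients are zero; the ratio is undefined")
    p = np.abs(c) ** 2 / energy                # Eq. (11)
    p = p[p > 0]                               # convention: 0 * log2(0) contributes 0
    entropy = -np.sum(p * np.log2(p))          # Eq. (10)
    if entropy == 0:
        return float("inf")                    # all energy in a single coefficient
    return float(energy / entropy)             # Eq. (9)

# Four equal coefficients: energy 4, entropy 2 bits, ratio 2.
uniform = energy_entropy_ratio([1.0, 1.0, 1.0, 1.0])
# Energy concentrated in one coefficient lowers the entropy and raises the ratio.
peaked = energy_entropy_ratio([3.0, 1.0, 1.0, 1.0])
print(uniform, round(peaked, 3))
```

Applying this function to the coefficients produced by each candidate mother wavelet and keeping the wavelet with the largest ratio corresponds to the selection rule described above.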

COPRAS methodology

Assessing suitable mother wavelets for speech enhancement under different face mask and shield conditions depends on a combination of high MCC values and high MEER ratios. Comparing these metrics becomes difficult as the number of data points increases. The COPRAS method, a comprehensive Multiple Criteria Decision Making technique, offers a way to handle this intricate decision-making process. Zavadskas et al.28 established COPRAS, which evaluates alternatives systematically through criterion weighting. Its ranking procedure combines beneficial and non-beneficial attributes to obtain an overall utility rating for each alternative. The current study uses COPRAS to unite the MCC and MEER criteria into an objective ranking method for selecting the optimal mother wavelet. COPRAS is appropriate for structured decision applications in engineering because of its easy implementation and systematic workflow29.

COPRAS delivers excellent quantitative results, although researchers have observed that it becomes more complex to use for assessing qualitative parameters30. By managing the criteria weights, the method identifies the mother wavelet that offers the best signal similarity and energy efficiency. The complete flowchart of the COPRAS ranking procedure is shown in Fig. 4. The resulting selection process provides a reliable and unbiased way to choose mother wavelet functions suitable for speech enhancement.

Fig. 4 The methodological steps involved in the COPRAS ranking process.
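For readers unfamiliar with COPRAS, the sketch below follows the commonly published textbook formulation: sum-normalise and weight the decision matrix, add up the beneficial and non-beneficial parts for each alternative, compute the relative significance \(Q_i\), and express it as a utility degree relative to the best alternative. It is an assumption that this matches the exact variant and weights behind Table 2; the function name, the equal weights, and the numbers in the example are hypothetical.

```python
import numpy as np

def copras_rank(matrix, weights, beneficial):
    """Textbook COPRAS: returns (Q, utility degree in %, alternative indices best-first)."""
    X = np.asarray(matrix, dtype=float)        # rows: alternatives, columns: criteria (all > 0)
    w = np.asarray(weights, dtype=float)
    ben = np.asarray(beneficial, dtype=bool)
    D = X / X.sum(axis=0) * w                  # sum-normalise each criterion, then apply its weight
    s_plus = D[:, ben].sum(axis=1)             # contribution of beneficial criteria
    s_minus = D[:, ~ben].sum(axis=1)           # contribution of non-beneficial criteria
    if (~ben).any():
        q = s_plus + s_minus.sum() / (s_minus * np.sum(1.0 / s_minus))
    else:
        q = s_plus
    utility = 100.0 * q / q.max()              # best alternative gets 100 %
    order = np.argsort(-q)                     # indices sorted from best to worst
    return q, utility, order

# Hypothetical scores for three wavelets on two criteria, both treated as beneficial.
scores = [[0.9, 40.0],
          [0.7, 30.0],
          [0.5, 20.0]]
q, utility, order = copras_rank(scores, weights=[0.5, 0.5], beneficial=[True, True])
print(order, np.round(utility, 1))
```

Marking a criterion as non-beneficial through the `beneficial` argument makes smaller values preferable for that criterion, which is how cost-type criteria are handled in this formulation.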

Results and discussions

During the experiment, four participants completed three repetitions of both the vowels and the GFP, and their speech signals were captured and analyzed in each repetition. The recordings were made in a controlled classroom environment, where all participants recited the vowels and the GFP while wearing ten different combinations of face masks and face shields. The recorded speech signals were influenced by three main factors: differences between the speakers’ voices, the speech content, and the type of mask and face shield used. Pitch variations distorted the spectral patterns and thereby altered the wavelet decomposition. Variations in speech energy changed the signal amplitude, which reduced the clarity of the message. Articulation differences modified the waveform and so affected the analysis. Face masks combined with shields changed the resonance and frequency characteristics of the speech, affecting its overall quality. The speech signals were recorded at a sampling rate of 44 kHz, allowing the reverberation time to be analyzed over frequencies from 250 to 8000 Hz. The reverberation time decreased consistently as the frequency increased: for instance, it measured 0.952 s at 250 Hz and 0.642 s at 4000 Hz31,34.

Each participant recited the vowels and the GFP three times while the speech signals were captured by a microphone. The choice of mother wavelet function greatly affects the outcome of wavelet analysis. This research seeks the most suitable mother wavelet function for speech samples recorded from subjects wearing face masks, using two methods: MCC and MEER. In the MCC method, a particular portion of the speech data is selected as the input for analysis, and the cross-correlation function is used to evaluate the similarity between that segment and each mother wavelet function. The method is applied three times for both the vowels and the GFP to obtain reliable results.

The implementation and evaluation were carried out in MATLAB R2015a on a MacBook Air with 8 GB RAM and an Apple M1 processor. The computational tests measured the time each method needed for its analysis: the MCC method took 1.92 s, while MEER required only 0.86 s. These timings indicate that MEER is computationally more efficient than MCC for selecting wavelets in speech signal processing.

Figures 5(a), 6(a), 7(a), and 8(a) present the average MCC for the vowel speech signal across the various mother wavelets for speakers one to four, respectively. Likewise, Figs. 5(b), 6(b), 7(b), and 8(b) show the MCC results for the GFP speech signal across the different mother wavelets for speakers one to four, respectively. Each figure compares the various experimental cases.

Fig. 5 Speaker 1: Result of MCC.

Fig. 6 Speaker 2: Result of MCC.

Fig. 7 Speaker 3: Result of MCC.

Fig. 8 Speaker 4: Result of MCC.

From Fig. 5(a), it is observed that the Daubechies family mother wavelet Db16 exhibits the highest cross-correlation coefficient in all ten cases. The Symlet mother wavelets sym2 and sym3, along with the other Daubechies family mother wavelets, give the next highest values. The quality of the wavelet transform is directly proportional to the magnitude of the cross-correlation coefficient. Based on the cross-correlation analysis of speaker one’s vowel speech signal, ‘Db16’ is recommended as the optimal choice.

Our research delves into the relationship between cross-correlation functions (CCFs) and the selection of mother wavelets. The study involves ten distinct experimental cases, each representing a specific scenario where wavelet analysis is applied. Across these cases, we compute the CCF values for various mother wavelet functions, including Haar, Daubechies, Symlet, and Coiflet wavelets. The central objective is to guide practitioners in making informed decisions regarding wavelet selection for applications such as signal denoising, feature extraction, and data compression.

From Fig. 5(b), it is observed that the Db16 mother wavelet function has the highest cross-correlation among the wavelet functions in all ten cases. After Db16, the Symlet mother wavelets sym2 and sym3 and the other Daubechies family mother wavelets give the next highest values. ‘Db16’ is therefore advised based on the cross-correlation function of speaker one’s GFP speech signal.

From Figs. 6(a), 7(a), and 8(a), as for speaker one, the ‘Db16’ mother wavelet function has the highest cross-correlation among the wavelet functions in all ten cases. Likewise, from Figs. 6(b), 7(b), and 8(b), the ‘Db16’ mother wavelet function has the highest cross-correlation among the wavelet functions in all ten cases.

Figures 9(a), 10(a), 11(a), and 12(a) show the maximum energy to Shannon entropy ratio (MEER) for the vowel speech signal of speakers one to four across the various mother wavelets and experimental cases. Similarly, Figs. 9(b), 10(b), 11(b), and 12(b) show the MEER results for the GFP speech signal of speakers one to four across the various mother wavelets. In Fig. 9(a), it is observed that the Daubechies family mother wavelet Db16 has a higher energy ratio than the other mother wavelet functions in all ten cases. The quality of the wavelet transform is directly proportional to the magnitude of the energy ratio. ‘Db16’ is therefore advised based on the energy ratio of speaker one’s vowel speech signal.

Fig. 9 Speaker 1: Result of MEER.

Fig. 10 Speaker 2: Result of MEER.

Fig. 11 Speaker 3: Result of MEER.

Fig. 12 Speaker 4: Result of MEER.

Moreover, the other MEER results are very similar to those of the MCC method. In the MCC method, the choice of the sample speech segment plays an important role in determining the cross-correlation coefficient, whereas in the MEER method the entire speech signal is used to estimate the energy ratio. The computational cost of the MEER method is also lower. The results show that neither the type of face mask nor the face shield influences the selection of the mother wavelet function. For the speech signal, the Daubechies family mother wavelet ‘Db16’ is recommended.

Zavadskas et al. created COPRAS as an approach that evaluates alternatives directly and proportionally with respect to conflicting criteria. COPRAS ranks the alternatives through a multi-step procedure that combines multiple criteria with their assigned weights, which helps select the appropriate option among the available alternatives. The method is most effective at resolving dynamic engineering decision problems. Although COPRAS is simple and easy to operate, it is limited when it must deal with qualitative evaluation criteria29,32,33.

Knowing which mother wavelet suits speech signals under different face mask and shield conditions allows efficient analysis plans to be created at lower cost. Table 2 contains the mother wavelet ranking obtained by the COPRAS method from the MCC and MEER results. According to the computed rankings, the order of the mother wavelets is Db16 > Db14 > Db12 > Db10 > Sym2 > Db8 > Sym3 > Db6 > Sym4 > Db4 > Sym5 > Haar > Sym6 > Sym7 > Sym8 > Coif3 > Coif1 > Coif2. This ranking shows that Db16 is the most effective alternative, with a utility score of 16 (the lowest), whereas Coif2 is the least favourable, with a utility degree of 259 (the highest) (Table 2). The best combination for minimizing response variations was observed for Db16, followed by Db14 and Db12, in both methods. Db16 was identified as the optimal mother wavelet by both the MEER and MCC methods, for both the GFP and the vowels. Because the proposed ranking helps improve both signal efficiency and component quality, researchers are advised to adopt it. The computational techniques used in this method are straightforward and efficiently support the evaluation of alternatives and the selection of optimal mother wavelets.

Table 2 Results of COPRAS mother wavelet ranking.

Conclusion

The vowels and the Grandfather Passage (GFP) are read by the four readers, and the speech signal is captured by a microphone for various face mask and face shield conditions. For the acquired speech signal, the appropriate mother wavelet is selected by the maximum cross-correlation coefficient (MCC) method and the maximum energy to Shannon entropy ratio (MEER) criterion. Based on the results of both methods, the ‘Db16’ mother wavelet is recommended for speech signals under the various face mask and face shield conditions. The results also indicate that both the MCC and MEER methods are suitable for speech signals, and that the MEER method is computationally more efficient than the MCC method.