Introduction

Various medical imaging techniques are used to evaluate the anatomical structures of blood vessels and the presence or absence, progression, and location of lesions. Among these techniques, digital subtraction angiography (DSA) is used as the standard imaging technique for the acquisition of images with high spatial resolution over a wide range and the analysis of information such as the movement of blood vessels and arterial blood1,2. However, DSA requires an invasive surgical procedure for image acquisition as well as the injection of a high-concentration contrast agent into local blood vessels, which may deteriorate tissue function and cause fatal neurological side effects3,4,5. Moreover, the two-dimensional nature of DSA images can lead to different stenosis results and location information of blood lesions depending on the imaging angle, thereby reducing the accuracy and reproducibility of diagnosis6.

Computed tomography angiography (CTA) techniques capable of three-dimensional angiography have been suggested to overcome these limitations. CTA is a minimally invasive method with high spatial resolution that can reduce the side effects of surgical procedures and contrast media. Hence, it is widely used in clinical practice as an alternative to DSA imaging7,8,9. However, contrast-enhanced cerebral arteries (CAs), such as the vertebral artery and internal carotid artery (ICA), are located close to the vertebrae and have signals similar to those of bone on CTA imaging. As a result, the signals of the CA and bone may overlap, which may result in inaccurate location information of the blood vessels and stenosis (Fig. 1)10,11. Various CA segmentation techniques have been proposed to address these problems.

Fig. 1

Illustration of cerebral arteries (CAs) in CTA imaging: (a) axial view highlighting the internal carotid artery (ICA) and vertebral artery (VA), and (b) 3D reconstruction image of the ICA and VA in relation to the surrounding cervical vertebrae.

Manual thresholding methods require the user to directly set the start-end point or seed point and use contrasting blood vessel signals to analyze the boundary, density, and lumen information of the blood vessel for segmentation. These methods are associated with high segmentation performance and accuracy by providing initial set values12,13. However, the initial signal information must be set for all images, which is an inefficient process for large-scale data. Additionally, segmentation performance may vary depending on the skill level of the user, reducing data reproducibility14,15,16. In contrast, automatic thresholding methods automatically estimate and input initial set values, such as start-end points or seed points, through a series of calculation processes and have high reproducibility and fast processing time. However, these methods are associated with signal interruption owing to stenosis, signal enhancement owing to calcification, and reduced segmentation performance in malformed structures such as aneurysms. In automatic thresholding methods based on signals, regions where the image signals overlap can be generated17,18,19. In such cases, the lack of an appropriate threshold value leads to failure to remove bone signals in the segmentation process.

As thresholding-based methods have clear limitations, new techniques are required for CA segmentation. Among these techniques, subtraction CTA (sCTA), which separates bones and blood vessels by subtracting a non-contrast CT image from a CTA image, has been developed. Compared with conventional CTA imaging, sCTA can depict small blood vessels and aneurysms with increased accuracy and is suitable for the analysis of stenosis and plaque progression10. Furthermore, compared to DSA imaging, which is the standard imaging technique for angiography, sCTA has various benefits such as non-invasive procedures, short examination time, and low cost. The development of detectors and image processing technology has further improved the spatial resolution of sCTA, yielding results similar to those of DSA in the detection and classification of aneurysms. sCTA is a versatile technique that can be used for prognostic observation after procedures involving surgical clips, intravascular stents, and coils20. However, sCTA involves a 2-scan method, which increases the radiation dose to patients and is associated with the potential risk of radiation-related side effects21,22.

The development of a CA segmentation technique that performs similarly to the sCTA technique using only CTA images can improve the diagnostic accuracy and solve the problem of overexposure in patients. Deep learning (DL) models, which are actively applied in medical imaging, provide a solution. DL models have demonstrated excellent performance in tissue segmentation and image transformation, which are used to improve the quality of medical images and achieve specific purposes. In particular, the U-Net structure has shown improved performance in medical image processing, and its relatively simple structure and small number of parameters allow for convenient optimization in various situations. These advantages indicate that DL algorithms based on the U-Net structure can efficiently segment CA vessels, including converting CTA images to sCTA23,24.

Meanwhile, efforts are currently being made to apply low-dose protocols to minimize the dose administered to patients when acquiring radiological images, including CTA. However, low-dose protocols enhance noise signals and therefore provide inaccurate information regarding lesions and tissues. Noise degrades the performance of segmentation techniques by distorting and blurring the luminal and edge signals of irregular blood vessels. Various noise filters have been used to solve noise problems. In particular, the non-local means (NLM) algorithm is known to selectively remove noise while minimizing the loss of the edge signal25. Considering that these advantages can be useful for segmentation by emphasizing the signals and enhancement features of specific tissues, NLM can be used as a pre-processing method to improve the performance of DL models26,27.

In addition, to improve the performance of DL models, various pre-processing methods can be applied to enhance the tissue signal for segmentation28,29. The CA segmentation performance of the U-Net-based DL model can be improved using a dataset with an enhanced signal, acquired with a semiautomatic thresholding method that combines the segmentation accuracy of manual methods with the reproducibility and faster processing speed of automatic methods.

Thus, the purpose of this study was to acquire low-dose CTA images and apply various pre-processing methods to construct a suitable dataset. Moreover, we trained a U-Net-based DL model using the constructed dataset to achieve a performance similar to that of sCTA images for CA segmentation.

Results

Because the performance of the U-Net-based CA segmentation model varies according to the parameters of the pre-processing methods applied for dataset construction, the smoothing factor of the NLM algorithm was optimized first. As shown in Fig. 2, the CTA images were acquired after applying the NLM algorithm with various smoothing factors under low-dose conditions. In Fig. 2a, the regions of interest (ROIs) for noise evaluation are shown as red and blue boxes, and the white box indicates the enlarged area for visual evaluation. Visually, as the smoothing factor of the NLM algorithm increases, the blurring effect increases and the image characteristics deteriorate. Figure 3 shows a graph of the noise level factors, the coefficient of variation (COV) and contrast-to-noise ratio (CNR), in the CT images obtained under low-dose conditions according to the changes in the smoothing factor of the NLM algorithm. Both COV and CNR rapidly improved as the smoothing factor increased to 0.20 and remained nearly constant as the smoothing factor increased further. In particular, the COVs for smoothing factors of 0.05 and 0.20 were significantly different at approximately 0.292 and 0.082, respectively. However, when the smoothing factor was 1.00, almost the same value of 0.084 was obtained. Additionally, when the smoothing factor was set to 0.05 and 0.20, the CNRs were measured to be 1.34 and 6.00, respectively. However, the CNR showed no significant change when the smoothing factor was increased from 0.20 to 1.00 (CNR = 5.89). Thus, 0.20 was set as the optimal value for the smoothing factor.

Fig. 2

CT images filtered by applying the NLM algorithm with various smoothing factors: (a) 0.01, (b) 0.05, (c) 0.20, (d) 0.50, and (e) 1.00. Boxes A and B are ROIs for quantitative evaluation, and the white box is the ROI for the magnified image at the top left.

Fig. 3

Results of coefficient of variation (COV) and contrast-to-noise ratio (CNR) for CT images filtered by applying the NLM algorithm with various smoothing factors.

Based on the optimized smoothing (OS) factor of the NLM algorithm, low-intensity smoothing (LS, d = 0.05), OS (d = 0.20), and high-intensity smoothing (HS, d = 1.00) images were obtained by applying the NLM algorithm with the respective smoothing factors to the CTA images under low-dose conditions. We intended to improve the training efficiency of the U-Net-based CA segmentation model by obtaining the predicted CA mask using the region growing (RG) method for the CTA images denoised with each smoothing factor. The predicted CA mask was added to the acquired LS and OS images to obtain enhanced LS (eLS) and enhanced OS (eOS) images. However, HS images showed low contrast between blood vessels and soft tissues and could not be refined using the RG method. Thus, the HS images were excluded from the weighted input data construction. As a result, the input data for training the U-Net-based CA segmentation model consisted of seven pairs of data: original (normal-dose condition), noisy (low-dose condition), LS, OS, HS, eLS, and eOS (Fig. 4).

Fig. 4

(a) Contrasted CTA image with the bone signal removed by applying a threshold and (b) CTA image with the bone signal removed. In addition, (c) only the CA image was segmented by applying the RG technique to the contrast signal, and (d) the rounded CA signal was obtained by applying post-processing techniques. (e) The CA segmentation process was applied to all slices. The dotted red boxes in (c) and (d) are the ROIs for the magnified images at the top left.

Figure 5 shows the F1-score, AP, and IoU for the U-Net-based CA segmentation models with various datasets. When the confidence level was set to 0.5, the F1-score was highest in the eOS model with a score of 0.880, followed by the original and OS models with scores of 0.871 and 0.824, respectively. In particular, the LS and HS models with inappropriate smoothing strengths had scores of 0.693 and 0.748, respectively, and showed a lower CA segmentation performance than the noisy model (score = 0.818). The OS model showed a similar or slightly higher performance than the noisy model; however, with the denoising algorithm alone, it was unable to achieve better CA segmentation performance than the original model. The eOS model with the predicted CA masks had a CA segmentation performance similar to or higher than that of the original model and an F1-score approximately 1.068 times that of the OS model (0.880/0.824). In contrast, the eLS model performed the worst CA segmentation of all models evaluated in our experiments, with a score of 0.693.

Fig. 5

Results of (a) F1-score, (b) average precision (AP), and (c) intersection over union (IoU) for U-Net-based CA segmentation models with various datasets.

To measure the AP and IoU, the confidence value of each model was set from 0 to 1 in increments of 0.05. The IoU was calculated as the average value. The AP results showed the greatest improvement in the eOS model with a score of 0.955, followed by the original and OS models with scores of 0.928 and 0.917, respectively.

In addition, analysis of variance (ANOVA) was performed on each evaluation factor to analyze the performance differences of the U-Net-based CA segmentation models. The ANOVA of AP was replaced by an analysis of recall and precision (Table 1). The ANOVA results showed that the U-Net-based CA segmentation models had significant performance differences, as very low p-values and high F-values were measured. A post-hoc Tukey's honestly significant difference (HSD) analysis was performed to determine whether specific pairs of U-Net-based CA segmentation models were different. Table 2 shows the results of the post-hoc HSD analysis; most of the p-values between model pairs were 0.05 or less, indicating significant performance differences. However, the results showed no significant difference in any evaluation factor for the Original – eOS and Noisy – OS model pairs.

Table 1 ANOVA results table of U-Net-based CA segmentation models for quantitative evaluation factors.
Table 2 The results of Post-Hoc Tukey’s honestly significant difference for analyzing the performance differences of U-Net-based CA segmentation models.

Discussion

Brain stroke caused by atherosclerosis and carotid lumen stenosis accounts for a significant proportion of deaths worldwide. Because acute cerebrovascular disease caused by the stenosis of certain CAs can cause catastrophic and permanent damage to brain tissue, rapid diagnosis via medical imaging is required30,31. DSA is considered the gold standard for diagnosing cerebrovascular diseases. However, it has recently been replaced by CTA and magnetic resonance angiography (MRA) because of the limitations of the invasive technique, excessive radiation dose, and capability to obtain only the two-dimensional anatomy of vessels and lesions32,33,34.

Although MRA can image blood vessels without contrast injection, it requires a longer time than CTA to acquire cerebrovascular images, which can introduce motion artifacts. Additionally, the detectability of signals from small vessels and aneurysms is reduced due to their susceptibility to saturation35,36. CTA can also be used to analyze the potential causes of cerebrovascular diseases, including low-grade stenosis and occlusions, by identifying signal differences between blood vessels and normal tissues using contrast agents. However, the diagnosis of high-level vessel stenosis can be inaccurate because of the similarity in signals from the vertebrae, contrast media, and calcified plaques37. Anatomically, the ICA crosses the skull through the carotid canal and passes through the cavernous sinus. Additionally, the VA crosses vertically within the transverse foramina or protective bony canal from the fifth or sixth cervical vertebra to the first cervical vertebra. The anatomical features of CAs adjacent to the vertebrae and skull cause an overlap of the contrast agent signal with the bone signal, which interrupts the diagnosis of vascular malformations, stenosis, and plaques. Therefore, segmentation techniques that can remove bone signals should be applied for accurate diagnosis of CAs38,39.

For this purpose, dual-energy CT (DECT) with a material separation technique can be considered. DECT can segment the CA signals by obtaining attenuation maps for iodine (contrast agent), calcium (bone), and water (tissue). However, noise is the main factor that reduces the material separation performance of DECT; hence, various previous studies have proposed effective noise reduction techniques. These studies have contributed to the effective improvement of the material separation performance of DECT and demonstrated that it can be applied to industrial and clinical fields40,41.

However, when separating three or more materials, including water, unnecessary signals remain and degrade the segmentation performance because iodine and calcium have similar attenuation slopes, so the characteristics of the two materials overlap42,43,44. Furthermore, brain CTA images of patients with metal implants such as stents, clips, and coils require material separation techniques using at least four different materials. These multiple-material separation computations are highly complex and can result in inaccurate segmentation.

In addition to the techniques mentioned above, sCTA can be considered a method for CA segmentation. sCTA acquires images before and after contrast agent injection and subtracts the two images. The subtracted image, in which bone, tissue, and metallic substances are removed and only the contrasted blood vessels remain, can show improved CA segmentation performance. However, the side effects of overexposure caused by applying the 2-scan protocol need to be addressed. Although radiation dose reduction techniques such as matched mask bone elimination have been proposed, sCTA has limitations in radiation dose reduction because the 2-scan protocol must still be performed20,45. In particular, high-dose CT continues to attract the attention of both patients and researchers. Software and hardware research has been conducted to determine the radiation dose of CT. Therefore, despite demonstrating a high performance in CA segmentation, sCTA has not been accepted as a mainstream technology in the clinical field. However, even with a limited amount of data, DL models can be utilized to indirectly achieve the CA segmentation performance of sCTA in CT scanners under low-dose conditions. Among the DL models, the U-Net structure has proved feasible for restoration, registration, classification, and segmentation by applying datasets based on two types of medical images with low- and high-performance features, respectively46,47,48.

Therefore, in this study, we derived a CA segmentation performance similar to that of sCTA using only single-scan and low-dose CTA images and a U-Net-based segmentation model for dose reduction. To improve the performance of the U-Net-based CA segmentation model with only CTA images, NLM algorithms with different smoothing strengths were applied for denoising, and a dataset was constructed using the RG method for CA signal emphasis49,50.

However, the performance of the U-Net-based CA segmentation model using the OS image was lower than that of the original images with a normal dose, although it was better than that of the noisy images. To achieve results comparable to those of sCTA and better performance than CTA under normal conditions using only low-dose CTA images, we constructed an enhanced dataset by acquiring CA masks based on the NLM and RG methods. The overall analysis of the quantitative evaluation results showed that the performance of the U-Net-based CA segmentation model was the best for the eOS image, followed by the original, OS, noisy, HS, LS, and eLS images (Fig. 5). These results indicate that low-dose CTA images with appropriate pre-processing methods can be used to train a U-Net-based CA segmentation model with comparable or improved performance compared to normal-dose CTA images51,52. However, the model using eLS images showed lower performance than the models using LS and noisy images. From these results, we concluded that the application of inadequate pre-processing methods can degrade the performance of U-Net-based CA segmentation models.

In addition, ANOVA statistical analysis was performed to confirm the results. The results of the ANOVA show that the differences between the models were clearly evident, with very large F-statistic values (76.586–162.413) and low p-values (\(3.34 \times 10^{-79}\) to \(2.24 \times 10^{-145}\)) for each evaluation factor. Then, the post-hoc HSD test was performed to analyze the performance differences between the model pairs in detail. The HSD results show that most U-Net model pairs differ in segmentation performance. However, the Noisy – OS model pair showed no significant performance difference. These results indicate that applying only noise reduction pre-processing has limitations for improving the segmentation performance of deep learning models. In addition, the p-values for the Original – eOS model pair were higher than 0.05 for all evaluation factors. These high p-values indicate that the performance of the two models was not significantly different. Consequently, the quantitative evaluation and statistical analysis results show the possibility of proposing a U-Net model with segmentation performance similar to sCTA when various appropriate pre-processing methods are applied using only CTA images.

The smoothing strength of the pre-processing methods for denoising is an important consideration when constructing an appropriate dataset53. The noise reduction algorithm had a significant impact on the performance of subsequent pre-processing methods, including those used independently. Figure 6 shows the segmentation results of the predicted CA obtained by changing the smoothing strength of the NLM algorithm and the threshold value of the RG technique.

Fig. 6

The effect of NLM algorithms with various smoothing strengths on the CA mask prediction performance of the RG technique: (a) optimized, (b) low, and (c) high smoothing strength. The red box in (a) is the ROI for the enlarged image at the top.

The application of noise reduction algorithms with insufficient smoothing strength causes non-homogeneous lumen signals, which degrade the segmentation performance of RG. This phenomenon indicated that an excessively high threshold was required to achieve sufficient tissue segmentation when the RG method was applied. However, a high threshold value set to overcome fluctuating noisy signals could cause the contrasted CA to be over-segmented by connecting with normal tissue or vertebrae, which leads to various disadvantages, such as the requirement to specify a separate threshold value for each slice. In addition, the LS and eLS models resulted in significantly lower CA segmentation performance despite having noise intensities similar to those of the noisy model. Gaussian noise in CT images is randomly generated (a non-common feature) and is therefore not learned by the U-Net model. Various U-Net denoising models have been proposed54,55,56. However, in the cases of LS and eLS, unnecessary common features are generated by the predetermined formula of the NLM algorithm. We therefore determined that training the U-Net model on these common features distorted the image information more than in the noisy images, which led to a decrease in the CA segmentation performance57,58,59. In contrast, the application of a denoising algorithm with excessive smoothing intensity leads to abnormal segmentation results with the RG technique, owing to the similarity of the signals from the tissue and contrasted CA. Because the signals from two similar tissues are very sensitive to changes in the threshold value of the RG method, a very low threshold should be applied, which degrades the accuracy and performance of segmentation.

Overall, the results of our study demonstrated the possibility that, when applying various pre-processing algorithms, a U-Net-based CA segmentation model can be trained using only low-dose CTA images with a performance similar to that of sCTA. In particular, the quantitative evaluation results obtained from various datasets clearly indicated that the CT image acquisition conditions (i.e., tube current, slice thickness, and scan time), anatomical characteristics, selection of suitable pre-processing methods, and optimized parameter settings for applying the model should be considered to achieve a model with high performance.

In addition, sCTA should consider not only the radiation dose but also the issue of decreasing the contrast agent concentration. The use of contrast agents for sCTA can have potentially fatal side effects, including acute kidney failure, hypothyroidism, and central nervous system damage60,61. Contrast agents can create artifacts owing to their high density, which can negatively affect diagnosis. However, the contrast of the blood vessels and tissues deteriorates when the injection volume of the contrast agent is reduced. Although we did not experiment with sCTA images with low-dose contrast, we indirectly confirmed the feasibility of improving the contrast of blood vessels by applying CA mask prediction based on a semi-automated thresholding method. Based on the results of this study, we are considering further studies to develop a DL model with improved CA segmentation performance using CTA images under low-dose and low-contrast-agent conditions.

On the other hand, although the results of this study emphasize the importance of appropriate pre-processing for improving the performance of a deep learning model, this study did not consider the variety and architecture of deep learning structures. In particular, hyperparameters should be optimized to improve the performance of deep learning structures, including U-Net. Appropriate hyperparameter settings offer various advantages, such as increasing the accuracy of the deep learning model's results and efficiently shortening the training time62,63. In addition, only the U-Net model was applied in this study. The U-Net model has produced high-quality results in various fields such as super-resolution, image restoration, and segmentation in the medical field. However, U-Net shows limited performance and versatility compared to other deep learning structures because of its simple structure. To establish the effectiveness of the proposed pre-processing techniques, a comparative evaluation of models using improved U-Net-based structures and/or the latest DCNN structures that can apply various segmentation techniques should additionally be performed64,65,66. Consequently, to apply the proposed method directly to clinical systems, the DCNN structure, the hyperparameters, and the qualitative and quantitative evaluation factors that can yield clinical results should be carefully considered.

Conclusion

In this study, we proposed training a U-Net-based CA segmentation model using various pre-processing methods to replace sCTA images with low-dose CTA images. In addition, the parameters of each pre-processing method were optimized to construct a high-quality dataset. In particular, we applied the NLM algorithm to solve the noise problem associated with low-dose CTA. We then employed the RG method to enhance the clarity of the boundary information of ambiguous CAs blurred by the smoothing of the NLM algorithm. In conclusion, we demonstrated that low-dose CTA, combined with a U-Net-based CA segmentation model trained using appropriate pre-processing methods and parameters, can effectively replace sCTA.

Methods

Patients

sCTA image pairs, consisting of images before and after contrast agent injection, were acquired from 45 patients. A 128-channel CT device (SOMATOM Definition FLASH, Siemens AG, Forchheim, Germany) was used with the matrix size, tube voltage, tube current-time product, and slice thickness set to 512 × 512, 100 kVp, 300 mAs, and 0.75 mm, respectively. The acquisition of images was approved by the Institutional Review Board (No. 4-2020-1364) of Yonsei University Severance Hospital. Among the acquired sCTA images, 55–72 slices describing the CA were selected for each patient. For the training, validation, and test sets, CTA images of 35 (2052 slices), 4 (248 slices), and 5 patients (594 slices) were used, respectively, in the U-Net-based CA segmentation model. However, we judged that the risk of potential side effects due to excessive radiation was too high to additionally acquire images under low-dose conditions in this study. Thus, CTA images simulating low-dose conditions were obtained by adding Gaussian-Poisson noise, with the variance and random variables set to 0.1, to the selected CTA images67,68,69.

Pre-processing methods modelling for dataset construction

The NLM algorithm was proposed by Buades et al. to address the reduced sharpness of image contours caused by smoothing, which decreases the overall resolution of the images70. The NLM algorithm uses spatial information of the entire target image and calculates the weighted average in regions with signals similar to the local characteristics of the target region to remove noise and emphasize local features. This was achieved by calculating the difference between the center of the region of interest and the neighboring region as a weighted value based on the Euclidean distance as follows:

$$NLM\left[f\right]\left(m\right)=\sum_{n}\omega\left(m,n\right)f\left(n\right),$$
(1)

where \(m\) and \(n\) are pixels of the noise-added image, and \(\omega\left(m,n\right)\) is the distance weight, which represents the similarity of pixels \(m\) and \(n\) and satisfies \(0\le\omega\left(m,n\right)\le 1\). The distance weight was expressed as follows:

$$\omega\left(m,n\right)=\frac{1}{Z\left(m\right)}e^{-\frac{\left\|v\left(k_{m}\right)-v\left(k_{n}\right)\right\|_{2,a}^{2}}{d^{2}}},$$
(2)
$$Z\left(m\right)=\sum_{n}e^{-\frac{\left\|v\left(k_{m}\right)-v\left(k_{n}\right)\right\|_{2,a}^{2}}{d^{2}}},$$
(3)

where \(k_{i}\) denotes a square kernel of size \(k\) centered on pixel \(i\), and \(v\left(k_{m}\right)\) is the vector of signal magnitudes inside the kernel. \(\left\|v\left(k_{m}\right)-v\left(k_{n}\right)\right\|_{2,a}^{2}\) is the squared Euclidean distance weighted by a Gaussian kernel with a standard deviation of \(a\). \(Z\left(m\right)\) is a normalization constant, and \(d\) is the smoothing factor that controls the smoothing strength. In this study, the smoothing factor was varied from 0.01 to 1.00 at intervals of 0.05 for optimization. The kernel and search window sizes were set to 3 and 7, respectively, to enable fast processing of many images.
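As an illustration of Eqs. (1)–(3), the following is a minimal, unoptimized sketch rather than the implementation used in this study. It assumes that the image is a 2D list of floats scaled to [0, 1], that borders are handled by edge replication, and that the Gaussian weighting of the patch distance (parameter \(a\)) is replaced by uniform weights; the function name `nlm_denoise` is hypothetical.

```python
import math


def nlm_denoise(image, d, kernel=3, search=7):
    """Non-local means on a 2D list of floats (illustrative sketch)."""
    rows, cols = len(image), len(image[0])
    kr, sr = kernel // 2, search // 2

    def patch(r, c):
        # v(k_i): signal values inside the square kernel centred on (r, c).
        # Assumption: out-of-range coordinates are clamped to the border.
        values = []
        for i in range(-kr, kr + 1):
            for j in range(-kr, kr + 1):
                rr = min(max(r + i, 0), rows - 1)
                cc = min(max(c + j, 0), cols - 1)
                values.append(image[rr][cc])
        return values

    patches = [[patch(r, c) for c in range(cols)] for r in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            z = 0.0    # normalization constant Z(m), Eq. (3)
            acc = 0.0  # weighted sum of pixel values, Eq. (1)
            # Only pixels n inside the search window around m are visited.
            for rr in range(max(r - sr, 0), min(r + sr, rows - 1) + 1):
                for cc in range(max(c - sr, 0), min(c + sr, cols - 1) + 1):
                    dist2 = sum((a - b) ** 2
                                for a, b in zip(patches[r][c], patches[rr][cc]))
                    w = math.exp(-dist2 / d ** 2)  # unnormalized weight, Eq. (2)
                    z += w
                    acc += w * image[rr][cc]
            out[r][c] = acc / z
    return out
```

With a small smoothing factor, only patches nearly identical to the target patch contribute, so structures are preserved; with a large smoothing factor, all patches in the search window receive similar weights and the result approaches a local mean, which is consistent with the blurring observed at high smoothing factors.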

The semiautomatic thresholding method of region growing (RG) was applied to the CTA images with low-intensity smoothing (LS), optimized smoothing (OS), and high-intensity smoothing (HS), obtained using the smoothing factors of the NLM algorithm, to obtain the enhanced CA signal. After the seed point is selected, RG evaluates whether the difference between the signal of each neighbor of the seed and that of the seed point exceeds the set threshold value (\(T\)). When the difference does not exceed the threshold value, the neighbor is added to the region, and the operation is repeated successively based on the newly calculated average signal value of the expanded region (\(\mu\)) as follows:

$$g_{seg}=\left\{\begin{array}{ll}\text{Selection}, & \text{if } \left|f\left(n\right)-\mu\right|\le T\\ \text{Non-selection}, & \text{otherwise}\end{array}\right.$$
(4)

The threshold value was set to 50 by analyzing the contrast-enhanced CA signal of the CTA images, and 8-connectivity was used to minimize the effect of noise that was not perfectly removed from the CTA images. However, the RG method has difficulty in segmenting CA signals into elliptical shapes owing to problems such as blurred edges and rough signals caused by unresolved noise. To solve this problem, a morphological closing operation was performed to clearly delineate the boundary signal. The luminal signals were complemented using an internal filling operation.
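The RG step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the image is a 2D list of signal values, the seed is given as a (row, column) pair, neighbors are visited in breadth-first order, and the function name `region_grow` is hypothetical. The morphological closing and internal filling steps are not included.

```python
from collections import deque


def region_grow(image, seed, threshold=50.0):
    """Grow a region from `seed` using 8-connectivity and a running mean."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = True
    total, count = float(image[seed[0]][seed[1]]), 1
    queue = deque([seed])
    # 8-connectivity: horizontal, vertical, and diagonal neighbours.
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    while queue:
        r, c = queue.popleft()
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                mean = total / count  # mu: mean signal of the grown region
                if abs(image[nr][nc] - mean) <= threshold:  # Eq. (4)
                    mask[nr][nc] = True
                    total += image[nr][nc]
                    count += 1
                    queue.append((nr, nc))
    return mask
```

Because the mean is updated after every accepted pixel, the acceptance criterion follows gradual signal changes along the vessel, which also explains why residual noise or an unsuitable threshold can let the region leak into adjacent tissue or bone.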

By applying the different pre-processing methods mentioned above to the CTA images, we can predict the CA mask, as shown in Fig. 4. The predicted CA masks were added to the CTA images (i.e., enhanced CTA) and applied as input data to train the U-Net-based CA segmentation model.

U-Net-based CA segmentation model

The structure of the U-Net-based CA segmentation model consists of a contraction path that extracts and compresses features essential for segmenting a target tissue and an expansion path that provides location information. For each layer of the contraction path, a 3 × 3 convolution was performed twice, followed by a rectified linear unit (ReLU) and batch normalization (BN). Subsequently, max pooling with a stride of 2 was performed for down-sampling. After acquiring 64 feature maps in the first layer, the number of feature maps was doubled at each subsequent layer. For each layer in the expansion path, a 3 × 3 convolution was performed twice, similar to the contraction path, followed by the ReLU and BN. Subsequently, up-sampling with a stride of 2 was applied.

As the operation for each layer was performed, the number of feature maps was reduced by half to obtain 64 feature maps. Finally, a 1 × 1 convolution layer was applied to derive the output image. In addition, skip connections were applied to compensate for the information lost through each layer and allow faster training. The L2-norm loss function and the Adam (adaptive moment estimation) optimizer were applied to train the U-Net-based CA segmentation model. The learning rate and number of epochs were set to 0.01 and 100, respectively.
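To make the feature-map bookkeeping described above concrete, the following sketch traces the number of feature maps and the spatial size through both paths. It does not build or train a network. The number of levels (`depth=4`) is an assumption, since the depth is not stated in the text, and the convolutions are assumed to be padded so that only pooling and up-sampling change the spatial size; the function name is hypothetical.

```python
def unet_feature_maps(input_size=512, base_channels=64, depth=4):
    """Trace (feature maps, spatial size) per level of a U-Net-like model."""
    contraction = []
    channels, size = base_channels, input_size
    for _ in range(depth):
        # Two 3x3 convolutions (+ ReLU + BN) produce `channels` maps here.
        contraction.append((channels, size))
        # Stride-2 max pooling halves the size; the next level doubles the maps.
        channels, size = channels * 2, size // 2
    bottleneck = (channels, size)
    expansion = []
    for _ in range(depth):
        # Stride-2 up-sampling doubles the size; the maps are halved.
        channels, size = channels // 2, size * 2
        expansion.append((channels, size))
    return contraction, bottleneck, expansion


contraction, bottleneck, expansion = unet_feature_maps()
print(contraction)  # levels of the contraction path
print(bottleneck)
print(expansion)    # levels of the expansion path, ending at 64 feature maps
```

Each level of the expansion path has the same size as the matching level of the contraction path, which is what allows the skip connections to combine the two.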

In addition, we trained the U-Net-based CA segmentation model, as shown in Fig. 7, to analyze the effect of the dataset on the performance of the U-Net-based CA segmentation model with and without the pre-processing methods, and with changes in the parameters for the pre-processing methods.

Fig. 7

Illustration of the process of building a dataset with pre-processing methods for training a U-Net-based CA segmentation model: input data were constructed using the NLM algorithm and RG technique, and label data were constructed based on sCTA.

Quantitative evaluation and statistical analysis

The noise level factors including the coefficient of variation (COV) and contrast-to-noise ratio (CNR) were calculated to optimize the smoothing factor of the NLM algorithm as follows:

$$COV=\frac{\sigma_{A}}{S_{A}},$$
(5)
$$CNR=\frac{\left|S_{A}-S_{B}\right|}{\sqrt{\sigma_{A}^{2}+\sigma_{B}^{2}}},$$
(6)

where \(S_{A}\) and \(\sigma_{A}\) are the mean signal and standard deviation of region of interest (ROI) A, and \(S_{B}\) and \(\sigma_{B}\) are the mean signal and standard deviation of ROI B in Fig. 2.
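The two noise level factors can be computed as in the sketch below. It assumes that each ROI is supplied as a flat list of pixel values, that the population standard deviation is used (the text does not specify population or sample), and that the CNR follows its usual definition, the absolute difference of the ROI means divided by the combined noise. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def cov(roi_a):
    """Coefficient of variation: standard deviation over mean of ROI A."""
    return pstdev(roi_a) / fmean(roi_a)


def cnr(roi_a, roi_b):
    """Contrast-to-noise ratio between ROI A and ROI B."""
    contrast = abs(fmean(roi_a) - fmean(roi_b))
    noise = (pstdev(roi_a) ** 2 + pstdev(roi_b) ** 2) ** 0.5
    return contrast / noise


roi_a = [98, 102, 100, 100]  # hypothetical vessel ROI
roi_b = [48, 52, 50, 50]     # hypothetical background ROI
print(cov(roi_a), cnr(roi_a, roi_b))
```

A lower COV indicates less noise relative to the mean signal, whereas a higher CNR indicates better separation between the two ROIs, so effective denoising decreases the former and increases the latter.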

To quantitatively evaluate the performance of the U-Net-based CA segmentation model, the F1-score, which enables a combined evaluation of recall and precision, was measured, as shown in Fig. 8. To measure the quantitative evaluation factors, the confidence level for each model was set at 0.5. To evaluate the performance considering both precision and recall according to the change in confidence, average precision (AP), which represents the area under the precision-recall (PR) curve, was measured. In addition, intersection over union (IoU) was measured to evaluate the accuracy of the area of the detected CA signal (Fig. 9). A value close to 1 for the quantitative factors indicated that the output data were similar to the label data.
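The pixel-wise factors can be sketched as follows, assuming that the model output is a flat list of per-pixel confidence values and the label is a flat list of 0/1 values taken from the sCTA-based mask. The function name and the convention of returning 0 when a denominator is empty are assumptions made for this illustration.

```python
def segmentation_scores(pred, label, confidence=0.5):
    """Precision, recall, F1-score, and IoU at a fixed confidence level."""
    tp = fp = fn = 0
    for p, y in zip(pred, label):
        detected = p >= confidence
        if detected and y:
            tp += 1  # CA pixel correctly detected
        elif detected and not y:
            fp += 1  # non-CA pixel detected as CA
        elif not detected and y:
            fn += 1  # CA pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


pred = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]  # hypothetical model confidences
label = [1, 1, 0, 1, 0, 0]             # hypothetical label mask
print(segmentation_scores(pred, label))
```

In this toy example there are two true positives, one false positive, and one false negative, so precision, recall, and the F1-score are all 2/3 and the IoU is 0.5.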

Fig. 8

Illustration of recall, precision, and F1-score measurements to quantitatively evaluate the performance of a U-Net-based CA segmentation model.

Fig. 9

Illustration of (a) AP and (b) IoU measurements to quantitatively evaluate the performance of a U-Net-based CA segmentation model.
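The AP measurement can be sketched as the area under the PR curve obtained by sweeping the confidence level from 0 to 1 in increments of 0.05. The trapezoidal integration, the convention that precision equals 1 when nothing is detected, and the function names are assumptions of this sketch; other AP conventions (for example, interpolated precision) would give slightly different values.

```python
def precision_recall(pred, label, confidence):
    """Pixel-wise precision and recall at one confidence level."""
    tp = sum(1 for p, y in zip(pred, label) if p >= confidence and y)
    fp = sum(1 for p, y in zip(pred, label) if p >= confidence and not y)
    fn = sum(1 for p, y in zip(pred, label) if p < confidence and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(pred, label, step=0.05):
    """Area under the PR curve over confidence levels 0, step, ..., 1."""
    n_steps = round(1 / step)
    points = []
    for i in range(n_steps + 1):
        precision, recall = precision_recall(pred, label, i * step)
        points.append((recall, precision))
    # Sort by recall; at equal recall, keep the higher precision first so
    # that vertical segments of the curve add no spurious area.
    points.sort(key=lambda rp: (rp[0], -rp[1]))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2  # trapezoidal rule
    return area


pred = [0.92, 0.83, 0.22, 0.11]  # hypothetical model confidences
label = [1, 1, 0, 0]             # hypothetical label mask
print(average_precision(pred, label))
```

When every CA pixel receives a higher confidence than every background pixel, as in this toy example, the PR curve stays at a precision of 1 up to full recall and the area approaches 1.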

Statistical analysis of the results was performed to evaluate the performance differences of the U-Net-based CA segmentation models trained with each dataset. Analysis of variance (ANOVA) is a statistical test that can be applied to three or more groups to determine whether each U-Net-based CA segmentation model shows a significant difference in performance. A post-hoc Tukey’s Honestly Significant Difference (HSD) analysis was performed to clearly identify differences between models with statistical significance after significant differences were found using the ANOVA.
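For reference, the F-statistic of a one-way ANOVA can be computed as in the sketch below, where each group would hold the per-image scores of one model. The data shown are hypothetical, and the sketch stops at the F-statistic: the p-value and the post-hoc Tukey's HSD test require the F and studentized range distributions, which are not implemented here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F-statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores of three models (not data from this study).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```

A large F-statistic means that the differences between the model means are large relative to the spread of scores within each model, which corresponds to a small p-value.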