Introduction

Pelvic fractures are serious traumas that involve complex orthopedic surgeries, which have always posed a huge challenge for orthopedic surgeons. Traditional open surgery often results in significant intra-operative trauma and slow recovery. With the development of medical technology, minimally invasive pelvic surgery has gradually emerged as a promising field for the treatment of pelvic fractures due to its low invasiveness and fast recovery1. 2D/3D registration is a crucial technology that provides real-time and accurate 3D positioning for minimally invasive pelvic surgery, aiming to find an optimal 3D transformation that minimizes misalignment between 2D images and 3D volumes. In pelvic surgery, CT is usually the preferred 3D pre-operative data, and DRRs can be generated from CT using ray casting. Therefore, 2D/3D registration can be achieved by directly comparing the X-ray image and the generated DRR.

Traditional 2D/3D registrations are usually solved by optimization-based approaches2. These approaches3,4 have iteratively searched for the 3D pose that best matches the similarity score between DRR and X-ray image. Thus a large number of paired DRR and X-ray images are required during the iterative optimization process, which leads to the approach slow in terms of computation. In addition, the optimization-based methods rely on a good initial posture. Otherwise, the method may be have difficulty converging to the correct position, and resulting in registration failure. With the rapid development of deep learning technology, several learning-based strategies have been developed to 2D/3D registration5,6,7,8,9,10. Learning-based methods utilize the learning ability of convolutional neural networks (CNNs) to directly estimate the pose of 3D data. These methods have the advantages of computational efficiency, large capture range, and good registration performance. However, they still have two limitations. First, a large amount of data is required for training. When the labeled X-ray images used for training are limited, the he model’s generalization ability is significantly reduced. Second, most methods commonly utilize single view images for registration. Due to the inherent limitations of single-view registration, there may be some ambiguities in the registration process.

To improve navigation accuracy, the multiview 2D/3D medical image registration technology has recently emerged. This technology provides more comprehensive and accurate 3D anatomical structure information by aligning medical image data from different views and modalities. Unlike the inherent challenges in single-view registration, this method considers image depth information and integrates image information from different view, providing a potential solution for improving navigation accuracy. However, recent multiview 2D/3D registration methods are limited when the paired DRRs and X-ray images are insufficient. Therefore, the main challenges currently faced are:

  • Instability of single-view registration Single-view registration lacks depth and geometric constraints, leading to unstable results and lower accuracy.

  • Data requirement for multiview registration Multiview registration requires large amounts of labeled data, which is difficult to obtain in clinical settings.

  • Balancing accuracy and computational time There is a trade-off between accuracy and computation time, making it hard to achieve the right balance for real-time clinical applications.

To address the above drawbacks, we propose an innovative multiview 2D/3D image registration method based on style transfer, which includes POITT, triangulation layer, and SA-BnB. POITT establishes point-to-point consistency matching relationships in DRRs and X-ray images by automatically detecting and tracking key interest points. The triangulation layer projects the tracked POIs in X-ray images from multiviews back into the patient’s 3D space. SA-BnB directly aligns 3D CT with patient by searching for the optimal point pair matching. The contributions of this paper are as follows:

  • A novel lightweight POIs tracking network is proposed Combining U-Net and convolutional block attention module (CBAM) for improved accuracy and robustness of POIs tracking. The simple and clear architecture provide an efficient and lightweight solution for POIs tracking for practical application scenarios.

  • An effective solution for addressing insufficient data is proposed based on style transfer Unifying the style between DRRs and X-ray images for effective POIs tracking. This improves model performance on real X-ray images and offers a new method for handling limited data in 2D/3D registration.

  • A more precise BnB shape alignment method is introduced By leveraging a global optimization strategy, an optimal rigid transformation between point sets is efficiently searched, enabling accurate alignment of key structures and improving overall registration accuracy.

Related work

2D/3D registration can be categorized into optimization-based methods and learning-based methods. Optimization-based methods usually use iterative matching strategies to repeatedly sample and generate DRRs, which result in high computational costs and low accuracy. To avoid these problems, several efficient works have been proposed in hardware-level or software-level. Chang et al.3 focused on 2D/3D registration of spinal images and compared the effects of five similarity measures in three optimization algorithms with any combination. Ghafurian et al.4 innovatively used weighted histograms of image gradient directions as image features to calculate image similarity, significantly reducing the number of DRR generation and effectively reducing computational costs. Mathias et al.11 constructed the DeepDRR framework, which can quickly and automatically simulate X-ray images from CT volume data. Meanwhile, DeepDRR as a universal model can be applied to various clinical data. Although these methods have minimized the generation time of DRR as much as possible, the overall registration time was still relatively long, reaching up to 16 seconds in some cases3.

The learning-based methods involve CNNs and deep learning techniques, providing new ideas for image registration. Unberath et al.12 provided a comprehensive review of machine learning in 2D/3D registration for image-guided interventions, highlighting the potential of machine learning-based methods to address some of the notorious challenges in 2D/3D registration. The ReMIND2Reg challenge13 at the MICCAI conference involves 2D-to-3D registration tasks and provides various registration methods. Wu et al.5 used CNN to automatically learn image features and evaluated image similarity by combining subspace parameters to obtain registration parameters. Miao et al.6,7 regarded the registration problem as a regression problem and proposed a real-time 2D/3D medical image registration method based on CNN regression for the first time. They proposed three algorithmic strategies to build regressors and predicted six transformation parameters. However, the model adopted a simple and shallow network structure; therefore, its fitting and generalization abilities can still be improved in the future. Wang et al.8 proposed a self-supervised learning network based on Densnet, which achieved direct regression for 2D/3D registration, significantly improving accuracy and speed. When processing large amounts of data, the network fitting was insufficient. Chen9 regarded the registration problem as a classification problem and constructed a registration model using Google-Net-Inception V3, which can quickly complete registration. While its output transformation parameters were limited with integers, making it more suitable for coarse registration tasks. Gao et al.10 proposed the ProST module for generating DRRs and applied it to 2D/3D medical image registration to achieve end-to-end registration models. However, the translation along the depth direction was less accurate than along other directions. In addition, considering the potential issues associated with single-view registration, Liao et al.14 proposed a multi-view 2D/3D rigid registration model based on interest point tracking and triangulation (Point\(^2\)). This model directly aligned 3D volume by tracking the interest points from DRRs to X-ray images, avoiding out of plane errors. However, the model required numerous data to train, and the insufficient paired DRR and X-ray images would lead to registration failure. Wang et al.15 improved the point\(^2\) by introducing domain adaptation, resulting in slightly better performance on limited data. However, the feature transfer method remained simple, and its generalization ability still needed improvement. At the same time, the style transfer method have handled only for paired DRRs and X-ray images, which limited its application.

Materials and methods

Figure 1 illustrates the overall architecture of the proposed method, which consists of three key modules: POITT, triangulation layer, and SA-BnB. First, the POITT constructs a POIs tracking network on DRRs and leverages style transfer techniques to unify the appearance of X-ray images and DRRs. This enables precise feature point matching in X-ray images. Next, the triangulation layer is introduced after the POITT to establish a high-precision spatial mapping between X-ray images and the patient’s anatomical structures. This mapping allows for the accurate identification of POIs on the patient’s anatomy. Finally, the SA-BnB employs an optimization algorithm to compute the optimal transformation matrix between the CT volume and the patient’s anatomy, ensuring precise image alignment.

The study was a medical imaging investigation approved by the Ethics Committee of the Chinese PLA General Hospital (No.S2022-781-01). Due to its retrospective nature and the fact that the patients’ data were anonymous, a waiver of patients’ informed consent was granted by the Ethics Committee of the Chinese PLA General Hospital. We confirm that all experiments were performed in accordance with relevant guidelines and regulations.

Fig. 1
Fig. 1
Full size image

Overview of the proposed method.

The POIs tracking network based on style transfer (POITT)

POIs tracking The primary objective of the POITT is to achieve precise POIs mapping and tracking from DRR to X-ray images. However, acquiring multi-view X-ray data is challenging, and manually annotating POIs is both time-consuming and labor-intensive, resulting in a limited number of X-ray datasets with precise POIs annotations. To address this issue, we utilize DRR as training data and construct the POIs tracking network using synthetic images from different viewpoints. This approach enables the model to learn the feature mapping from DRR to X-ray images, thereby improving the accuracy and robustness of POIs tracking.

Specifically, selecting a set of 3D POIs \(\{X^{ct}_1,X^{ct}_2...X^{ct}_N\}\) on CT volume, where the corresponding DRR(\(I^D\)) and 2D POIs can be obtained through 3D-to-2D projection and we denote the set as \(\varvec{x}^D=\{x^{D}_1,x^{D}_2...x^{D}_N\}\), which can be converted into a sets of heatmaps \(\{M^D_1,M^D_2...M^D_N\}\) using a Gaussian function. The network takes DRRs as input and heatmaps of 2D POIs as label. And the outputs are the tracked POIs on test DRR in the form of heatmaps \(\{ \hat{M}^D_1,\hat{M}^D_2...\hat{M}^D_N\}\).

The structure of the POI tracking network is shown in Fig. 2. We use U-Net16 as the basic architecture and integrate CBAM17 to the network. Selecting a view, we take DRRs in this view as input and perform fine-grained feature extraction at the pixel level. To guide the network to focus on key areas of the image, we construct a multi-scale mixed attention mechanisms to reduce interference from irrelevant information. The heatmaps of 2D POIs on DRRs are labels during the training process. The output of the network are estimated heatmaps with the same resolution as the input DRRs. For an image with size \(M \times N\), the size of heatmap is \(M \times N \times C\), where C is the number of channels, equating to the amount of POIs.

Specifically, assuming given transformation matrix R and vector t from m views, we can capture m-view POIs sets in DRR and train m POIs tracking networks from different views. Each of the networks will track POIs for the designed view. Given the sets of X-ray images \(\{I^X_1,I^X_2...I^X_m\}\) from the m views, these networks outputs the tracked X-ray POIs of each view in the form of heatmaps. In general, we can locate the points in three-dimensional space by images from two views. Thus m can be set to 2 in the experiments. As shown in the Fig. 1, two views of anterior-posterior and inlet are selected to construct POI tracking networks. However, the X-ray images obtained in the pelvics surgery usually are significantly different from simulated DRRs, thus we introduce style transfer using CycleGAN18 to reduce the style differences between these two images.

BCELoss has be used as the loss function that regulates the training of POIs tracking network:

$$\begin{aligned} \mathcal {L} = \frac{1}{n} \sum _{i}BCE(\sigma (\hat{M}^D_i),\sigma (M^D_i)) \end{aligned}$$

where \(M^D_i\) is the ground truth heatmap, BCE is the pixel-wise binary cross entropy function and \(\sigma\) is the sigmoid function.

Fig. 2
Fig. 2
Full size image

The architecture of the POI tracking network.

Style transfer Due to significant differences in the generation methods and imaging characteristics of DRR and X-ray images, there is a style discrepancy between them, which often leads to lower accuracy when applying a network trained on DRR to X-ray images. Therefore, CycleGAN is introduced for style transfer, aligning the styles of DRR and X-ray images, thereby improving the accuracy of interest point tracking on X-ray images. Then the POI tracking network takes the two-view X-ray images being consistent with the DRR’s style by style transfer as the input and outputs the heatmaps \(\{\hat{M}^X_1,\hat{M}^X_2...\hat{M}^X_N\}\) in each view.

Triangulation layer

After inputting the two views of X-ray images into the POITT, we can obtain the corresponding heatmaps of the POIs for each image. Then, based on 1, we calculate the POIs coordinates corresponding to the X-ray image.

$$\begin{aligned} \hat{x}_{mn}^{X} = \frac{1}{ {\textstyle \sum _{x}\hat{M}_{mn}^{X}(x) } } \sum _{x}\hat{M}_{mn}^{X}(x) \end{aligned}$$
(1)

Where \(\{\hat{M}_{1n}^{X},\hat{M}_{2n}^{X}...\hat{M}_{mn}^{X}\}\) being a set of heatmaps from different views, \(\hat{M}_{mn}^{X}\)is the heatmap of the n-th interest point from the m-th view.

After obtaining the POIs from X-ray images from two views, we use the triangulation layer to locate the corresponding 3D points on CT. Specifically, the triangulation layer leverages the POIs’s coordinates from multiple views and performs 3D reconstruction based on geometric relations. This approach enables accurate conversion from 2D POIs to 3D space points.

Detailed triangulation processes and associated mathematical derivations can be found in the supplementary materials.

Shape alignment based on branch and bound (SA-BnB)

We design the BnB mechanism19 to search for shape alignment. Specifically, assuming \(X^{ct} = \{X_1^{ct},X_2^{ct}...X_N^{ct}\}\) being the initial POIs selected from CT and \(\hat{X}^p = {(\hat{X}_1^p,\hat{X}_2^p...\hat{X}_N^p)}\) being the estimated 3D POIs on the patient, we can calculate the optimal transformation matrix \(T^*\) by aligning these points. Thus, the optimization objective is,

$$\begin{aligned} T^* = arg\min _{T}\left\| T\cdot X^{ct} - \hat{X}^{p} \right\| _F \end{aligned}$$
(2)

As the transformation matrix \(T^*\) for rigid motion can be parameterized by \(R\in SO(3)\) and \(t \in R^3\), thus we can further derive (2) to the following objective function to ensure that the transformed \(X^{ct}\) is closely aligned with \(\hat{X}^p\). Then, the lower bound can be calculated by the following function,

$$\begin{aligned} \underline{E}_T=\min _{R,t}\sum _{i=1}^{N} \left\| RX_i^{ct}+t-\hat{X}_i^p \right\| _2^2 \end{aligned}$$
(3)

To speed up the calculation, the upper bound is set to the average value between the centroids distance of the points set and the bounding box distance,

$$\begin{aligned} \overline{E}_T = \frac{\left\| C_{X^{ct}} - C_{\hat{X}^p}\right\| _2 + \left\| B_{X^{ct}} - B_{\hat{X}^p}\right\| _2}{2} \end{aligned}$$
(4)

where \(C_{X^{ct}}\) and \(C_{\hat{X}^p}\) are the centers of point sets \(X^{ct}\) and \(\hat{X}^p\). \(B_{X^{ct}}\) and \(B_{\hat{X}^p}\) are the bounding box of point sets \(X^{ct}\) and \(\hat{X}^p\), which are defined as the maximum and minimum coordinates of the points set. The target of search process is to find the optimal transformation matrix of \(X^{ct}\) transfer to \(\hat{X}^p\), where \(X^{ct}\) is the source points set and \(\hat{X}^p\) is the target points set. The specific implementation process of SA-BnB is list in Algorithm 1:

Algorithm 1
Algorithm 1
Full size image

Shape alignment based on branch and bound (SA-BnB)

Experiments and results

Dataset and implementation details

The exprimental dataset is collected from the fourth medical center of Chinese PLA General Hospital, including CT scan data and X-ray images from 8 patients. The two views are acquired from patient’s pelvic anterior-posterior view and inlet view. To verify the performance of the proposed model, we conduct POIs tracking and registration experiments on synthetic DRRs and real X-ray images, respectively. For the experiments on DRRs, we randomly generate 3000 paired DRRs from two views per patient using the DeepDRR. The specific parameter settings can be found in the supplementary materials. For the training of POITT, 2700 DRRs are randomly selected for training, and the remaining 300 images are used for testing. For the experiments on X-ray images, we utilize 8 real intraoperative X-ray images to verify the POIs tracking and registration results.

The detailed software and hardware specifications used for running the experiments are provided in the supplementary materials.

Evaluation metrics

To quantitatively analyze the results of POIs tracking, we use Mean Point Drift (mPD) to measure the tracking error. mPD typically calculates the average Euclidean distance between the tracked POIs and the true points.

To quantitatively analyze the registration performance, we use four evaluation metrics: average registration error of fiducial points (mFRE), gross failure rate (GFR), average transformation error (mTME) and average registration time (mReg.time). mFRE is the average distance between patient 3D POIs of patient and aligned CT. GFR is the percentage of FRE greater than 10mm (Registration fails when FRE is greater than 10 mm14,20). mTME is the errors between predicted transformation and actual transformation parameters. Moreover, in order to present the registration results more intuitively, we use normalized cross correlation (NCC) and structure similarity index measure (SSIM) to evaluate the registration results between aligned DRR and the original X-ray iamge. For registration on real X-ray images, only SSIM and NCC are used as metrics due to the lack of real labels. Additionally, we calculated the 95\(\%\) confidence intervals of the results using a two-sample t-distribution method for the difference of means, and employed the Wilcoxon rank-sum test to compute p-values for further evaluation of the model performance differences.

The results of POIs tracking

Fig. 3
Fig. 3
Full size image

Comparison results of 2D POIs tracking. (a) DRRs with ground truth POIs. (b) predicted POIs of POITT. (c) predicted POIs of Landmarks Detect.

Table 1 Quantitative comparison results of 2D POIs tracking.

To demonstrate the advantages of our POITT, we compared it with a classical X-ray POIs detection method, referred to as “Landmarks Detect”. Fig. 3 and Table 1 present the visual and quantitative results, respectively. (a) shows the DRRs with ground truth POIs, (b) shows the predicted POIs of POITT, and (c) shows the predicted POIs of the Landmarks Detect method. The numbers below (b) and (c) show the mPD between the predicted points and the ground truth points. From the figure, it is clear that both methods demonstrate high accuracy in POIs tracking. Table 1 shows the mPD and tracking time for the results. As seen in the table, the accuracy difference between the two methods is negligible, with our method achieving slightly lower mPD than the comparison method. However, the tracking time for our method is significantly shorter, with an average of only 0.14s, compared to 0.71s for the comparison method, which is nearly 5 times slower. Additionally, POITT achieved a smaller standard deviation and 95% confidence interval compared to Landmarks Detect, indicating that our model is more stable in performance. Therefore, our method shows a distinct advantage in POIs tracking, especially in terms of high tracking efficiency, which is crucial for fast registration.

The results of 2D/3D registration on synthetic DRRs

To validate the effectiveness of our model, we compare it with several representative methods: optimization-based methods21, learning-based methods6, and other multiview registration methods14,15,22. Specfic results are shown in Tables 2 and 4.

Fig. 4
Fig. 4
Full size image

Visualization of synthetic DRRs registration comparison.

Table 2 2D/3D registration performance comparing with different methods.

The registration results are presented in Table 2, which provides a quantitative comparison of different registration methods on synthesized DRRs. The experimental results indicate that optimization-based methods yield the poorest registration accuracy while also incurring the longest computation time. In contrast, learning-based methods achieve improved registration performance with an average processing time of only 1.74s per image. However, these methods suffer from a high registration failure rate of 16.4%, making them unsuitable for clinical applications requiring high precision. As a multi-view registration approach, Point\(^2\) demonstrates notable improvements in registration accuracy. However, this model heavily relies on large-scale training data, and due to the limited dataset used in this study, it fails to reach its optimal performance. The transfer learning-based method incorporates a transfer learning module to mitigate the issue of data inefficiency, leading to better registration performance on the small-scale dataset used in our experiment. Nevertheless, it still falls short when compared to the proposed method, such as on severely misaligned pelvic structures (Case 4 in Table 4). Our method achieves the best registration results across all key metrics, reducing mTRE, GFR, and mTME to 6.19mm, 3.33%, and 0.0655mm, respectively. However, our method only achieved a registration time of 3.12s, which is significantly faster than Opt-based and the BnB methods, but still slower than learning-based methods such as Point\(^2\) (0.78s) and Transfer-learning (0.52s). This indicates that our method achieves a balance between accuracy and efficiency, but there is still room for improvement in runtime performance.

Table 3 P-value and confidence intervals of 2D/3D registration performance comparing with different methods.

Table 3 presents a comparative analysis of different methods in 2D/3D registration, including the 95% confidence intervals (CIs) for mFRE, mTME, and mReg.time, along with the P-values from Wilcoxon rank-sum tests conducted on the mTME to assess statistical significance between our proposed method and competing approaches. In conjunction with the results from Table 2, it is evident that our method achieves the smallest standard deviation and the narrowest confidence interval for both mFRE and mTME, indicating its superior stability in the registration task. Notably, our method achieves an exceptionally low mTME CI of (0.05, 0.08), significantly outperforming the learning-based method (0.61, 1.13) and the multi-view registration method Point\(^2\) (0.13, 0.27), reflecting a substantial reduction in registration errors and an overall improvement in accuracy. While some overlap in CIs is observed in the mFRE, this does not compromise the statistical significance of our method’s superiority in mTME. Since the Wilcoxon rank-sum test is specifically performed on mTME and all P-values are below 0.05, the performance difference in this key metric is statistically significant, providing strong evidence of the accuracy advantage of our method. In terms of computational efficiency, our method achieves a registration time CI of (2.77, 3.47) seconds, which is slightly higher than that of Point\(^2\) (0.68, 0.88) seconds and the transfer learning-based method (0.45, 0.59) seconds. In future work, we will further explore model lightweighting and efficient inference strategies to improve registration speed while maintaining accuracy.

Figure 4 presents the visualization results of different methods on synthetic DRRs, where each row represents a test instance. The comparison clearly demonstrates that our proposed method achieves the closest alignment with the ground truth, exhibiting the highest accuracy. This strongly validates the effectiveness and superiority of our method in the registration task. In contrast, other methods exhibit varying degrees of misalignment, failing to achieve the same level of precision as our approach, further highlighting its advantages. To more clearly demonstrate the advantages of our method, Supplementary Figure 3 presents the real images and corresponding registration error heatmaps for different approaches. As shown in the figure, our method significantly outperforms the other compared methods in terms of registration accuracy and error heatmap quality. It effectively aligns key regions of the pelvis with a uniform distribution of registration errors, further validating the superiority and robustness of our approach.

In summary, our proposed method significantly outperforms other competing methods in both registration accuracy and error heatmap performance, achieving precise alignment of key pelvic regions with a uniform distribution of registration errors. This further validates the superiority and robustness of our method, demonstrating its high reliability and stability in practical applications.

The results of 2D/3D registration on real X-ray images

Fig. 5
Fig. 5
Full size image

POI tracking and 2D/3D registration results. (a) DRR with POI. (b) X-ray image with tracked POI. (c) Aligned DRR.

This section further validates the registration performance of the proposed method on clinical real X-ray images. Due to the significant style differences between DRR and real X-ray images, directly applying a model trained on DRR to X-ray images may lead to a loss in registration accuracy. To address this issue, we introduce CycleGAN for style transfer, aligning the appearance of X-ray images with that of DRRs, thereby improving the model’s adaptability and enhancing registration accuracy.

Table 4 lists the quantitative registration results for five test cases. Since real X-ray data lack ground truth labels, we evaluate registration accuracy using structural similarity index (SSIM) and normalized cross-correlation (NCC) between the registered CT projection and the X-ray image.

Table 4 2D/3D registration results of real X-ray images comparing with other methods.

The results indicate that both optimization-based and learning-based methods fail to achieve the level of accuracy required for surgical navigation. The Point\(^2\) shows improved registration performance, achieving relatively higher values in SSIM and NCC. However, it heavily relies on large-scale multi-view paired data for training, and due to the limited size of our dataset, its performance does not reach its full potential. Furthermore, while the BnB method supports multi-view registration, it requires 22 sec per case, and its accuracy does not provide a significant advantage. The transfer learning-based approach, which incorporates a VGG-based transfer learning network, successfully mitigates the issue of limited training data and achieves better registration results than our method in Case 3 and Case 5. However, due to the simplicity of the VGG architecture, its performance deteriorates when handling complex misalignment cases (e.g., Case 4), leading to a reduction in registration accuracy. In contrast, our method outperforms the transfer learning-based approach in most test cases. In the Anterior-posterior view, our approach achieves an average NCC of 0.6535 and SSIM of 0.7823, while in the Inlet view, NCC and SSIM reach 0.5841 and 0.7591, respectively, demonstrating a significant improvement in registration accuracy. It is worth noting that the CIs of our method partially overlap with those of the comparison methods. For example, when compared with the transfer-learning method, the NCC intervals are (0.5801,0.7269) vs. (0.5592,0.6738), indicating that in some cases, the performance differences may not be statistically significant. However, based on the average performance and consistency, our method shows a clear advantage in most cases, suggesting stronger robustness and generalization capability.

To provide a more intuitive visualization of the registration performance, Fig. 5 shows the POIs tracking and registration results for five test cases in both anterior-posterior and inlet views. (a) represents the DRRs with annotated points, (b) represents the real X-ray images with the tracked points, and (c) represents the DRRs generated from the registered CT projection, aligned with the X-ray images. The results demonstrate that our method accurately completes the registration task, maintaining high robustness and precision even in cases of severe misalignment (Case4), further validating the accuracy and stability of our approach.

Ablation study

This section discusses an ablation study of the proposed method. We still use mFRE, GFR, mTME, mReg.time, NCC and SSIM to evaluate the performance. The evaluation results are given in Tables 5 and 7.

Table 5 Ablation study of the POI tracking network and shape alignment.
Table 6 Ablation study of CI 95 on the POI tracking network and shape alignment.

In this experiment, we explored two POI tracking networks (U-NET and U-NET+CBAM) and two shape alignmets (Procrustes analysis and BnB). Table 5 shows the performance of different cases. We found that the coarse registration can be realized when directly using U-NET for POI tracking, with poor results of 14.4759mm mFRE, 15.8\(\%\) GFR and 0.4375mm mTME. Adding CBAM, the indicators mFRE, GFR and mTME have obviously reduced. As shown in the second and third rows of the Table 5, BnB achieved the best results with 6,1941mm mFRE, 3.33% GFR and 0.0655mm mTME, which have significantly improved compared with other cases. This is because BnB performs better in avoiding local optima and handling larger initial errors. At the same time, from the confidence intervals in Table 6, it can be seen that our model is more stable.

To evaluate the performance of style transfer, we compared the registration results with and without style transfer during the testing stage. Table 7 shows the results of two views. We found that without adding style transfer, the network trained on DRR is difficult to transfer to X-ray image due to the significant discrepancy between DRR and X-ray image, which results in poor registration results. After adding style transfer, the style between DRR and X-ray image is unified, and the registration accuracy is greatly improved. Although the registration time may slow down due to the increased complexity of the network, considering the crucial role of accuracy in surgical navigation, we can choose to lose some time to improve accuracy.

Table 7 Ablation study of the style transfer.

Limitations

To provide a more balanced perspective, we acknowledge the following limitations in our current method:

C-arm relative pose geometry Our method assumes that the relative C-arm pose geometries is known. However, most of the existing C-arm machines in the operating room do not provide such geometry information, which limits the applicability of our method.

Limited multi-view data The limited multi-view data affects the registration accuracy. Although we simulate X-ray images using DRR and unify the style through CycleGAN to reduce accuracy loss, errors in style transfer and fiducial point tracking may further increase the registration error.

Registration timeThe current training strategy trains one patient at a time, which results in longer training times, especially for critically ill patients who cannot afford to wait for the training process. And Our registration time has not reached optimal performance and requires further investigation.

Obstructions in clinical practice This work is based on an ideal dataset, whereas in clinical practice, obstructions such as surgical instruments or implanted screws may affect the model’s accuracy. We are currently investigating how to handle these obstructions and have achieved preliminary results.

Conclusion

This study explores the multi-view 2D/3D image registration problem in minimally invasive pelvic surgery navigation and has achieved the beneficial research results. The proposed model establishes point-to-point correspondence between 2D X-ray images and 3D CT data, and then aligns the shape between matching points to estimate the CT pose. The POI tracking network based on style transfer can accurately track POIs from DRR to X-ray image and establish point-to-point correspondence. The triangulation layers can project the POIs on the X-ray image to the patient. The BnB is employed to shape alignment as it can handle various types of data and perform registration more accurately. The experiment proved feasibility and superiority of the model, and provided a reliable solution for multiview image registration in minimally invasive pelvic surgery navigation.