Fig. 2

(a) Illustration of our SCN architecture, which consists of a dual-branch main network (self-predictor and OAR feature module) and a feature correlation module. CT volumes are fed into the dual-branch main network, whose two branches extract features separately. The feature correlation module then fuses the two features \(X_C\) and \(X_O\) from the dual-branch main network. Finally, the output of the feature correlation module is concatenated with the features from the self-predictor decoder to obtain the final segmentation result. (b) Illustration of the feature correlation module. \(X_C\) and \(X_O\) denote the features from the second decoder layer of the self-predictor and of the OAR feature module, respectively. Q, K, and V denote query, key, and value. \(\bigoplus\) denotes element-wise addition. The ellipsis indicates that the groups of parallel cross-attention heads are concatenated.
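
The fusion in (b) can be sketched as multi-head cross-attention followed by an element-wise residual addition. The caption does not say which feature supplies Q and which supplies K and V, so the sketch below assumes Q is projected from \(X_C\), K and V are projected from \(X_O\), and the residual is added to \(X_C\). The function name `cross_attention_fusion`, the projection matrices `w_q`, `w_k`, `w_v`, `w_out`, the flattening of each 3D feature map into a token matrix, and all sizes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fusion(x_c, x_o, w_q, w_k, w_v, w_out, num_heads):
    """Fuse x_c with x_o via multi-head cross-attention plus a residual.

    x_c: (N, C) tokens, assumed here to come from the self-predictor.
    x_o: (M, C) tokens, assumed here to come from the OAR feature module.
    Assumption: Q comes from x_c, K and V come from x_o.
    """
    n, c = x_c.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads

    # Project, then split channels into heads: (tokens, C) -> (H, tokens, d).
    q = (x_c @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (x_o @ w_k).reshape(-1, num_heads, d).transpose(1, 0, 2)
    v = (x_o @ w_v).reshape(-1, num_heads, d).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently per head.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (H, N, M)
    heads = attn @ v                                                # (H, N, d)

    # Concatenate the parallel heads back into C channels (the ellipsis in (b)).
    merged = heads.transpose(1, 0, 2).reshape(n, c)

    # Element-wise addition (the circled plus in (b)); w_out is an assumed
    # output projection applied before the residual.
    return x_c + merged @ w_out


# Toy sizes chosen only for the demo.
rng = np.random.default_rng(0)
num_tokens, channels, heads = 8, 16, 4
x_c = rng.standard_normal((num_tokens, channels))
x_o = rng.standard_normal((num_tokens, channels))
w_q, w_k, w_v, w_out = (
    0.1 * rng.standard_normal((channels, channels)) for _ in range(4)
)
fused = cross_attention_fusion(x_c, x_o, w_q, w_k, w_v, w_out, heads)
```

Under these assumptions, `fused` has the same shape as \(X_C\), so it can be reshaped back into a feature map and concatenated with the self-predictor decoder features as described in (a).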