Fig. 8: Architecture of the M3-VF module.

The M3-VF module integrates the full set of multimodal inputs to classify visual field defects into four fine-grained classes, supporting precise diagnosis. It processes these inputs through four flows. The numerical flow extracts features with fully connected layers. The RNFL, focused, and global flows share an identical structure: a stem of convolution, batch normalization, ReLU, and max pooling, followed by CBAM-residual blocks whose attention emphasizes informative spatial locations and channels. A transformer encoder then fuses the resulting feature vectors, capturing both local and global dependencies among them, and a multilayer perceptron maps the fused representation to the four classes.
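To make the data flow concrete, the following is a minimal PyTorch sketch of a four-flow classifier in this style. It is not the authors' implementation. The class names (`CBAM`, `CBAMResidualBlock`, `ImageFlow`, `M3VFSketch`), channel widths, kernel sizes, number of residual blocks, CBAM reduction ratio, input channel counts per image flow, transformer depth and head count, and the mean pooling of fused tokens are all assumptions chosen for illustration. The CBAM follows the standard channel-then-spatial attention design and is placed at the end of each residual branch, as in common CBAM-ResNet variants.

```python
import torch
import torch.nn as nn

# All widths, depths, and kernel sizes are illustrative assumptions, not M3-VF values.


class CBAM(nn.Module):
    """Channel attention followed by spatial attention (standard CBAM design)."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Shared MLP (as 1x1 convs) over average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )
        # Spatial attention computed from channel-wise mean and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)  # reweight channels
        maps = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(maps))  # reweight spatial positions


class CBAMResidualBlock(nn.Module):
    """Two 3x3 conv-BN layers, CBAM on the branch output, identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            CBAM(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.branch(x))


class ImageFlow(nn.Module):
    """Shared structure assumed for the RNFL, focused, and global flows."""

    def __init__(self, in_ch: int, channels: int = 64, blocks: int = 2, d_model: int = 128):
        super().__init__()
        self.stem = nn.Sequential(  # convolution, BN, ReLU, max pooling
            nn.Conv2d(in_ch, channels, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.blocks = nn.Sequential(*[CBAMResidualBlock(channels) for _ in range(blocks)])
        self.proj = nn.Linear(channels, d_model)  # one feature vector per flow

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.blocks(self.stem(x))
        return self.proj(x.mean(dim=(2, 3)))  # global average pool -> (B, d_model)


class M3VFSketch(nn.Module):
    """Four flows -> transformer-encoder fusion -> MLP over four classes (hypothetical)."""

    def __init__(self, num_features: int, img_channels=(1, 3, 3), d_model: int = 128, num_classes: int = 4):
        super().__init__()
        self.numerical = nn.Sequential(  # fully connected numerical flow
            nn.Linear(num_features, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.rnfl = ImageFlow(img_channels[0], d_model=d_model)
        self.focused = ImageFlow(img_channels[1], d_model=d_model)
        self.global_flow = ImageFlow(img_channels[2], d_model=d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, num_classes))

    def forward(self, num, rnfl, focused, global_img):
        tokens = torch.stack(
            [self.numerical(num), self.rnfl(rnfl), self.focused(focused), self.global_flow(global_img)],
            dim=1,
        )  # (B, 4 tokens, d_model)
        fused = self.fusion(tokens).mean(dim=1)  # pool the four fused tokens
        return self.head(fused)  # (B, num_classes) logits


if __name__ == "__main__":
    model = M3VFSketch(num_features=10).eval()
    with torch.no_grad():
        logits = model(torch.randn(2, 10), torch.randn(2, 1, 64, 64),
                       torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
    print(logits.shape)  # torch.Size([2, 4])
```

Stacking one vector per flow as a token lets self-attention relate every flow to every other before classification. Mean pooling over the fused tokens is one simple readout; a learnable class token would be an equally plausible choice.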