Fig. 1

Model architecture. (A) Overall dual-stream architecture: the spatial stream uses EfficientNet-B2 with a multi-head self-attention layer, and the scoring stream integrates AI-generated RCFT scores (from a previously developed scoring model) with demographic features (sex, age, and years of education). The outputs of the two streams are combined by average fusion to yield the final CN/MCI classification. (B) Detailed architecture of the EfficientNet-B2 model used in the spatial stream, including convolutional layers and Mobile Inverted Bottleneck Convolution (MBConv) layers.
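
The fusion step in panel (A) can be illustrated with a minimal sketch. It assumes that each stream emits a per-class probability vector and that "average fusion" means the element-wise mean of those two vectors; the caption does not say whether probabilities or logits are averaged. The function names, the class order in `LABELS`, and the example values are hypothetical and serve only as illustration.

```python
from typing import List, Sequence

# Assumed class order, chosen for illustration only.
LABELS = ("CN", "MCI")


def average_fusion(spatial_probs: Sequence[float],
                   scoring_probs: Sequence[float]) -> List[float]:
    """Return the element-wise mean of two per-class probability vectors."""
    if len(spatial_probs) != len(scoring_probs):
        raise ValueError("both streams must score the same number of classes")
    return [(s + t) / 2.0 for s, t in zip(spatial_probs, scoring_probs)]


def predict(spatial_probs: Sequence[float],
            scoring_probs: Sequence[float]) -> str:
    """Fuse the two streams and return the label with the highest fused score."""
    fused = average_fusion(spatial_probs, scoring_probs)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best_index]


if __name__ == "__main__":
    # Hypothetical stream outputs for a single subject.
    print(predict([0.8, 0.2], [0.4, 0.6]))
```

Under this assumption both streams carry equal weight, so a confident prediction from one stream can outvote a weakly opposing prediction from the other.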