Fig. 1: Overview of data preprocessing and analysis.

a Lesion masks (opaque pink; left image) were manually drawn on 231 native T2 scans, resampled to the native T1 scans (pink outline; right image), refined, and healed by filling them with intact tissue from around their homologues (cf. opaque and transparent pink in the left image; result on right). Middle boxes: cerebrospinal fluid, white matter, and gray matter were segmented from the healed T1s using FAST (left image). Healed T1s were registered to the 2 mm MNI template with FNIRT (right image). Bottom boxes: tissue and lesion maps were normalized and combined, with lesions superseding other tissue (right image). Volumes were downsampled to 8 mm and cropped (left image).

b Volumes were concatenated across participants and linked to WAB-AQ scores, which were used to form severe (35%) and nonsevere (65%) aphasia categories by collapsing the very severe/severe and moderate/mild categories (denoted by vertical lines on the histogram). Data were partitioned for predicting aphasia severity, with model performance evaluated over 20 repeats of a nested cross-validation scheme with stratification (middle box). In each repeat, models were tuned over 8 inner folds, exposing them to approximately 169 samples during training and 24 during testing. Once hyperparameters were selected, the models were fitted to the training data in the outer folds, which consisted of approximately 193 samples. For some models (i.e., the CNN), the outer training set was repartitioned to reserve data for evaluating training. Models were then tested on the approximately 38 samples they had not seen during training or tuning, and the process was repeated for the remaining outer folds to generate a prediction for each sample in the dataset. The same partitions were used to train a CNN and an SVM and to implement model fusion strategies (bottom boxes). CNN tuning involved selecting network complexity (see the deep learning section in Methods for more details), dropout frequency, learning rate, and L2-norm penalty (bottom right box; network complexity increases left to right with changes to block composition and/or layer properties). SVM tuning involved selecting the kernel, gamma, cost, and the dimensionality reduction technique to apply prior to training, as well as the number of dimensions to retain. Model fusion entailed averaging the predictions made by the two models, stacking the predictions using another model, and chaining CNN-based feature extraction with SVM-based prediction, using either the learned lower-dimensional features or higher-dimensional saliency maps (SHAP or Grad-CAM++).
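A minimal sketch of the lesion-healing step in panel a, assuming a simple mirror-filling approach in which lesioned voxels are replaced with intensities sampled from the contralesional homologue on a left/right-symmetric grid; the file names are hypothetical and the manual refinement described above is omitted:

```python
import nibabel as nib
import numpy as np

t1 = nib.load("native_T1.nii.gz")              # hypothetical file names
lesion = nib.load("lesion_mask_in_T1.nii.gz")

data = t1.get_fdata()
mask = lesion.get_fdata() > 0

# Flip along the first (left-right) axis to sample the contralesional
# homologue; assumes the volume is aligned to a left/right-symmetric grid.
mirrored = data[::-1, :, :]
healed = np.where(mask, mirrored, data)

nib.save(nib.Nifti1Image(healed, t1.affine, t1.header), "healed_T1.nii.gz")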
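The segmentation and registration steps use FSL's FAST and FNIRT. Below is a hedged command-line sketch with illustrative paths; in practice FNIRT is typically initialized with an affine transform (e.g., from FLIRT), which is omitted here:

```python
import subprocess

# Segment the healed T1 into CSF, white matter, and gray matter (3 classes)
subprocess.run(["fast", "-t", "1", "-n", "3", "healed_T1.nii.gz"], check=True)

# Nonlinearly register the healed T1 to the 2 mm MNI152 template
subprocess.run([
    "fnirt",
    "--in=healed_T1.nii.gz",
    "--ref=MNI152_T1_2mm.nii.gz",            # illustrative template path
    "--iout=healed_T1_mni.nii.gz",           # resampled image in MNI space
    "--cout=warp_coefficients.nii.gz",       # warp field coefficients
], check=True)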
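The nested cross-validation scheme in panel b can be sketched as follows. The 8 inner folds are stated above; the 6 outer folds are inferred from the approximate sample counts (~193 training / ~38 testing out of 231), and the SVM, its parameter grid, and the placeholder data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

X = np.random.rand(231, 500)       # placeholder feature matrix
y = np.random.randint(0, 2, 231)   # 1 = severe, 0 = nonsevere (placeholder)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # illustrative

for repeat in range(20):                                        # 20 repeats
    outer = StratifiedKFold(n_splits=6, shuffle=True, random_state=repeat)
    preds = np.empty(len(y))
    for train_idx, test_idx in outer.split(X, y):
        # Tune over 8 stratified inner folds (~169 train / ~24 test per fold)
        inner = StratifiedKFold(n_splits=8, shuffle=True, random_state=repeat)
        search = GridSearchCV(SVC(probability=True), param_grid, cv=inner)
        # Refit with the selected hyperparameters on ~193 outer-training samples
        search.fit(X[train_idx], y[train_idx])
        # Predict the ~38 samples unseen during training or tuning
        preds[test_idx] = search.predict_proba(X[test_idx])[:, 1]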
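The two prediction-level fusion strategies, averaging and stacking, reduce to a few lines. Here is a sketch assuming out-of-fold probability vectors from each model; using logistic regression as the second-level model is an assumption, and within the actual scheme the stacker would itself be fitted on the cross-validation training partitions to avoid leakage:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

cnn_probs = np.random.rand(231)    # placeholder out-of-fold CNN probabilities
svm_probs = np.random.rand(231)    # placeholder out-of-fold SVM probabilities
y = np.random.randint(0, 2, 231)   # placeholder severity labels

# 1) Averaging: equal-weight mean of the two probability vectors
avg_probs = (cnn_probs + svm_probs) / 2.0

# 2) Stacking: a second-level model learns how to weight the base predictions.
#    Shown fit on all out-of-fold predictions for brevity.
meta_X = np.column_stack([cnn_probs, svm_probs])
stacker = LogisticRegression().fit(meta_X, y)
stacked_probs = stacker.predict_proba(meta_X)[:, 1]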
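The chaining strategy feeds features learned by the CNN into the SVM. Below is a minimal sketch with a tiny stand-in 3D network; the paper's tuned architecture is not reproduced, and the input dimensions, data, and layer names are placeholders:

```python
import numpy as np
from tensorflow import keras
from sklearn.svm import SVC

X = np.random.rand(231, 24, 28, 24, 1)   # placeholder downsampled volumes
y = np.random.randint(0, 2, 231)         # placeholder severity labels

# Tiny stand-in 3D CNN; not the paper's tuned architecture.
inputs = keras.Input(shape=(24, 28, 24, 1))
x = keras.layers.Conv3D(8, 3, activation="relu")(inputs)
feats = keras.layers.GlobalAveragePooling3D(name="features")(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(feats)
cnn = keras.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="binary_crossentropy")
cnn.fit(X[:193], y[:193], epochs=1, verbose=0)

# Chain: reuse the learned lower-dimensional features as SVM input
extractor = keras.Model(inputs, cnn.get_layer("features").output)
svm = SVC().fit(extractor.predict(X[:193], verbose=0), y[:193])
preds = svm.predict(extractor.predict(X[193:], verbose=0))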