Introduction

Attention Deficit/Hyperactivity Disorder (ADHD) is a prevalent mental disorder that primarily affects children and is caused by both genetic and environmental factors1. An estimated 8–11% of school-age children are affected by ADHD, and 30–50% of them retain ADHD symptoms into adulthood2,3,4,5,6. Children with ADHD struggle to maintain focus on tasks, which can lead to challenges in academic success, social interactions, and professional achievement4,5,6. This multifaceted disorder is defined by enduring deficits in attention, hyperactivity, and/or impulsivity. ADHD is categorized into three subtypes, as delineated in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV)5,7: a rarely identified predominantly hyperactive-impulsive subtype (ADHDH), a predominantly inattentive subtype (ADHDI), and a combined subtype (ADHDC) characterized by both inattention and hyperactivity.

Symptom-based diagnosis relies on the experience of clinicians, which can introduce bias and delay the initiation of treatment8,9,10,11,12,13. Thus, the integration of machine learning (ML) methods can significantly shorten the diagnostic process, thereby enhancing efficiency and enabling timely medical intervention.

Advanced neuroimaging modalities, such as structural and functional magnetic resonance imaging (sMRI/fMRI)14, serve as valuable sources of biomarkers for diagnosing ADHD. Functional connectivity networks (FCNs) derived from resting-state functional magnetic resonance imaging (rs-fMRI) data can be used to investigate alterations in the interconnections between brain regions. Recent investigations have underscored the efficacy of functional connectivity (FC) in elucidating disparities between individuals with ADHD and typically developing counterparts15,16.

Numerous ML methodologies have been proposed for ADHD classification based on brain functional connectivity (FC), broadly categorized into classical and deep learning (DL) approaches. Classical methods typically follow a sequential procedure comprising feature extraction, feature selection, and classification. Notable among these are t-test analysis17, support vector machines with recursive feature elimination (SVM–RFE)18,19, and the least absolute shrinkage and selection operator (LASSO)20,21. However, classical methods are burdened by inherent complexities, including the need for extensive prior knowledge about the characteristic features of different ADHD subtypes, which limits their accuracies to between 65 and 85%22,23,24.

In recent years, representation learning has become a prominent focus in classification, driving advances in DL, especially convolutional neural networks (CNNs). These models obviate the need for explicit feature engineering, instead learning implicit representations of subtype-specific characteristics. Numerous studies have explored the efficacy of three- and four-dimensional CNN models that exploit the spatial and spatio-temporal characteristics of ADHD traits in sMRI and fMRI data10,25 for automatic ADHD classification9,26,27. Among DL tools, auto-encoders and attention networks have been shown to better capture discriminative high-level features from rs-fMRI data, resulting in higher classification accuracy28,29.

Thus far, the DL methodologies utilized for ADHD classification have primarily relied on static functional connectivity (sFC), in which FCNs are derived over the entire scan duration. However, emerging evidence suggests that dynamic changes in brain FCNs offer a more comprehensive understanding of the underlying pathophysiology of ADHD4. Dynamic functional connectivity (dFC) captures changes in FC over time, providing valuable insights into the complex interactions between brain regions. In essence, dFC characterizes functional alterations within brain networks over brief intervals, thereby potentially enhancing the accuracy of classification algorithms. Additionally, while most existing methodologies classify subjects into the broad categories of normal or ADHD, discerning between ADHD subtypes, such as ADHDI and ADHDC, remains clinically important.

In this investigation, we present a novel DL approach called Skip-Vote-Net for the automatic classification of ADHD subtypes and typically developing children (TDC) utilizing dFC. We hypothesize that dFC encompasses distinctive patterns facilitating a superior discrimination between ADHD subjects and TDC, along with ADHD subtypes. Within the Skip-Vote-Net framework, skip connections were integrated to increase convergence rates and improve the network’s efficacy. Additionally, we explored the potential enhancement in the network’s generalization capabilities by balancing the data distribution across classes.

Materials and methods

Figure 1 depicts the processing pipeline employed for ADHD classification, comprising three main steps: (i) preprocessing of rs-fMRI data, (ii) dynamic functional connectivity analyses, and (iii) ADHD classification using the proposed Skip-Vote-Net. The Skip-Vote-Net was applied in four classification scenarios: (i) classification of ADHD and TDC children using balanced training data, (ii) classification of ADHD and TDC children using unbalanced training data, (iii) classification of ADHD children into ADHDI and ADHDC, and (iv) classification of subjects into ADHDI, ADHDC, and TDC.

Database

For method development and evaluation, we utilized rs-fMRI data collected at the New York University (NYU) Child Study Center, which is part of the publicly available ADHD-200 repository30. This repository includes several hundred rs-fMRI scans gathered from eight distinct acquisition sites, each operating under its own recording protocol. The local Institutional Review Board at each site approved the recording protocols. To maintain consistency and mitigate potential discrepancies related to differences in recording protocols, we included only the training set of the imaging data from the NYU center. The training set comprised rs-fMRI data from 222 participants, including 44 individuals with inattentive ADHD (aged 7–18), 77 individuals with combined ADHD (aged 7–18), 2 individuals with hyperactive ADHD, and 99 typically developing children (aged 7–18). Due to the insufficient number of subjects, the hyperactive ADHD class was excluded from the classification analysis. Additionally, in the NYU database, some participants had two resting-state fMRI scans. To enhance data robustness and improve the generalizability of our model, each scan was treated as a separate subject (see supplementary materials for further details).

The data were collected using a 3-T Siemens Allegra scanner. Each participant’s data include a high-resolution T1-weighted (T1w) scan and at least one resting-state, eyes-closed T2*-weighted (BOLD-weighted) multi-echo scan. The T1w images were obtained with a magnetization-prepared rapid-acquisition gradient-echo sequence with a voxel size of 1.3 mm×1 mm×1.3 mm. Each subject underwent approximately 6 min of rs-fMRI using a gradient-echo echo-planar imaging (EPI) sequence with a repetition time (TR) of 2 s and a voxel size of 3 mm×3 mm×4 mm. We analyzed 176 functional volumes from each subject to ensure a consistent data quantity across all participants30.

Data preprocessing

In the preprocessing step, the T1-weighted images of each subject were initially normalized to the standard structural space template, Montreal Neurological Institute (MNI152)31, employing both affine registration (FLIRT)32,33 and non-linear registration (FNIRT)34 tools. Then, skull stripping was performed using AFNI’s 3dSkullStrip, followed by automated tissue-type segmentation using FAST35 to obtain a high-resolution mask (1 mm isotropic) for brain tissues consisting of cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM). To match the rs-fMRI data, low-resolution masks (4 mm isotropic) were generated with AFNI36.

For rs-fMRI preprocessing, the first four volumes were initially removed from each subject’s data. Subsequently, AFNI’s 3dDespike was used to eliminate large transient fluctuations based on the median absolute deviation. Slice-timing correction was performed relative to the middle slice using AFNI’s 3dTshift, followed by rigid registration with AFNI’s 3dvolreg to estimate subject movement parameters. The rs-fMRI images were then realigned, co-registered to the structural data, and normalized into the MNI atlas space31. The images were resliced to 4 mm×4 mm×4 mm voxels and smoothed with a Gaussian kernel of 6 mm full-width at half maximum. A temporal band-pass filter with a frequency band of 0.009–0.1 Hz was applied for noise attenuation36. Finally, linear detrending was applied to reduce the effects of noise and signal drift.

After preprocessing, 44 of the 220 remaining participants were excluded from further analysis due to excessive motion artifacts, defined as a displacement of more than 3.0 mm or a rotation of more than 3.0°. Consequently, 176 individuals were included in the classification analysis, comprising 42 individuals with inattentive ADHD, 66 individuals with combined ADHD, and 68 typically developing children. Since some individuals had two scans, a total of 75 scans from individuals with inattentive ADHD, 91 scans from individuals with combined ADHD, and 99 scans from typically developing children were used for training and performance evaluation.
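The motion-exclusion criterion above can be sketched as a simple check on the six rigid-body parameters estimated during registration. This is our illustration, not the study's code: the array layout and the rule that a scan is rejected when either threshold is exceeded are assumptions.

```python
import numpy as np

def passes_motion_check(motion, max_rot_deg=3.0, max_trans_mm=3.0):
    """Return True if a scan survives the motion criterion.

    `motion` is a (T, 6) array of per-volume rigid-body parameters
    (assumed layout: columns 0-2 rotations in degrees, columns 3-5
    translations in mm). We assume a scan is rejected when EITHER
    limit (3.0 mm displacement or 3.0 degrees rotation) is exceeded.
    """
    rot = np.abs(motion[:, :3]).max()
    trans = np.abs(motion[:, 3:]).max()
    return rot <= max_rot_deg and trans <= max_trans_mm
```

In practice the parameter file would come from the realignment step (e.g. the 1D output of 3dvolreg), with one row per retained volume.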

Fig. 1

Processing pipeline for ADHD classification, comprising rs-fMRI preprocessing, dynamic connectivity analysis (P represents the number of parcels), and ADHD/TDC classification using Skip-Vote-Net.

Static and dynamic functional connectivity analysis

For the functional connectivity analysis, we utilized the Automated Anatomical Labeling (AAL) atlas37 to partition the brain into 116 regions-of-interest (ROIs). The mean time series of each region was extracted by averaging the preprocessed data of all voxels within the region. For the sFC analysis, the connectivity matrix was computed over the entire scan duration (herein 6 min) as the pair-wise Pearson correlation coefficients between the time courses of the brain regions. To construct the dFC matrices38 from the preprocessed rs-fMRI data, we employed a sliding-window approach to partition the rs-fMRI data into K segments. K was set to 5, 7, or 9 by adjusting the window length, with a 50% overlap between consecutive windows. For instance, with seven segments, the window length was 44 TRs (88 s) with a step size of 22 TRs (44 s). Subsequently, K dFC matrices were generated for each subject and used to assess the performance of the Skip-Vote-Net.
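The sliding-window construction can be sketched as follows; variable names are ours, and the window length is derived from the 50% overlap rule (with K windows and step = window/2, the scan length T satisfies T = window·(K+1)/2):

```python
import numpy as np

def dynamic_fc(ts, k):
    """Split a (T, P) ROI time-series matrix into k windows with 50%
    overlap and return k (P, P) Pearson-correlation matrices.

    With T = 176 volumes and k = 7 this yields a 44-TR window and a
    22-TR step, matching the example in the text.
    """
    T, P = ts.shape
    win = int(round(2 * T / (k + 1)))  # 50% overlap => step = win // 2
    step = win // 2
    mats = []
    for i in range(k):
        seg = ts[i * step : i * step + win]        # one temporal window
        mats.append(np.corrcoef(seg.T))            # (P, P) correlation
    return np.stack(mats)
```

The static FC matrix corresponds to the degenerate case of a single window spanning the whole scan.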

Proposed Skip-Vote-Net

The proposed Skip-Vote-Net architecture was designed to extract the most relevant features. Figure 2 illustrates the Skip-Vote-Net, which is composed of three main components: convolutional, fully connected, and majority voting layers. Skip connections were used in the proposed model to increase the convergence rate by allowing gradients to flow more easily through the network. These connections also enabled the network to operate at its optimal level, resulting in improved efficiency and accuracy. The term “Vote” refers to the majority voting strategy used to enhance the accuracy of the proposed model at the subject level by combining the predictions from subjects’ segments.

The convolutional layers serve as the feature-extraction stage and primarily comprise four Skip-Blocks and a concatenation layer. The first three Skip-Blocks sequentially extract features, and the output features of each were concatenated after passing through a max-pooling layer to reduce the size of the feature maps. The skip connections, as illustrated in Fig. 3, were employed to increase the convergence rate and enhance the network’s overall performance39,40.

Within the convolutional network, the fully connected layer plays a crucial role in determining the final classification outcome for each segment. This layer takes the output features of the last pooling layer to make a prediction. Since there were K dFC matrices for each subject, the predicted classification labels were used to classify the subject into one class through the majority voting strategy. We utilized the hyperbolic tangent function, \(\sigma(x)=\tanh(x)\), as the activation function in each layer to incorporate both positive and negative correlation values in the classification process.
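The rationale for tanh can be illustrated directly: it is odd-symmetric, so anti-correlated (negative) connectivity values keep their sign after activation, whereas a sign-discarding activation such as ReLU would zero them out. A minimal check:

```python
import numpy as np

# tanh preserves the sign of its input, so negative correlations in a
# dFC matrix remain informative after activation.
x = np.array([-0.8, 0.0, 0.8])
y = np.tanh(x)
```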

Fig. 2

Skip-Vote-Net architecture. P represents the number of parcels (i.e., 116, resulting in \(\lfloor P/3\rfloor = 38\)), and \(x\) was set to 4.

Skip-Block

Figure 3 illustrates the architecture of the Skip-Block, comprised of a detail path, an approximation path, and an element-wise sum component. The detail path augments the number of feature maps while preserving a combination that accentuates the key feature elements of the original input. Meanwhile, the approximation path enables the network to extract primary features, incorporating neighboring information. The incorporation of element-wise summation into these paths enhances the network’s ability to capture discriminative features.

Fig. 3

Skip-Block Architecture. F and P represent the number of filters and the number of parcels (herein 116), respectively.

Max pooling, skip connection and concatenation

MaxPooling and skip connections (Fig. 4) can have a significant impact on the accuracy of the model. In the proposed Skip-Vote-Net, MaxPooling was used to down-sample feature maps and reduce the spatial dimensions of the input features, thereby decreasing the number of parameters and the computational load40,41. The outputs of Skip-Blocks 1 to 3 were downsampled using MaxPooling with a kernel size of \((3\times 3)\), and the output feature maps of Skip-Block 4 were downsampled with a kernel size of \((4\times 4)\).
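Max pooling itself can be written in a few lines of NumPy; this is a generic sketch of the down-sampling operation (not the framework op used by the authors), illustrating how a \((3\times 3)\) window with stride 3 turns a 116-parcel map into a \(38\times 38\) one:

```python
import numpy as np

def max_pool2d(fmap, k):
    """Max-pool a 2-D feature map with a (k, k) window and stride k.

    Trailing rows/columns that do not fill a full window are dropped,
    which is why a 116x116 input pooled with k=3 gives floor(116/3)=38.
    """
    H, W = fmap.shape
    H2, W2 = H // k, W // k
    return fmap[:H2 * k, :W2 * k].reshape(H2, k, W2, k).max(axis=(1, 3))
```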

Skip connections provide alternative pathways for gradients to flow through the network during backpropagation, helping mitigate the vanishing gradient problem. This problem arises when gradients become increasingly small as they propagate through many layers, making it difficult for the network to effectively learn discriminative features42. Furthermore, skip connections enable the training of deeper networks. In the proposed architecture, both max pooling and skip connections were utilized in the first three Skip-Blocks.

Fig. 4

Max pooling, skip connection, and concatenation layers. F and P represent the number of filters and the number of parcels (herein 116, resulting in \(\lfloor P/3\rfloor = 38\)), respectively.

Majority voting

The majority voting strategy was employed to classify each subject into either ADHD or TDC in the two-class mode, based on the majority of predicted labels assigned by the fully connected layer to the subject’s segments. In the three-class mode, each subject was initially assigned to ADHD or TDC based on majority voting; if the assigned class was ADHD, the subtype (ADHDI or ADHDC) was then determined using the same majority voting approach.
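The two-stage voting rule can be sketched as follows, assuming per-segment predictions over the three labels (the exact label encoding and tie-breaking are our assumptions, not the paper's):

```python
from collections import Counter

def vote(labels):
    # Most frequent label among a subject's segment predictions.
    return Counter(labels).most_common(1)[0][0]

def classify_subject(segment_preds):
    """Two-stage majority voting for the three-class mode.

    `segment_preds` holds one label per dFC segment, each in
    {'TDC', 'ADHDI', 'ADHDC'}. Stage 1 votes ADHD vs. TDC; if the
    result is ADHD, stage 2 votes the subtype among the ADHD-labelled
    segments.
    """
    stage1 = vote(['TDC' if p == 'TDC' else 'ADHD' for p in segment_preds])
    if stage1 == 'TDC':
        return 'TDC'
    subtype_votes = [p for p in segment_preds if p != 'TDC']
    return vote(subtype_votes)
```

With K = 7 segments per subject, a subject is assigned ADHD as soon as at least four segments are predicted as either ADHD subtype.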

Model validation

The proposed Skip-Vote-Net architecture was evaluated on four classification modes: (i) ADHD/TDC with unbalanced folds, (ii) ADHD/TDC with balanced folds, (iii) ADHDI/ADHDC, and (iv) ADHDI/ADHDC/TDC. For this purpose, the dataset was randomly partitioned into five subsets, both balanced and unbalanced, to conduct five-fold cross-validation (refer to Table 1). In the balanced folds, subjects were randomly selected to ensure an equal number of subjects for ADHD/TDC groups. This methodology enabled us to train our model on four subsets and evaluate its performance on the remaining subset. The average performance was then assessed for each mode (see Supplementary Materials for further details on the subjects selected for each fold and classification mode).
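A balanced fold construction along these lines might look like the following sketch; the undersampling of the larger class and the round-robin assignment are our assumptions about implementation details not specified above:

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_folds(labels, n_folds=5):
    """Build n_folds subject-index lists with equal class counts.

    The larger class is randomly subsampled to the size of the smaller
    one, then each class's subjects are dealt round-robin into folds,
    so every fold is (near-)equally sized and class-balanced.
    """
    classes = np.unique(labels)
    per_class = min(int((labels == c).sum()) for c in classes)
    folds = [[] for _ in range(n_folds)]
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))[:per_class]
        for j, i in enumerate(idx):
            folds[j % n_folds].append(int(i))
    return folds
```

The unbalanced variant simply skips the subsampling step and stratifies all subjects into folds.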

As described in Sect. 2.2, for data augmentation, both scans were included for individuals with two scans. In the “Augmented Data Mode”, there were instances where scans from the same individuals were distributed between the training and test sets. This could potentially introduce data leakage during performance evaluation due to the dependency between scans of the same subjects. To assess the impact of this leakage on the performance of the SVM and Skip-Vote-Net models, we conducted additional evaluations. Specifically, we removed the second scans from the test set in each fold, referred to as the “Baseline Data Mode”, to mitigate the potential leakage effect.
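The "Baseline Data Mode" filtering can be sketched as keeping only one scan per subject in each test fold; the scan-to-subject mapping used here is a hypothetical structure of ours, not the study's data format:

```python
def baseline_test_set(test_scans, scan_subject):
    """Drop repeat scans from a test fold to avoid subject-level leakage.

    `test_scans` is an ordered list of scan ids; `scan_subject` maps
    scan id -> subject id. Only the first scan encountered for each
    subject is kept.
    """
    seen, kept = set(), []
    for s in test_scans:
        subj = scan_subject[s]
        if subj not in seen:
            seen.add(subj)
            kept.append(s)
    return kept
```

A stricter alternative would be a group-aware split that keeps all scans of a subject on the same side of the train/test boundary.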

Table 1 Number of ADHD and TDC subjects in each fold for each classification mode (ADHDI: I, ADHDC: C).

Network implementation

TensorFlow was used to implement the Skip-Vote-Net model, employing cross-entropy as the loss function for the two-class classification mode as:

$$L_{2\text{-}class}=-\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log\left(y'_{i}\right)+\left(1-y_{i}\right)\log\left(1-y'_{i}\right)\right]$$
(1)

where N is the number of training subjects, \(y_i\) denotes the true label of the \(i\)-th training subject (0 for ADHD and 1 for TDC), and \(y'_i\) represents the prediction made by Skip-Vote-Net. The cross-entropy loss evaluates class predictions by comparing the probability distributions of the actual and predicted labels: when the distributions are similar, the cross-entropy is small; when they differ significantly, it is large. The three-class classification was performed using a categorical cross-entropy loss:

$$L_{3\text{-}class}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}y_{ij}\log\left(p_{ij}\right)$$
(2)

where M was set to 3, \(y_{ij}\) is the one-hot indicator that the \(i\)-th training subject belongs to the \(j\)-th class, and \(p_{ij}\) is the predicted probability of the \(i\)-th subject belonging to the \(j\)-th class. This allowed us to effectively handle the multi-class classification task and optimize the model’s performance across all classes. Additionally, we utilized the Adam optimizer, known for its efficiency in optimizing DL models, to further enhance the training process and improve convergence.
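Both loss functions are straightforward to express in NumPy. This sketch mirrors Eqs. (1) and (2); the probability clipping is our addition for numerical stability and is not part of the stated formulas:

```python
import numpy as np

def binary_ce(y, p, eps=1e-12):
    """Eq. (1): mean binary cross-entropy over N subjects.
    y holds true labels in {0, 1}; p holds predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_ce(Y, P, eps=1e-12):
    """Eq. (2): mean categorical cross-entropy.
    Y is one-hot (N, M); P holds predicted class probabilities (N, M)."""
    P = np.clip(P, eps, None)
    return -np.mean(np.sum(Y * np.log(P), axis=1))
```

A perfectly confident correct prediction drives either loss toward zero; a maximally uncertain binary prediction (p = 0.5) gives exactly ln 2.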

For each classification mode, the batch size, learning rate, and maximum number of epochs were set to 8, \(10^{-5}\), and 100, respectively. Additionally, we employed the following strategies to mitigate overfitting: (i) early stopping with a patience of 10, monitored on validation accuracy; (ii) a dropout rate of 0.3; and (iii) L2 regularization with a penalty of \(10^{-3}\).
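The early-stopping rule can be illustrated with a minimal monitor mirroring the patience-10 setting; this is a sketch of the mechanism, not the exact TensorFlow callback used in training:

```python
class EarlyStopper:
    """Stop training once validation accuracy has not improved for
    `patience` consecutive epochs (here defaulting to the paper's 10)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = -float('inf')
        self.wait = 0

    def update(self, val_acc):
        # Call once per epoch; returns True when training should stop.
        if val_acc > self.best:
            self.best = val_acc
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```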

Static FC-based classification

To evaluate the efficacy of the dynamic approach, we performed a comparative analysis against standard SVM-based classification applied to sFC. We employed an SVM classifier with a radial basis function (RBF) kernel, owing to its superior performance compared to other kernel variants. Each subject’s data were represented as a single vector comprising the upper-triangle elements of the sFC matrix. The SVM classifier was then evaluated using the same cross-validation folds established in Sect. 2.5. Notably, the Skip-Vote-Net was not viable for classification with sFC matrices: given its intricate architecture and the limited number of input samples, the network failed to converge.
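The sFC vectorization might be implemented as follows; function names are ours and the classifier hyperparameters shown are scikit-learn defaults, not necessarily the paper's settings. For P = 116 regions this yields 116·115/2 = 6,670 features per subject.

```python
import numpy as np
from sklearn.svm import SVC

def sfc_features(sfc_mats):
    """Vectorize a stack of (P, P) sFC matrices into per-subject feature
    vectors of the upper-triangle entries (diagonal excluded), exploiting
    the symmetry of the correlation matrix."""
    P = sfc_mats.shape[1]
    iu = np.triu_indices(P, k=1)
    return np.stack([m[iu] for m in sfc_mats])

# RBF-kernel SVM baseline for the sFC comparison.
clf = SVC(kernel="rbf")
```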

Parameter selection

To optimize classification accuracy, we determined the optimal Skip-Vote-Net configuration by varying the number of filters across layers within predefined ranges for the unbalanced two-class (ADHD/TDC) classification. The numbers of filters for the first through fourth Skip-Blocks were selected from the following sets: [32, 64, 128, 256], [64, 128, 256, 512], or [128, 256, 512, 1024]. We employed a five-fold cross-validation strategy on segments, utilizing all dFC segments from subjects selected for model validation (Sect. 2.5) for both training and testing. The optimal network configuration identified through this process was then applied to subject-level classification evaluations, using the majority voting strategy.
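The configuration search reduces to evaluating three candidate filter sets; in this sketch `score_fn` is a placeholder of ours standing in for the segment-level five-fold cross-validated accuracy of a model built with a given configuration:

```python
# Candidate filter counts for the first through fourth Skip-Blocks,
# as enumerated above.
FILTER_SETS = [(32, 64, 128, 256), (64, 128, 256, 512), (128, 256, 512, 1024)]

def pick_config(score_fn):
    """Return the filter configuration with the highest score.
    `score_fn` maps a filter tuple to a cross-validated accuracy."""
    return max(FILTER_SETS, key=score_fn)
```

In the study, this procedure selected the (64, 128, 256, 512) configuration (see the Results section).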

Evaluation metrics

The classification performance of the proposed method was evaluated using accuracy (ACC), defined as:

$$ACC=\frac{TP+TN}{TP+TN+FP+FN}$$
(3)

where a true positive (TP) is an ADHD subject correctly classified as ADHD, a false positive (FP) is a TDC subject incorrectly classified as ADHD, a true negative (TN) is a TDC subject correctly classified as TDC, and a false negative (FN) is an ADHD subject incorrectly classified as TDC.

We further compared the performance of the proposed Skip-Vote-Net model with that of the SVM with an RBF kernel at the segment level using dFC segments and at the subject level, employing the majority voting strategy described in Sect. 2.4.3.

Results

Static FC-based classification performance

Table 2 presents the classification accuracy based on sFC using SVM for both the augmented and baseline data modes. As shown, in the augmented data mode the mean accuracy for the ADHD/TDC classification increased from 63% in the balanced mode to 68% in the unbalanced mode, indicating that the balancing strategy did not enhance performance. Moreover, the classification accuracy for distinguishing the ADHD subtypes was 55.15%, close to chance level. Furthermore, in the three-class classification mode, the overall mean accuracy was 43.4%, underscoring the challenge of effectively distinguishing between the ADHD subtypes and TDC.

In the baseline data mode, within the two-class classification scenarios, the mean accuracy in the balanced mode decreased by 4%. The observed reduction indicates that the performance in the augmented data mode was likely inflated due to leakage (ranging from 8.1 to 18.4%) between the training and test sets. In the remaining scenarios, while the decrease in mean accuracy was less pronounced, there was a significant increase in the cross-fold standard deviation in accuracy. In the three-class scenario, the mean accuracy for the sFC-based classification using SVM dropped by 5%, highlighting the performance bias potentially introduced by leakage arising from the dependency between scans of the same individuals.

Table 2 Mean classification accuracy (%) for sFC-based classification using SVM.

The confusion matrices illustrating the classification results are presented in Table 3 for the augmented data mode. Notably, in the initial two-class classification scenarios, the classifiers exhibited a tendency to misclassify more TDC as having ADHD, indicating a directional bias. In the classification between ADHDI and ADHDC, a notable pattern emerged wherein a higher frequency of misclassifications occurred from ADHDC to ADHDI. This observation suggests a potential underlying connection between common features associated with inattention present in both classes.

Table 3 Confusion matrix for sFC-based classification using SVM for the augmented data mode.

Network configuration parameters

Table 4 illustrates the impact of varying the number of dFC segments on classification accuracy in the unbalanced two-class mode for the augmented data mode. The performance evaluation was conducted using the five-fold cross-validation strategy at the segment level, wherein all dFC segments from both the training and test subjects selected for the unbalanced mode (Sect. 2.5) were included. At this step, the number of filters was configured as 64, 128, 256, and 512 for the first to fourth Skip-Blocks, respectively. As illustrated in Table 4, the highest accuracy was achieved with 7 dFC segments; with 5 or 9 segments, the segment-level classification accuracy for ADHD vs. TDC decreased by approximately 4%.

Table 4 Effect of the number of dFC segments on dFC-based classification accuracy for the unbalanced two-class ADHD/TDC mode at the segment level for the augmented data mode.

Table 5 illustrates the impact of varying the number of filters in each Skip-Block on classification accuracy in the unbalanced two-class mode. With the optimal number of segments set to 7 (window length of 44 TRs), the Skip-Vote-Net model achieved its highest mean classification accuracy of 85.87% when utilizing 64, 128, 256, and 512 filters in the first to fourth Skip-Blocks, respectively. Alternative configurations yielded suboptimal results. Notably, excluding the detail paths and skip connections resulted in mean accuracy reductions of 3% and 8%, respectively, emphasizing the significance of these components in the classification process.

Table 5 Effect of the Skip-Vote-Net filter configuration on dFC-based classification accuracy for the unbalanced two-class mode at the segment level for the augmented data mode.

Segment-level classification performance

Tables 6 and 7 present the mean accuracy and confusion matrices for the Skip-Vote-Net and SVM on dFC segments (7 segments per subject). In the augmented data mode, each segment was assigned a class label by both classifiers at the segment level. The results indicate that SVM achieved a maximum mean accuracy of 68.46% in the unbalanced two-class (ADHD-TDC) mode. In comparison, the Skip-Vote-Net significantly enhanced classification accuracy by 30% in the two-class mode. For the three-class mode, the Skip-Vote-Net achieved a classification accuracy of 85.55%, representing a 39% improvement over SVM. In this data mode, the maximum misclassification rates were observed for Skip-Vote-Net when distinguishing between ADHD and TDC, as well as between ADHDI and ADHDC. Overall, SVM exhibited significantly higher misclassification rates between the ADHD subtypes and TDC compared to Skip-Vote-Net at the segment level.

In the baseline data mode, the accuracy of the SVM classifier decreased across all classification modes at the segment level, with a maximum mean accuracy reduction of 4.8% observed in the three-class classification mode, where there were also greater variations in accuracy across folds. This decline was attributed to leakage between the training and test sets. In contrast, the Skip-Vote-Net demonstrated a maximum reduction in cross-fold mean accuracy of less than 1.6%, in the two-class (ADHDI/ADHDC) mode, indicating its greater robustness to leakage.

Table 6 Mean accuracy (%) for dFC-based classification using the Skip-Vote-Net model and SVM at the segment level.
Table 7 Confusion matrix for dFC-based classification (K = 7) using the Skip-Vote-Net model and SVM at the segment level for the augmented data mode.

Subject-level classification performance

Tables 8 and 9 present the mean accuracy values and confusion matrices for Skip-Vote-Net and SVM at the subject level. These results were obtained by setting the optimal number of dFC segments to 7 for the majority voting stage and configuring the number of filters in the first to fourth Skip-Blocks to 64, 128, 256, and 512, respectively, based on the best-performing setup.

As demonstrated for the augmented data mode, the performance of SVM improved by up to 10% in the two-class and three-class modes using the dFC approach compared to the sFC-based classification (Table 2). However, the mean classification accuracy for the three-class mode did not exceed 53.21% at the subject level. In contrast, Skip-Vote-Net outperformed SVM using both sFC and dFC analyses. The proposed model achieved an average accuracy of up to 99.39% in distinguishing between ADHD subtypes, with an overall accuracy of 98.86% in the three-class mode. Additionally, no significant differences were observed between balanced and unbalanced classification modes using the proposed model.

In the baseline data mode, similar to the segment-level results, the accuracy of the SVM decreased significantly by nearly 10% in the three-class mode for dFC-based classification. This reduction was accompanied by greater variability in accuracy across folds, attributed to leakage at the subject level. In contrast, Skip-Vote-Net exhibited only a minimal reduction in cross-fold mean accuracy (up to 1.7%) coupled with higher robustness (indicated by lower variation in performance across folds) across all classification modes.

As illustrated in Table 9, the misclassification rates for SVM were substantially higher between ADHD and TDC, as well as between ADHDI and ADHDC. For the Skip-Vote-Net model, the maximum misclassification rate from the ADHD group to TDC was limited to 4% in the balanced mode, whereas SVM exhibited a misclassification rate of approximately 33%. This indicates that Skip-Vote-Net reduced the misclassification rate by 29% compared to SVM.

Table 8 Mean accuracy (%) for dFC-based classification using the Skip-Vote-Net model and SVM at the subject level.
Table 9 Confusion matrix for dFC-based classification using the Skip-Vote-Net model and SVM at the subject level for the augmented data mode.

Discussion

Alterations in brain functional connectivity patterns among children with ADHD, as compared to typically developing children, have been a subject of investigation in brain connectome research4. Prior studies have found evidence suggesting that deviations in resting-state brain FC observed in children with ADHD can be utilized to effectively differentiate them from their typically developing peers43. In this study, we introduced Skip-Vote-Net, a DL-based architecture used for the classification of ADHD/TDC children using dFC patterns derived from rs-fMRI data. Our results showed that the Skip-Blocks enhanced the network ability to extract discriminative features using two distinct paths, which were subsequently fused to accentuate prominent discriminative characteristics, thereby facilitating accurate feature learning within the network. Moreover, the inclusion of the skip connections empowered the network with varying resolutions of information, ensuring the preservation of critical information and gradients that might otherwise diminish with increasing the network depth. In the final layer of the Skip-Vote-Net, a voting mechanism was employed to help the network in classifying each subject based on the majority class of segments. This approach served to enhance the robustness of the proposed method, particularly in scenarios involving misclassification of segments.

Several studies (Table 10) have documented an average accuracy of 73% through various methodologies in the classification of ADHD/TDC utilizing sFC analysis on the NYU Dataset8,18,43,44,45,46. The prevalent method for evaluating classification performance has been the hold-out strategy (216 in the training set and 41 in the test set) employing the AAL atlas with 90 regions. In our research, employing SVM, we achieved a mean accuracy of 68% (± 4%) via five-fold cross-validation over 216 subjects utilizing the AAL116 atlas. Although DL-based methodologies have demonstrated superior accuracy compared to SVM when utilizing sFC, they have typically been evaluated over a single fold. Conversely, in our SVM-based approach, the accuracy reached 72%, highlighting the performance dependence on the dataset.

Several studies (Table 10) have demonstrated incremental advancements in classification accuracy, generally falling within the range of 73% to 76%, employing dFC analysis and DL methodologies43,44,45. In our study, we significantly advanced upon these results by employing connectivity strengths from FC matrices divided into seven segments per subject. As demonstrated, the SVM performance improved by up to 10% in the two-class and three-class modes using the dFC approach compared to the sFC-based classification. However, the mean classification accuracy achieved by SVM for the three-class mode did not exceed 53.21% at the subject level. By integrating Skip-Vote-Net and employing majority voting with a five-fold cross-validation strategy, we achieved a mean accuracy of 97.73% at the subject level for the unbalanced two-class (ADHD/TDC) case. However, when altering the number of segments per subject, specifically utilizing five or nine segments based on common window lengths used in dFC analysis (typically ranging from 30 to 240 s)47,48, our accuracy decreased by 4%. Furthermore, our findings indicated no substantial performance improvement when balancing the number of samples across classes, aligning with previous studies that utilized ensemble sampling techniques for data balancing50. The Skip-Vote-Net, grounded in dFC analysis, notably decreased misclassification rates compared to sFC-based SVM, primarily due to majority voting on dynamic segments. Additionally, we observed that increasing the network size, as indicated by the number of filters per Skip-Block, enhanced the classification accuracy by up to 5%. However, further increasing network complexity led to a 2% decline in performance. The ablation experiments highlighted the importance of the detail paths and skip connections, which contributed 3.6% and 8.1% improvements, respectively, in dFC-segment classification accuracy.
Importantly, our approach focused not only on accurately distinguishing between ADHD and TDC children but also on effectively classifying ADHD subtypes, achieving a mean accuracy of 98.86%. This marks a significant advancement in the field and underscores the potential of our method for precise clinical characterization of ADHD.

This study has several limitations that should be addressed in future research. First, the evaluation of the proposed method was restricted to subjects from the NYU dataset. The performance of the SVM and Skip-Vote-Net models was assessed using a data augmentation strategy in which some participants contributed two rs-fMRI scans, one assigned to training and one to testing; this introduced a risk of data leakage between the training and test sets. Our results indicated a significant decrease in accuracy for the SVM model, whereas the Skip-Vote-Net model exhibited greater resilience to this leakage. However, excluding the second scans from the test sets to mitigate the leakage increased cross-fold variability in accuracy because of the reduced number of scans in the test sets. To evaluate Skip-Vote-Net more comprehensively on unseen data, it is essential to extend the analysis to datasets from multiple centers within the ADHD-200 database. This broader approach would also allow investigating how different acquisition settings and parameters affect classification accuracy, thereby providing a more robust assessment of the model's generalizability and effectiveness.
Furthermore, applying Skip-Vote-Net to classification with sFC matrices proved challenging: its complex architecture led to convergence issues, likely attributable to the limited sample sizes, so additional optimization is required to refine the model's configuration for sFC features. Finally, we used a low-resolution atlas comprising only 116 regions, which may affect classification accuracy. Recent studies4,49 have underscored the significance of functional parcellation at different scales and its potential influence on classification accuracy, and high-resolution functional parcellation has been shown to hold promise for capturing changes in connection strengths, particularly among smaller brain structures46.
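The leakage issue noted above, where a subject's two scans can land on opposite sides of a split, is commonly avoided by grouping the split by subject ID so that all scans from one subject stay together. A minimal sketch with a hypothetical scan table (the IDs, features, and labels below are placeholders, not the NYU data):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy scan table: some subjects contribute two rs-fMRI scans. Grouping
# by subject ID keeps both scans of a subject on the same side of each
# split, preventing the train/test leakage discussed above.
subjects = np.array([0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 9, 10])
X = np.arange(len(subjects)).reshape(-1, 1)   # placeholder features
y = subjects % 2                              # placeholder labels

leak_free = True
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subjects):
    overlap = set(subjects[train_idx]) & set(subjects[test_idx])
    leak_free &= (len(overlap) == 0)
print("subject overlap across folds:", not leak_free)  # False
```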

Table 10 Comparative accuracy analysis for two-class classification: ADHD vs. TDC using NYU dataset from ADHD-200 database.

Conclusion

In conclusion, our study introduced Skip-Vote-Net, a deep-learning architecture tailored to the classification of ADHD and TDC children using dFC patterns derived from rs-fMRI data. Our findings demonstrated the efficacy of the Skip-Blocks in enhancing the network's capability to extract discriminative features through two distinct paths whose outputs are fused to emphasize salient characteristics. Furthermore, the skip connections provided the network with multi-resolution information, preserving gradients and features that would otherwise diminish with increasing network depth. In the final layer of Skip-Vote-Net, a voting mechanism over dFC segments was deployed for subject-level classification. Further investigation into the performance and generalization of Skip-Vote-Net across diverse datasets and clinical populations is warranted to fully assess its potential as a robust diagnostic tool in neuroimaging research.