Introduction

Mental health disorders are highly prevalent and associated with increased disability and mortality worldwide1. Mood disorders encompass two main groups: major depressive disorder (MDD, also referred to as unipolar depression) and bipolar disorder (BD), which place a substantial burden on healthcare systems2. MDD is a chronic mental illness characterized by recurrent episodes of depressed mood and anhedonia3. In contrast, BD alternates between periods of depression, abnormal euphoria, and irritability. MDD and BD have been recognized as major contributors to disability across various cohorts2,4. Worldwide, the lifetime prevalence of MDD is 11.1–14.6%, while that of BD reaches 3.4%1,5.

Unfortunately, the clinical profiles of depressive episodes in patients with MDD or BD are indistinguishable, often leading to misdiagnosis and delayed treatment. Up to 69% of BD patients may initially be misdiagnosed with MDD, and more than 30% may remain misdiagnosed for up to 10 years6. This results in significant disability and high economic costs6,7 due to incorrect selection of pharmacological treatment (i.e., antidepressant monotherapy or lack of mood stabilizers), which may increase treatment-emergent manic episodes and lead to a poor illness course, poor treatment response, low study/work productivity, low quality of life, and high societal costs8,9,10,11. Currently, the diagnosis of BD and MDD is based on clinical criteria and symptoms reported by patients, with laboratory tests or brain imaging used only to rule out other neurological disorders. Therefore, great efforts have been made to incorporate biological measurements that improve the differential diagnosis of BD and MDD.

Biomarkers (biological parameters that are associated with disease states or traits) arise as potential aids for the differential diagnosis of depression. In searching for such biomarkers, small extracellular vesicles (sEVs) have emerged as ideal sources of brain disease markers, because brain cell-released sEVs have been shown to reach the bloodstream and may provide a significant opportunity to study psychiatric disorders more precisely. Thus, sEVs derived from plasma could reveal neuroinflammation associated with psychiatric disorders12. sEVs are nanometric lipid structures (50–200 nm) released into the extracellular space by nearly all cell types. They carry a complex molecular cargo consisting of various soluble and transmembrane proteins, genetic material, and lipids13,14. Evidence suggests that plasma sEVs play a crucial role in the brain and systemic inflammatory response15. Microglial cells are critical decoders of neuroinflammation, which is closely associated with clear changes in their secretory and morphological responses16,17. Therefore, we reasoned that sophisticated computer vision strategies could provide a robust approach for discriminating the microglial cellular response to patient-derived sEVs.

Deep Learning (DL) models, such as Convolutional Neural Networks (CNNs), have been employed effectively to solve various computer vision problems. These CNN models can be trained using high-dimensional data to identify distinguishing features among various disorders18,19,20,21. CNN models for complex pattern recognition have been developed for image classification22,23, image segmentation24,25, face recognition26,27, and iris recognition28,29, among other applications. These models successfully extract and learn complex features from visual data, allowing them to achieve high accuracy (ACC) in previously challenging image analysis tasks. By leveraging the ability of CNNs to process high-dimensional data, they have also been applied in biomedical fields to analyze medical images in classification30,31,32, segmentation33,34,35, and regression tasks36,37,38, supporting expert diagnoses3,39,40,41,42,43. Once trained, CNNs could assist in reaching early and accurate diagnoses, reducing the dependence on symptom-based assessments and minimizing the risk of misdiagnosis44,45,46. Despite these advances, most current applications in biomedical imaging focus on isolated features or individual units of analysis, which may limit the capture of biologically relevant heterogeneity. This is particularly true in cellular systems, where pathological states often manifest not as discrete changes in single cells, but as emergent patterns across populations. In such cases, approaches that integrate spatial or contextual information across multiple cells can provide a more comprehensive and robust diagnostic signal. Addressing this gap could significantly enhance the accuracy and generalization capacity of image-based classification in complex biological scenarios, especially when subtle morphological shifts are involved47,48.

The diagnosis of mood disorders, particularly MDD and BD, remains a significant challenge due to overlapping symptomatology and reliance on subjective clinical evaluation. Recent advances in biomedical informatics have leveraged artificial intelligence to develop objective, data-driven diagnostic tools capable of capturing neurophysiological, behavioral, and linguistic signatures of these disorders. Neuroimaging-based studies continue to report important generalization challenges, as demonstrated by Belov et al.49, who conducted the largest multi-site brain-imaging analysis of MDD and found that traditional machine-learning models achieved only moderate balanced accuracy (~ 62%) before dropping to near-chance performance once site effects were controlled. Yang et al.50 used resting-state fMRI to compare brain functional efficiency across schizophrenia, bipolar disorder, and major depressive disorder. They found shared sensorimotor disruptions between schizophrenia and MDD, linked to genes involved in glutamatergic and calcium/cAMP signaling, suggesting shared neurobiological mechanisms among these disorders.

Electrophysiological studies using EEG have consistently demonstrated stronger performance. Zhao et al.51 introduced the SE-1DCNN-LSTM framework for distinguishing MDD from BD, reporting accuracies of 81.10% at the epoch level and 83.16% at the subject level. Hata et al.52 evaluated a transformer-based deep-learning model trained on frequency-domain features extracted from portable EEG recordings, achieving a balanced accuracy of 80.8% and an AUC of 0.872 in differentiating healthy volunteers from patients with dementia-related conditions. Subgroup analyses across diagnostic categories and severity levels yielded AUCs ranging from 0.812 to 0.898, with balanced accuracies of up to 86.4%. Anik et al.53 proposed an 11-layer Ex-1DCNN that identified Gamma-band activity in 15-second epochs as a highly discriminative biomarker, achieving 99.6% accuracy for depression detection and highlighting the potential of short-window, non-invasive electrophysiological screening. Liu et al.54 explored frontal resting-state EEG for distinguishing generalized anxiety disorder (GAD) from healthy controls, introducing a "Differential Channel" method and connectivity-based features to enhance anxiety-related signal discrimination. Using these representations, their Deep Forest classifier achieved an accuracy of up to 98.08% with short time windows, supporting the feasibility of frontal-channel EEG and functional connectivity metrics for reliable GAD identification.

Beyond physiological signals, voice and text-based machine learning models have shown equally strong performance. Huang et al.55 applied a pre-trained wav2vec 2.0 model to acoustic recordings, achieving approximately 96% accuracy in binary depression classification, while Xu et al.56 demonstrated that fusing text and voice embeddings (via BERT and Wav2Vec) within a CNN-BiLSTM architecture substantially outperformed unimodal voice or text models.

Parallel advances have emerged in social-media text analysis, where Ding et al.57 compared classical machine-learning methods (logistic regression, random forests, LightGBM) with deep learning approaches (ALBERT, GRU), reporting comparable performance. Their results suggest that classical models maintain interpretability advantages, while deep architectures capture more complex linguistic patterns. Additionally, Li et al.58 provided a comprehensive systematic review of 65 studies that combined audio and text data for automated depression detection, highlighting the growing relevance of natural language and paralinguistic features in mental health diagnostics.

Additional progress has been made using wearable sensor and behavioral data. Ricka et al.59 identified a stable physiological signature of depression using cardiac and electrodermal activity, enabling daily mood prediction with ~ 86% accuracy. Saad et al.60 transformed smartwatch motor-activity time series into Markov transition-field images for analysis via an attention-based CNN, achieving ~ 95% accuracy in the depression class. Similarly, Wu et al.61 demonstrated that digital biomarkers (heart rate, sleep, and activity) can predict bipolar mood states, reaching 83% accuracy for depressive symptoms and 91% for manic symptoms, supporting the feasibility of continuous mood-state monitoring. Psychometric information represents another modality with strong predictive value. Using DASS-42 scores together with demographic features, ShamsEldin et al.62 reported SVM-based classification accuracies above 98% across depression, anxiety, and stress categories.

Our contributions

In this study, we propose a proof-of-concept for a new diagnostic technology. First, we introduce a non-invasive strategy that uses sEVs to modulate microglial morphology for the classification of MDD, BD, and CTRL subjects. Second, we develop a DL-based image analysis pipeline that achieves high diagnostic accuracy from microglial cell morphology. Third, we propose a structured array-based image organization that enables spatially enriched image classification, and facilitates data augmentation through cell image permutation, flipping, and rotation. Finally, we address challenges posed by variable data quality and sample imbalance by generating multiple augmented image arrays per subject. This framework offers a powerful and scalable approach that integrates biological signal amplification with context-aware DL, paving the way for precision diagnostics in psychiatry, and beyond.

Materials and methods

Patients

Participants with bipolar disorder (BD) and major depressive disorder (MDD), as well as healthy control participants (CTRL), were recruited at Clínica Universidad de los Andes by a psychiatrist with expertise in mood disorders. The study was approved by the Comité Ético-Científico of Universidad de los Andes (approval #CEC201975). All procedures and methods were performed in accordance with relevant institutional and national guidelines and regulations, and in compliance with the Declaration of Helsinki. Written informed consent was obtained from all participants. Each group included 15 participants (BD, MDD, and CTRL; total n = 45). Inclusion and exclusion criteria are detailed in Supplementary Materials S1, and participant demographic and clinical characteristics are provided in Table S1.

Animals

Sprague-Dawley rats were acquired from the Animal Facility of Pontificia Universidad Católica de Chile, Santiago, Chile. All animal procedures and methods were conducted in accordance with the ARRIVE guidelines and with institutional regulations for the care and use of laboratory animals, and were approved by the Universidad de los Andes Bioethical Committee (approval #CEC202039).

Microglial cell culture

Postnatal day 1–2 (P1–P2) rat pups were euthanized without anesthesia by rapid decapitation using sharp scissors for immediate brain tissue collection, in accordance with institutional guidance for neonatal euthanasia. Euthanasia was performed by trained personnel in accordance with institutional regulations and international guidelines. Mixed glial cells were isolated from the telencephalic portion of 1–2-day-old Sprague–Dawley rat brains as previously described63 and seeded in 100 mm treated plates (1 whole brain per plate). After 14 days, confluent mixed glial cultures were gently swirled for 60 s in a clockwise and then an anticlockwise manner, as previously described64, to obtain a pure microglial suspension. Next, 10,000 to 20,000 cells were seeded in 96-well microscopy plates (Falcon, 353219) previously treated with 0.1 mg/ml poly-L-lysine (Sigma). After 48 h, each well was treated for 24 h with 12 µg of protein from the corresponding patient-derived plasma sEVs.

Extracellular vesicle isolation

To obtain patient plasma, 15–20 mL of blood were collected, centrifuged, and subjected to a Ficoll® gradient by mixing 4 mL of blood with 4 mL of Ficoll®. The mixture was then centrifuged at 400 RCF for 45 min at 4 °C without applying a brake; the plasma remained in the upper phase. Next, the plasma was centrifuged at 2000 RCF for 30 min at 4 °C, using 500 µL per tube. The resulting supernatant was centrifuged again at 10,000 RCF for 40 min at 4 °C. Then, 480 µL of the supernatant were collected and incubated with 240 µL of a commercial kit (Total Exosome Isolation Kit, Invitrogen, #4478359) for 16 h under rotary agitation at 4 °C. Finally, the sample was centrifuged at 10,000 RCF for 1 h at 4 °C, and the pellet, corresponding to sEVs, was resuspended in 500 µL of sterile PBS. This sEV fraction displayed the particle size distribution (in nm) and molecular markers (such as flotillin and CD63) expected for extracellular vesicles. The size distribution was determined by nanoparticle tracking analysis (NTA), while molecular markers were detected by Western blot as described65 and shown in Supplementary Materials, Fig. S1.

Immunofluorescence

Treated cells were fixed with a solution of 4% paraformaldehyde (PFA) plus 4% sucrose for 20 min at room temperature, then washed twice with PBS containing 0.5% BSA. Blocking was performed for 30 min at room temperature with 5% BSA, followed by two washes with PBS containing 0.5% BSA; finally, the cells were permeabilized with 0.3% Triton X-100. Cells were stained with the nuclear marker DAPI (D1306, Invitrogen), the microglia-specific cytoplasmic marker Iba-1 (019-19741, Fujifilm), and β-actin (A5441, Sigma-Aldrich).

Equipment and settings

Nanoparticle tracking analysis (NTA) was performed using a NanoSight NS300 instrument (Malvern Panalytical) with the manufacturer's NTA software, version 3.2. Samples were diluted 1:100 in DPBS immediately before analysis. The camera level was set to 8 and the detection threshold to 3 for all samples. Western blotting was performed using X-ray films, which were later scanned; the images were processed using Adobe Photoshop. Automated microscopy images encoded in 24-bit RGB color space were acquired using a Cellomics ArrayScan XTI microscope (Thermo Fisher Scientific) with a 20× objective (NA 0.4) at a resolution of 1104 × 1104 pixels per channel. Each raw image contained artifacts that were removed to leave only isolated microglial cells. Artifacts included stains, overlapping microglial cells, and microglial cells overlapping the edge of the image.

Image preprocessing

Individual cells were identified in the raw microglial images. The red channel of the raw images was used because the Iba-1 cytoplasmic stain renders microglial cells predominantly red, so this channel carries most of the cellular information. The red channel was binarized using an automatic threshold computed with the non-parametric, unsupervised Otsu method66. All blobs connected to the image boundary were removed, since the edge of the image would section the cells. Additionally, we removed small blobs with an area of less than 400 pixels because they are too small to contain microglial cells. Figure 1 shows an example of the blob detection process; note that blobs 9, 12, and 14 are eliminated because they are sectioned by the edge of the image. After blob detection, the centroid of each remaining blob inside the binarized images was computed. These centroids were used to extract the cells into sub-images of 75 × 75 pixels from the raw image, thus obtaining individual cells. Finally, stain artifacts (green spots) were removed, as shown in Fig. 2. In addition, Fig. 3 shows the distribution of individual cells for each class. The violin plot represents the density distribution of cell counts, highlighting the spread and concentration of data points. The embedded scatter points indicate individual subjects, while the dashed lines indicate quartiles. Subjects with the highest and lowest cell counts within each class are annotated accordingly. Subject 3,650,109 of the BD class has the largest number of cells (248 instances), while subject 1029 of the CTRL class has the fewest (71 instances). The final dataset is described in Table 1.
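The thresholding and blob-filtering steps above can be sketched in a dependency-light form. The snippet below is an illustrative numpy-only reconstruction, not the authors' implementation: the helper names `otsu_threshold`, `label_blobs`, and `keep_valid_blobs` are hypothetical, and only the 400-pixel minimum area and the border-exclusion rule come from the text.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # class-0 probability up to each level
    mu = np.cumsum(p * np.arange(256))  # cumulative intensity mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

def label_blobs(mask):
    """4-connected component labeling via an iterative flood fill."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x] and labels[y, x] == 0:
                        labels[y, x] = current
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, current

def keep_valid_blobs(mask, min_area=400):
    """Drop blobs touching the image border or smaller than min_area pixels;
    return (label, area, centroid) for each surviving blob."""
    labels, n = label_blobs(mask)
    border = set(labels[0, :]) | set(labels[-1, :]) | set(labels[:, 0]) | set(labels[:, -1])
    kept = []
    for lab in range(1, n + 1):
        area = int((labels == lab).sum())
        if lab not in border and area >= min_area:
            ys, xs = np.nonzero(labels == lab)
            kept.append((lab, area, (float(ys.mean()), float(xs.mean()))))
    return kept
```

The returned centroids correspond to the points around which the 75 × 75 pixel sub-images would be cropped.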

Fig. 1

(a) Example of a raw microglial cells image. (b) The detected blobs of subject 1039 of the CTRL class. Blobs 9, 12, and 14 were eliminated because the edge of the image sectioned part of the cell in those blobs.

Fig. 2

Artifact removal process. (a) Example of an individual cell surrounded by two green stains. These stains were detected in (b) and refilled with the mean background color, as shown in (c).

Table 1 Number of cells available after artifact removal for each class BD, CTRL, and MDD.
Fig. 3

Distribution of cell counts per class after artifact removal. Arrows indicate the subjects with the highest and lowest cell counts within each class. Across the entire dataset, subject 3,650,109, from the BD class, exhibits the highest count with 248 samples, while subject 1029, from the CTRL class, has the lowest count with 71 samples.

Proposed pipeline

The overall pipeline of this research is shown in Fig. 4. Our proposal is that the combined use of CNNs and microglial cells, the latter acting as cellular sensors, can improve diagnostic accuracy. The method begins with the extraction of patient-derived sEVs from blood plasma. These vesicles are applied to cultured microglia, which act as biological sensors capable of exhibiting disease-specific morphological responses. Following this treatment, the microglia are imaged using fluorescence microscopy to capture their morphological features. These images are then processed to detect and segment individual microglial cells.

Since each microglial cell may not provide enough information to correctly classify the three classes \(\Omega=\{BD, MDD, CTRL\}\) (\(|\Omega|=3\)), we developed an alternative method using cells organized into arrays instead of individual cell instances67. In this way, there is a higher probability of presenting to the classifier at least one microglial cell that reacted positively to the patient-derived sEVs. In cases where two or more cell sensors within the array react positively to sEVs, the CNN model will learn common features among examples of the same class68. Therefore, rather than analyzing cell images in isolation, the pipeline groups the images into structured arrays of fixed dimensions (M×M cells per array), which serve as input to a CNN based on the DenseNet12123 architecture. This network was initially pre-trained on the ImageNet dataset69 and subsequently fine-tuned to classify microglial arrays into one of the three diagnostic categories of \(\Omega\): BD, MDD, or CTRL. Additionally, to improve accuracy, the method aggregates predictions from multiple arrays belonging to the same subject by summing the class-specific confidence scores output by the CNN model. The class with the highest cumulative confidence determines the final subject-level diagnosis. This hierarchical strategy, which shifts classification from individual cells to grouped arrays, improves accuracy and generalization.

Training and testing protocols and dataset generation

The evaluation protocol corresponds to repeated subject-disjoint random splits, also referred to as repeated random subsampling or repeated holdout validation, which has been similarly adopted in other problem settings70. Five independent iterations were performed, each constructing a subject-disjoint partition by randomly assigning subjects to training and test sets, stratified by class (\(N_{s}^{train}=10\) subjects per class for training, \(N_{s}^{test}=5\) per class for testing), yielding an overall 30/15 training/held-out split per iteration. As is inherent to this procedure, a given subject may appear in the held-out set across multiple iterations or potentially in none. In our implementation, the five random splits were constructed such that every subject appeared in the held-out test set in at least one iteration. Table 2 reports the held-out frequency for each of the 45 subjects across the five iterations. Consequently, the sets of cell images used for training, \(\phi^{train}\), and testing, \(\phi^{test}\), depend on the iteration partition and the selected subjects.
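The split protocol above can be sketched with the standard library alone. This is an illustrative sketch under the stated 10-train/5-test per-class assignment; `subject_disjoint_splits` is a hypothetical helper name, and the authors' actual implementation is not specified.

```python
import random

def subject_disjoint_splits(subjects_by_class, n_train=10, n_iters=5, seed=0):
    """Repeated subject-disjoint random splits, stratified by class.

    subjects_by_class: dict mapping class label -> list of subject IDs.
    Each iteration assigns n_train subjects per class to training and
    the remaining subjects of that class to the held-out test set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_iters):
        train, test = [], []
        for cls, ids in subjects_by_class.items():
            shuffled = ids[:]
            rng.shuffle(shuffled)
            train += [(cls, s) for s in shuffled[:n_train]]
            test += [(cls, s) for s in shuffled[n_train:]]
        splits.append((train, test))
    return splits
```

Because subjects are reshuffled independently per iteration, a subject may be held out several times or never, as noted in the text.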

Table 2 Held-out frequency for each of the 45 subjects across the five iterations of the repeated subject-disjoint random splits. Each iteration randomly assigned 10 subjects per class to the training set and 5 per class to the test set. All subjects appeared in the held-out test set in at least one iteration.

Each array is created by randomly selecting individual microglial cell images from a single subject; thus, each array belongs to the class of that subject. Each cell image may occupy any location within the array, and therefore different arrays for the same subject can be generated by permuting the cell images. We use the letter M to specify the size of the array; for example, Fig. 6 shows arrays for M = 5, 6, and 7. Each cell image is selected only once within a given array. Note that the arrays have equal aspect ratios and that their size depends on M, since individual cell images are 75 × 75 pixels. In Fig. 6, the three arrays are presented at equal size, although the number of pixels in each depends on M.
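Assembling one M×M array from a subject's 75 × 75 pixel cell crops can be sketched as below. This is a minimal numpy sketch, not the authors' code; `build_array` is a hypothetical helper, and sampling without replacement reflects the rule that each cell image is selected only once per array.

```python
import numpy as np

CELL = 75  # pixel size of each individual cell image (from the text)

def build_array(cells, M, rng):
    """Tile M*M cell images (each CELL x CELL x 3), sampled without
    replacement from one subject's cells, into one (M*CELL, M*CELL, 3) mosaic."""
    idx = rng.choice(len(cells), size=M * M, replace=False)
    mosaic = np.zeros((M * CELL, M * CELL, 3), dtype=cells[0].dtype)
    for k, i in enumerate(idx):
        r, c = divmod(k, M)  # fill row by row
        mosaic[r * CELL:(r + 1) * CELL, c * CELL:(c + 1) * CELL] = cells[i]
    return mosaic
```

Repeated calls with fresh random index permutations yield the different arrays per subject described in the text.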

Data augmentation is performed by permutation of cell image position within each array, and by vertical or horizontal cell image flips, rotations, and rotations with flips. In summary, the affine transformations are the following: vertical flip, horizontal flip, double flip, rotate 90° (counterclockwise), rotate 90° + vertical flip, rotate 90° + horizontal flip, rotate 270° (clockwise).
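The original orientation plus the seven listed transformations form the eight orientation variants of a square (the dihedral group D4). The sketch below enumerates them with numpy; `dihedral_transforms` is a hypothetical helper name, and the "rotate 270°" entry is interpreted as the rotation complementary to the 90° counterclockwise one, so that all eight variants are distinct.

```python
import numpy as np

def dihedral_transforms(img):
    """Return the eight orientation variants used for augmentation:
    identity, vertical flip, horizontal flip, double flip, 90 deg rotation,
    90 deg + vertical flip, 90 deg + horizontal flip, and 270 deg rotation."""
    r90 = np.rot90(img)                 # 90 deg counterclockwise
    return [
        img,                            # original instance
        np.flipud(img),                 # vertical flip
        np.fliplr(img),                 # horizontal flip
        np.flipud(np.fliplr(img)),      # double flip (equals 180 deg rotation)
        r90,                            # rotate 90 deg
        np.flipud(r90),                 # rotate 90 deg + vertical flip
        np.fliplr(r90),                 # rotate 90 deg + horizontal flip
        np.rot90(img, k=3),             # rotate 270 deg (i.e., 90 deg clockwise)
    ]
```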

The weighted voting process is illustrated in Fig. 5. The pipeline proceeds through five stages. In Stage 1, for each subject U, a set of P = 150 image arrays is generated, each consisting of M×M microglial cell images randomly sampled from that subject, with affine transformations applied for augmentation. In Stage 2, each array is resized to 512 × 512 pixels and processed by the fine-tuned DenseNet121 CNN, which generates a confidence vector \(\pi=[\pi_{BD}, \pi_{MDD}, \pi_{CTRL}]\) via softmax activation. In Stage 3, the P confidence vectors generated for subject U are collected in the set \(\Psi_{U}=\{\pi_{1}, \pi_{2}, \dots, \pi_{P}\}\), where each array may yield different confidence levels for each class, reflecting the natural variability across different cell combinations. In Stage 4, the confidence scores are summed per class across the P arrays according to \(S_{c}=\sum_{i=1}^{P}\pi_{i,c}\) for each \(c\in\Omega\), accumulating soft probabilities rather than counting hard votes. In Stage 5, the argmax function is applied to the accumulated scores, \(\widehat{y}_{U}=\arg\max\left(S_{BD}, S_{MDD}, S_{CTRL}\right)\), and the class with the highest accumulated confidence score is assigned as the final subject-level predicted label. In the illustrated example, a BD subject has accumulated scores of \(S_{BD}=128.4\), \(S_{MDD}=14.2\), and \(S_{CTRL}=7.4\) after summing over P = 150 arrays, resulting in a correct BD classification.
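Stages 3–5 (accumulating softmax vectors and taking the argmax) reduce to a few lines. This is a minimal sketch under the assumption that each row of `confidences` is one array's softmax output; `subject_prediction` is a hypothetical helper name.

```python
import numpy as np

def subject_prediction(confidences, classes=("BD", "MDD", "CTRL")):
    """Confidence-weighted voting: sum the P softmax vectors per class
    (S_c = sum_i pi_{i,c}) and return the class with the highest score."""
    scores = np.asarray(confidences, dtype=float).sum(axis=0)
    return classes[int(np.argmax(scores))], scores
```

Summing soft probabilities (rather than counting hard per-array votes) lets arrays with confident predictions contribute more to the subject-level decision.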

Fig. 4

Detailed overview of the proposed image-based pipeline for morphological classification of cellular sensors (cultured rat microglial cells) using patient-derived sEVs. The pipeline is organized into three functional blocks. Block 1: sEV isolation and microglial treatment. sEVs are isolated from patient blood plasma and applied to cultured microglia, which serve as biosensors capable of developing disease-specific morphological responses. After exposure to sEVs, microglial cells are fixed and immunostained. Immunostaining is used to visualize nuclei (DAPI, blue), microglial cytoplasm (Iba-1, red), and β-actin filaments (green). Block 2: image preprocessing. Fluorescence images are analyzed using blob detection to extract individual microglial cells, which are grouped into structured image arrays of size M×M. Each array includes cells from a single subject and is augmented by applying random affine transformations, including flips and rotations. A total of P arrays are generated per subject, enhancing sampling diversity and generalization. Block 3: deep learning-based classification. Arrays serve as input to a DenseNet121 CNN, pre-trained on ImageNet and fine-tuned for this task. The final layer of the CNN is adapted to classify each microglial morphology into one of three diagnostic categories Ω: BD, MDD, or CTRL. Confidence scores π are computed for each class, and final subject-level predictions are obtained through a weighted voting mechanism that aggregates scores across all the arrays from a given subject. This hierarchical framework improves diagnostic accuracy by incorporating both cellular diversity and subject-level context.

Fig. 5

Weighted voting decision pipeline for subject-level classification. For each subject, P = 150 image arrays of 5 × 5 microglial cells are generated with random permutations and affine transformations (Stage 1) and individually classified by a fine-tuned DenseNet121 CNN, which outputs a softmax confidence vector \(\:\pi\:\:=\:[{\pi\:}_{BD},\:{\pi\:}_{MDD},\:{\pi\:}_{CTRL}]\) per array (Stages 2–3). The confidence scores are then accumulated per class across all arrays (Stage 4), and the final diagnosis is assigned via argmax over the cumulative scores (Stage 5). In the illustrated example, the BD class dominates with 128.40 versus 14.20 (MDD) and 7.40 (CTRL), resulting in a BD prediction.

Fig. 6

Example of three arrays of microglial cell images for M = 5, 6, and 7. The three images have been resized to show them at the same size.

Fig. 7

Eight transformations are applied to a selected cell: (a) Original instance. (b) Vertical flip. (c) Horizontal flip. (d) Double flip. (e) Rotated 90° counterclockwise. (f) Rotated 90° + vertical flip. (g) Rotated 90° + horizontal flip. (h) Rotated 270° clockwise.

Fig. 8

Set of arrays Φ that represent a subject. The configuration used for the figure corresponds to M = 5; P = 5.

Fig. 9

Example of two arrays from the same subject. Note that the cells within the green and red bounding boxes appear not only in different positions in the arrays but are also placed using a different affine transformation.

Therefore, when a cell is selected to be part of an array, one of the eight transformations (the original orientation plus the seven affine transformations) is applied at random, each with probability 1/8, as illustrated in Fig. 7. Thus, if the same cell is chosen again for a subsequent array, it may be placed at a different location within the M×M grid and under a different transformation. The number of arrays \(|\Phi|\) generated for each subject is P, as shown in Fig. 8; in this way, we obtain P arrays per subject. To increase variability across the P dimension, we also applied these affine transformations as data augmentation to the arrays of selected cells. Figure 9 shows two arrays from the same subject in which shared cells appear at different positions and under different affine transformations.

The method was tested with M ranging from 3 to 10 and P ranging from 100 to 200. M was also used to balance the number of samples from the different subjects. Therefore, the number of arrays \(|\Phi|\) generated for the training and testing datasets in each split iteration depends on the P values as follows:

$$\left|\Phi^{\text{train}}\right|=\left|\Omega\right|\times N_{s}^{train}\times P,$$
(1)
$$\left|\Phi^{\text{test}}\right|=\left|\Omega\right|\times N_{s}^{test}\times P,$$
(2)

where \(\left|\Omega\right|=3\), \(N_{s}^{train}=10\), and \(N_{s}^{test}=5\). The total number of cells in each array is M×M, while the number of cell instances per subject is M×M×P, as seen in Fig. 8.
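Plugging the selected configuration into Eqs. (1)–(2) gives the concrete dataset sizes. The sketch below is a simple worked computation; `dataset_sizes` is a hypothetical helper, with defaults taken from the text (3 classes, 10/5 train/test subjects per class, M = 5, P = 150).

```python
def dataset_sizes(n_classes=3, n_train=10, n_test=5, M=5, P=150):
    """Eqs. (1)-(2): number of arrays per split, plus the per-subject
    count of cell instances (M*M cells per array, P arrays per subject)."""
    return {
        "arrays_train": n_classes * n_train * P,  # |Phi_train|
        "arrays_test": n_classes * n_test * P,    # |Phi_test|
        "cells_per_subject": M * M * P,           # cell instances per subject
    }
```

With the defaults this yields 4500 training arrays, 2250 test arrays, and 3750 cell instances per subject (cells are reused across arrays, since subjects contribute far fewer unique cells).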

Convolutional neural network classifier

The proposed pipeline includes a classifier based on a CNN that has been pre-trained using the ImageNet dataset69. ImageNet is a large-scale visual database designed for visual object recognition research. It contains millions of labeled images across thousands of categories, making it an essential resource for training and benchmarking DL models23. The dataset is widely used in computer vision to pre-train models, which are then fine-tuned for specific tasks.

Using the pre-trained weights of a DenseNet12123, we performed fine-tuning on the cell dataset, deleting the last layer of the model, which contains 1000 neurons, and adding a layer with three neurons corresponding to the classes in \(\Omega=\{BD, MDD, CTRL\}\). The resulting model contains 6,956,931 trainable parameters.

The output of the final layer was activated using the softmax function, defined as

$$\pi_{i,c}=\frac{e^{z_{i,c}}}{\sum_{k\in\Omega}e^{z_{i,k}}},$$
(3)

where \(z_{i,c}\) denotes the logit of class \(c\) for image \(i\), and \(\pi_{i,c}\) represents the predicted probability for that class. The network was optimized using the categorical cross-entropy loss, defined as:

$$CE=-\sum_{i}\sum_{c\in\Omega}y_{i,c}\log\left(\pi_{i,c}\right),$$
(4)

where \(y_{i,c}\) is the one-hot-encoded ground-truth label.
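Equations (3) and (4) can be verified numerically. The sketch below is a plain numpy implementation of the softmax activation and categorical cross-entropy; the max-shift stabilization and the small `eps` guard are standard additions not mentioned in the text.

```python
import numpy as np

def softmax(z):
    """Eq. (3): row-wise softmax over class logits (max-shifted for stability)."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_onehot, pi, eps=1e-12):
    """Eq. (4): categorical cross-entropy summed over a batch; eps avoids log(0)."""
    return float(-(y_onehot * np.log(pi + eps)).sum())
```

For a single example, only the true class contributes, so the loss reduces to the negative log of its predicted probability.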

Experiments and model evaluation

We performed two types of experiments. In the first, the inputs to the classifier were individual cells φ, while in the second, cell arrays Φ were the inputs. We used the pre-trained DenseNet121 after evaluating various architectures and determining that several of them yielded similar results, as shown in Table S2, Supplementary Materials. The evaluated architectures were DenseNet121, DenseNet169, and DenseNet20123, as well as ResNet18, ResNet34, and ResNet5071. The performance obtained by DenseNet121 was similar to that of ResNet34 (p > 0.05); however, the DenseNet121 model contains fewer parameters.

Two distinct categories of hyperparameters were involved in the pipeline, and they were handled differently to guard against optimistic bias. The first category comprises the training hyperparameters of the CNN (learning rate, weight decay, batch size, number of epochs, and learning rate decay schedule). These were determined using only the training partition of the first split (i.e., monitored on a held-out validation subset drawn from the training subjects, not from the test subjects), and then kept fixed for the remaining four splits. Critically, the four remaining test partitions were never exposed to any tuning decision and therefore provide unbiased performance estimates.

The second category comprises the pipeline design parameters M (array size) and P (number of arrays per subject). These were selected using a nested cross-validation strategy to ensure that hyperparameter tuning and model evaluation were performed on strictly separate data. In the inner loop, the 30 training subjects (10 per class) were further divided into 24 subjects for training and 6 for validation, and a 5-fold cross-validation was conducted over all tested combinations (M ∈ {3, 5, 7}, P ∈ {100, 150, 200}). This inner evaluation identified M = 5 and P = 150 as the configuration offering the best balance between accuracy and variance. In the outer loop, the model was retrained using all 30 inner-loop subjects with the selected hyperparameters and evaluated on the remaining 15 held-out test subjects (5 per class), which were never used during hyperparameter selection (full results of the outer-loop evaluation across all tested combinations are reported in Table S4, Supplementary Materials). This nested design prevents information leakage between model selection and performance estimation, ensuring that the reported results reflect an unbiased evaluation of the chosen configuration. The CNN backbone architecture was likewise selected via an inner-loop evaluation conducted prior to the main evaluation pipeline, with full results reported in Table S2 (Supplementary Materials); DenseNet121 was chosen because it matched the best-performing alternatives (p > 0.05) while requiring fewer parameters.
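The structure of this nested, subject-disjoint selection can be sketched in pure Python (our own illustrative skeleton: subject IDs are placeholders and `validation_score` stands in for training and validating the CNN):

```python
import random
from itertools import product

random.seed(0)

# 45 subjects, 15 per class; IDs here are illustrative placeholders.
subjects = {c: [f"{c}_{i}" for i in range(15)] for c in ("BD", "MDD", "CTRL")}

# Outer split: 10 training and 5 held-out test subjects per class.
train = {c: ids[:10] for c, ids in subjects.items()}
test = {c: ids[10:] for c, ids in subjects.items()}

def validation_score(m, p, train_ids, val_ids):
    # Placeholder for training the CNN with m x m arrays (p per subject)
    # on train_ids and measuring accuracy on val_ids.
    return random.random()

# Inner loop: 5-fold CV over the 30 training subjects (24 train / 6 validation),
# grid-searching M and P.
best = None
for m, p in product((3, 5, 7), (100, 150, 200)):
    fold_scores = []
    for fold in range(5):
        val_ids = [s for ids in train.values() for s in ids[2 * fold:2 * fold + 2]]
        train_ids = [s for ids in train.values() for s in ids if s not in val_ids]
        fold_scores.append(validation_score(m, p, train_ids, val_ids))
    score = sum(fold_scores) / len(fold_scores)
    if best is None or score > best[0]:
        best = (score, m, p)

# Outer evaluation: retrain on all 30 training subjects with the selected
# (M, P) and test once on the 15 held-out subjects, never seen during tuning.
train_all = [s for ids in train.values() for s in ids]
test_all = [s for ids in test.values() for s in ids]
assert not set(train_all) & set(test_all)  # subject-disjoint by construction
```

The final assertion makes the leakage guarantee explicit: no subject contributing cells to model selection can appear in the outer test set.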

Finally, the subject-level aggregation strategy (confidence-weighted voting, Eq. 6) is not a tuned hyperparameter but a fixed design choice of the pipeline, applied uniformly across all configurations and folds.

During training, both classifiers (with inputs φ and Φ) were trained with a batch size of 64 for 150 epochs. We employed the Adam optimizer with a momentum parameter (β1) of 0.8 and a weight decay of 0.0001. Input images φ were resized to 64 × 64 pixels for the individual-cell classifier, and to 512 × 512 pixels for the array Φ classifier. Initial learning rates were set to 0.0001 and 0.001 for the individual-cell and array classifiers, respectively, and the learning rate was decayed by a factor of 0.8 every 50 epochs. These hyperparameters were tuned to achieve the best performance for both classifiers. Training was performed on two NVIDIA GeForce RTX 3080 Ti GPUs in parallel.
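The step-decay schedule amounts to multiplying the base learning rate by 0.8 every 50 epochs; a minimal sketch (function name is ours):

```python
def learning_rate(epoch, base_lr, gamma=0.8, step=50):
    """Step decay: the learning rate is multiplied by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Array classifier: base lr 0.001, decayed at epochs 50 and 100 over 150 epochs.
lr_start = learning_rate(0, 0.001)    # 0.001
lr_mid = learning_rate(50, 0.001)     # 0.001 * 0.8
lr_end = learning_rate(149, 0.001)    # 0.001 * 0.8**2
```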

We report accuracy at two levels. Accuracy for the multiclass classification task is computed as follows:

$$ACC\left(y, \widehat{y}\right) = \frac{1}{N}\sum_{i=0}^{N-1} 1\left(y_i = \widehat{y}_i\right),$$
(5)

where \(N\) represents the total number of samples (\(|\phi^{test}|\) and \(|\Phi^{test}|\) for cell-image and array inputs, respectively), \(y_i\) is the true label for the \(i\)-th instance, and \(\widehat{y}_i\) is the corresponding predicted label. The indicator function \(1(y_i = \widehat{y}_i)\) returns 1 if the predicted label matches the true label, and 0 otherwise. The summation counts the number of correct predictions, and dividing this sum by \(N\) yields the overall accuracy.
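Equation (5) is the standard fraction of correct predictions; in NumPy (illustrative, with an integer label encoding of our own choosing):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Eq. (5): fraction of instances whose predicted label matches the true one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Example with labels 0 = BD, 1 = MDD, 2 = CTRL.
acc = accuracy([0, 1, 2, 2], [0, 1, 1, 2])  # 3 of 4 correct -> 0.75
```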

At the first level, accuracy assessed the model on individual cell images or array images, considering the labels and predictions for each cell or array, respectively. At the second level, the class of the subject was evaluated instead of that of individual cells or arrays; all cells/arrays from a subject voted for the predominant class. We applied confidence-weighted voting to determine the predominant class of the subject, as follows:

$$\widehat{y}_U = \arg\max_{c \in \Omega}\left(\sum_{i \in \psi_U} \pi_{i,c}\right)$$
(6)

where \(U\) denotes each subject in the test partition, \(\psi_U\) corresponds to the set of cells/arrays associated with subject \(U\) (from \(\phi^{test}\) and \(\Phi^{test}\), respectively), and \(\pi_{i,c}\) represents the model confidence for class \(c\) obtained from cell/array \(i\). The final subject label \(\widehat{y}_U\) thus corresponds to the class with the highest accumulated confidence across all its associated inputs.
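The confidence-weighted vote of Eq. (6) simply sums the per-input softmax rows for a subject and takes the arg max; a NumPy sketch (array and function names are ours):

```python
import numpy as np

CLASSES = ("BD", "MDD", "CTRL")

def subject_vote(pi):
    """Eq. (6): pi has shape (n_inputs, n_classes), one softmax row per cell/array."""
    return CLASSES[int(np.argmax(pi.sum(axis=0)))]

# Three arrays from one subject: two lean MDD, one leans BD.
pi = np.array([[0.2, 0.7, 0.1],
               [0.1, 0.6, 0.3],
               [0.5, 0.3, 0.2]])
label = subject_vote(pi)  # accumulated confidence: BD 0.8, MDD 1.6, CTRL 0.6
```

Note that because full confidences are accumulated rather than hard votes, a few high-confidence inputs can outweigh many uncertain ones.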

Results

We first assessed the potential of the morphology of individual cellular sensors for disease classification; for this, we trained a classifier with isolated microglia cell images \(\phi\). Performance across folds remained consistently low, with accuracy values under 60% (Fig. 11a, blue bars). This limitation suggests that morphological variance among individual microglia does not provide sufficient information for reliable differential diagnosis among BD, MDD, and CTRL samples. Interestingly, when votes from multiple cells belonging to the same subject were aggregated (subject-level voting), classification improved notably, reaching 80% accuracy or higher in four out of five folds, and 73.34% in the fifth (Fig. 11a, green bars). This marked difference reveals the importance of considering intercellular context within subjects when characterizing disease signatures.

To further improve prediction, we tested whether grouping cell images into arrays \(\Phi\) of fixed sizes (M = 3, 5, or 7), while varying the number of arrays generated for each subject (P = 100, 150, 200), improved diagnostic accuracy. This strategy boosted array-level accuracy significantly (Fig. 11b). With M = 3, average accuracy improved over the single-cell baseline but remained modest (83.2%). Increasing M to 5 yielded a considerable improvement (90.5%), while M = 7 resulted in the highest average accuracy (93.3%), although with slightly increased variance across folds, as shown in Table S3, Supplementary Materials. This trade-off suggests that larger arrays may better capture the morphological diversity induced by sEVs exposure of our cellular sensors, but may introduce variability in cases where individual subject samples are heterogeneous.
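The array construction (P random M × M mosaics sampled from a subject's segmented cells) can be sketched in NumPy; this is our own illustrative implementation, with random stand-ins for cell images, and sampling with replacement is an assumption:

```python
import numpy as np

def build_arrays(cells, m=5, p=150, rng=None):
    """Tile m*m randomly chosen cell images (each h x w) into p mosaic arrays."""
    rng = rng or np.random.default_rng()
    h, w = cells.shape[1:3]
    arrays = np.empty((p, m * h, m * w), dtype=cells.dtype)
    for k in range(p):
        # Random combination of the subject's cells (replacement is an assumption).
        idx = rng.choice(len(cells), size=m * m, replace=True)
        grid = cells[idx].reshape(m, m, h, w)
        # Rearrange (row, col, h, w) into a single (m*h, m*w) mosaic image.
        arrays[k] = grid.transpose(0, 2, 1, 3).reshape(m * h, m * w)
    return arrays

cells = np.random.default_rng(0).random((300, 64, 64))  # stand-ins for segmented cells
arrays = build_arrays(cells, m=5, p=10)                 # 10 mosaics of 5 x 5 cells
```

With 64 × 64 cell crops, M = 5 yields 320 × 320 mosaics, which are then resized to the 512 × 512 classifier input described above.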

Notably, voting among arrays of the same subject further improved classification, achieving near-perfect results for the M = 5 and M = 7 conditions (Fig. 11c). In four of the five folds, all subjects were classified correctly regardless of the number of arrays per subject, P. Only one subject (1160028) was repeatedly misclassified across configurations, suggesting either an atypical microglial response or a labeling error. Interestingly, although M = 7 achieved slightly higher overall accuracy than M = 5, it produced an error in fold 3, in addition to the misclassification of the same subject (1160028) in fold 5 for P = 150 (Fig. 11c). These results suggest that M = 5 may represent an optimal balance between accuracy and variance. Together, these results indicate that disease-specific patterns of microglial morphology emerge more clearly when cells are analyzed in structured groups rather than in isolation. Subject-level classification further enhances the reliability of predictions, highlighting the value of contextualizing single-cell features within higher-order spatial or sampling structures. These findings support the development of diagnostic tools that incorporate hierarchical aggregation strategies to account for both intercellular and intersubject variability.

To provide a comprehensive evaluation of diagnostic classification performance beyond overall accuracy, Fig. 10 presents the aggregated confusion matrices at both the array level and the subject level for the best-performing configuration (DenseNet121, M = 5, P = 150), pooled across all five iterations of the repeated holdout validation. Additionally, Table 3 reports the corresponding per-class precision, recall, and F1-score derived from these aggregated confusion matrices. At the subject level, the aggregated confusion matrix shows that 44 out of 45 subjects were correctly classified, with only one subject (1160028) consistently misclassified across iterations.

Fig. 10

Accumulated confusion matrices across all cross-validation folds. (A) Array-level predictions, where each sample corresponds to an individual array, achieving an overall accuracy of 90.57%. (B) Subject-level predictions obtained through weighted voting across arrays belonging to the same subject, yielding an overall accuracy of 98.67%.

Table 3 Per-class precision, recall, and F1-score for the best-performing configuration (DenseNet121, M = 5, P = 150), derived from the aggregated confusion matrices pooled across all five iterations of the repeated holdout validation. Results are reported at both the array level and the subject level (weighted voting). Macro and weighted averages are included for each level.

To enable full reproducibility and transparency, Table 2 lists for each of the 45 subjects: (i) the number of iterations in which the subject appeared in the held-out test set, and (ii) the class label. The subject-level classification accuracy reported for each iteration corresponds to the fraction of test subjects correctly classified via weighted voting (Eq. 6) within the held-out partition of that iteration. The overall result of 44/45 individuals correctly classified was obtained by pooling the per-iteration outcomes: across the five iterations, each correctly classified subject was counted once, and the one subject that was misclassified (subject 1160028) was consistently misclassified across multiple iterations. The per-iteration subject-level accuracy values for the selected configuration (M = 5, P = 150) are reported, yielding a mean accuracy of 98.67% across the five iterations, enabling the reader to independently verify the aggregate result.

To complement these results under a standard stratified protocol, we additionally performed a stratified 5-fold cross-validation on the full cohort (n = 45), with 12 subjects per class for training and 3 per class for testing in each fold (36/9 overall). The results of this additional analysis are reported in Table 4 and are consistent with those obtained under the repeated random splits protocol, with 44/45 subjects correctly classified using the configuration M = 5 and P = 150.

Table 4 Per-fold classification results for the stratified 5-fold cross-validation (DenseNet121, M = 5, P = 150). Each fold used 12 subjects per class for training and 3 per class for testing (36/9 split).

Array-level and subject-level (weighted voting) accuracies are reported, along with the misclassified subjects. Consistent with the repeated random splits protocol, 44 out of 45 subjects were correctly classified, with the same subject (1160028) being the only misclassified case.

Fig. 11

(a) Accuracy of classification using individual microglial cell images \(\phi\) (blue), and subject-level voting (green). Subject-level aggregation improves classification accuracy significantly across all folds, indicating the limitations of single-cell predictions for diagnostic purposes. (b) Accuracy of array-level classification per fold. Bar plots show classification accuracy for microglial image arrays \(\Phi\) grouped by array size (M = 3, 5, or 7), and number of arrays per subject (P = 100, 150, or 200). Each color gradient within M groups indicates P values. Results are averaged across five cross-validation folds. (c) Accuracy of subject-level classification per fold. Bars represent the percentage of subjects correctly classified using voting among arrays of the same subject. Notably, M = 5 achieves consistent performance with minimal variance across all P values. (d) Accuracy at the array level for different configurations of M (3, 5, 7) across the P dimension (100, 150, 200). The results show that larger array dimensions (M = 5 and M = 7) consistently outperform M = 3, with statistically significant differences confirmed through ANOVA and Tukey post-hoc tests.

Figure 11d presents the first-level accuracy across different values of M for each P configuration. The results indicate that configurations with M = 5 and M = 7 achieve higher accuracy compared to M = 3, regardless of the P dimension. However, increasing M also leads to a greater standard deviation. Among the highest performing configurations, M = 5 and P = 150 provide a good balance, yielding high accuracy while maintaining the lowest variability.

The results demonstrate that M = 5 and M = 7 yield superior performance compared to M = 3, regardless of the P dimension. The ANOVA test yielded p-values of 0.0021, 0.0003, and 0.0003 for P = 100, 150, and 200, respectively (p < 0.05 for all cases). This indicates that the differences in performance across the M dimension are statistically significant for each value of P. The Tukey post-hoc test was conducted to explore these differences further. For P = 100, the test revealed statistically significant differences between M = 3 and M = 5 (mean difference = 8.17, p = 0.0126) and between M = 3 and M = 7 (mean difference = 10.49, p = 0.0023). However, no significant difference was observed between M = 5 and M = 7 (mean difference = 2.32, p = 0.6057). Similarly, for P = 150, the Tukey test showed significant differences between M = 3 and M = 5 (mean difference = 7.54, p = 0.0038) and between M = 3 and M = 7 (mean difference = 10.21, p = 0.0003). Again, no statistically significant difference was observed between M = 5 and M = 7 (mean difference = 2.67, p = 0.3474). For P = 200, the test results were consistent with the other configurations, with statistically significant differences between M = 3 and M = 5 (mean difference = 7.50, p = 0.0053) and between M = 3 and M = 7 (mean difference = 10.80, p = 0.0003). As in previous cases, M = 5 and M = 7 did not differ significantly (mean difference = 3.30, p = 0.2352). While M = 5 and M = 7 are statistically equivalent, choosing M = 5 allowed the model to achieve second-level accuracies of 100% in four of the five cross-validation folds.
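This ANOVA-plus-Tukey comparison can be reproduced with SciPy; the sketch below uses synthetic per-fold accuracies of our own invention (the real values are in Table S3, Supplementary Materials), and `scipy.stats.tukey_hsd` requires SciPy ≥ 1.8:

```python
from scipy.stats import f_oneway, tukey_hsd

# Illustrative per-fold array-level accuracies (%) for one value of P;
# the actual values are reported in Table S3, Supplementary Materials.
acc_m3 = [82.0, 83.5, 82.8, 84.1, 83.6]
acc_m5 = [90.2, 91.0, 89.8, 90.9, 90.6]
acc_m7 = [92.8, 93.9, 92.5, 94.0, 93.3]

f_stat, p_value = f_oneway(acc_m3, acc_m5, acc_m7)  # one-way ANOVA across M
posthoc = tukey_hsd(acc_m3, acc_m5, acc_m7)         # pairwise Tukey HSD
# posthoc.pvalue[i][j] holds the adjusted p-value for group i vs group j.
```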

Discussion

This study presents a novel image-based diagnostic framework that integrates CNNs with microglial cells acting as functional cellular sensors for patient-derived sEVs, constituting the first proof-of-concept for this diagnostic technology. Unlike traditional approaches that analyze patient samples directly, our method leverages the capacity of microglia to undergo disease-specific morphological changes in response to sEVs exposure, effectively translating molecular disease signals into measurable cellular phenotypes. This transduction step introduces a biologically meaningful amplification of subtle diagnostic cues, which are then captured through standardized fluorescence imaging and interpreted by a fine-tuned DenseNet121 CNN model. Our approach correctly classified 44/45 subjects across the five repeated subject-disjoint random splits. The best results were achieved with a method that organizes images of individual microglial cells into arrays of 5 × 5 images and uses voting among arrays created from the same subject. A key innovation lies in the hierarchical analysis pipeline, which groups isolated microglial cells into structured image arrays prior to classification. This approach captures the intercellular variability within subjects, enhancing diagnostic accuracy significantly over single-cell-based predictions. Furthermore, the use of confidence-weighted subject-level voting aggregates predictions across arrays, improving accuracy while reducing variance. Together, these strategies represent a conceptual advance in computational pathology by shifting from isolated cell analysis to context-aware, multi-cell inference, and establish a new paradigm for applying AI to psychiatric diagnostics via immune-derived functional readouts.

Our method incorporates several methodological strategies to mitigate overfitting. The dataset partition was strictly subject-disjoint, ensuring that no subject appears in both training and testing within the same fold. All 45 subjects appear in the test partition at least once across the five iterations, providing a comprehensive assessment across the entire cohort. Cross-validation is the established standard practice for internal validation in neuroimaging and psychiatric deep learning studies, particularly when independent external datasets are not available72,73. The DenseNet121 model was pre-trained on ImageNet and subsequently fine-tuned, rather than trained from scratch. Transfer learning has been extensively demonstrated to reduce overfitting risk on small medical imaging datasets by leveraging robust feature representations learned from large-scale natural image datasets74,75. Our augmentation strategy goes beyond standard geometric transformations. The generation of multiple arrays (P ∈ {100, 150, 200}) per subject through random cell permutations creates biologically meaningful data diversity. Since each array contains a different random combination and spatial arrangement of the same subject’s cells, the model is forced to learn generalizable morphological patterns rather than memorize specific cell configurations. Data augmentation is widely recognized as one of the most effective strategies for reducing overfitting in deep learning when working with limited medical imaging data76. The weighted voting mechanism across multiple arrays introduces an ensemble-based aggregation that reduces the impact of individual misclassifications at the subject level. While near-perfect accuracy may initially suggest overfitting, several observations support the biological validity of our results.
Only one subject (1160028) was consistently misclassified across configurations, which may reflect atypical microglial responses or a potential labeling ambiguity rather than model memorization. Furthermore, accuracy at the array level (without voting) was notably lower (90–93%), indicating that the model has not memorized training patterns but rather benefits from the aggregation of multiple, partially informative signals. This progressive improvement from microglial-level (55%) to array-level (93%) to subject-level (98%) accuracy demonstrates a structured signal amplification consistent with genuine biological signals rather than noise fitting.

The present study is based on a reduced sample size, a limitation that should be addressed in future studies, together with the impact of sex, age, medication, other demographic variables, and childhood trauma on patient stratification.

There is currently no objective, scalable, and time-efficient method that can accurately classify mood disorders using peripheral biomarkers such as those found in blood samples. This highlights the urgent need for diagnostic approaches that are both biologically grounded and practically applicable to large populations.

These findings reinforce the effectiveness of combining microglial cell-based biosensing with DL-based classification strategies, providing a robust and scalable solution for mood disorder classification. By using primary microglial cells as cellular sensors, the method detects changes in cell morphology caused by the content of plasma-derived sEVs. These alterations, imperceptible to human observers or traditional image analysis, are effectively decoded through DL techniques. The use of CNNs, particularly when applied to structured image arrays and combined with subject-level voting, significantly enhances the ability of the model to generalize across individuals and diagnostic categories. This dual-layered strategy, biological amplification of patient-specific signals through microglial response, and computational enhancement via array-based CNN analysis, represents a major advancement toward the goal of providing a non-invasive and high-throughput diagnostic tool. Furthermore, the minimal invasiveness of blood collection, and the speed of the image acquisition and analysis pipeline suggest potential for integration into clinical workflows. These elements establish a foundation for future precision psychiatry approaches, where DL can be used to complement and strengthen clinical decision-making in complex mood disorders such as BD, MDD, and potentially other mental disorders.

Regarding clinical scalability, the 24-hour sEV incubation period is comparable to the turnaround times of several routine clinical assays, including blood cultures (2–5 days), batch-run autoimmune/serology panels (1–3 days), and many molecular/genetic tests77,78,79,80. Importantly, once the CNN model is trained, the computational classification step (inference) requires only seconds per subject. From a cost standpoint, the reagents and consumables for sEV isolation, microglial culture, and immunofluorescence staining are considerably lower than those required for neuroimaging modalities such as fMRI or PET. Moreover, the growing availability of automated high-content imaging platforms, such as Opera Phenix (Revvity/PerkinElmer) and ImageXpress Micro Confocal (Molecular Devices), provides feasible routes for further standardization and high-throughput scaling. Most importantly, the 24-hour turnaround must be weighed against the current clinical reality: up to 69% of BD patients are initially misdiagnosed with MDD, and more than 30% may remain misdiagnosed for up to 10 years, with substantial downstream consequences, including inappropriate pharmacotherapy, treatment-emergent mania, and significant personal and economic burden. In this context, a next-day objective diagnostic readout, even if it requires overnight incubation, represents a substantial improvement over years of diagnostic uncertainty.

To provide insight into the features driving classification, we applied Grad-CAM++81 to the final convolutional layer of the fine-tuned DenseNet121 model. The resulting class-discriminative heatmaps reveal that the CNN does not distribute attention uniformly across all cells in the array; instead, a subset of specific cells consistently concentrates the highest activation, indicating that these cells carry greater discriminative power for distinguishing the classes BD, MDD, and CTRL. Additionally, an equivariance analysis82 confirmed that the attention patterns of the model are robust to geometric transformations: when the arrays from a test partition are geometrically transformed, the same cells remain highly activated in most cases. Figure 12 illustrates this behavior for representative subjects from each diagnostic class. Panel (A) shows a BD subject (ID 3650017) under horizontal flip, panel (B) an MDD subject (ID 1160028) under double flip, and panel (C) a CTRL subject (ID 1012) under vertical flip. In all three cases, the Grad-CAM++ heatmaps between original and transformed arrays exhibit high spatial correlation (Pearson r > 0.90), demonstrating that the CNN attends to the same individual cells regardless of spatial arrangement and confirming that it learns genuine cellular features rather than large areas of the array. Notably, this consistency holds even at lower confidence levels, as observed for the MDD subject (60–72% confidence), suggesting that the learned representations are stable across the confidence spectrum. The Grad-CAM++ analysis suggests that the model attends to biologically meaningful cells. The observation that only a fraction of cells within each array drives classification also opens avenues for future work, including pre-selection strategies that prioritize morphologically informative cells to further improve classification performance.
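The equivariance check amounts to applying the same geometric transform to the heatmap of the original array and correlating it with the heatmap obtained from the transformed array; a NumPy sketch (names are ours, and the heatmap here is a random stand-in for a Grad-CAM++ output):

```python
import numpy as np

def equivariance_r(heatmap_orig, heatmap_transformed, transform):
    """Pearson r between transform(heatmap of original) and heatmap of transformed input."""
    a = transform(heatmap_orig).ravel()
    b = heatmap_transformed.ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
h = rng.random((512, 512))  # stand-in for a Grad-CAM++ heatmap of a 5 x 5 array

# A perfectly equivariant model would produce the flipped heatmap for the
# horizontally flipped input, giving r = 1; real models yield r > 0.90 here.
r = equivariance_r(h, np.fliplr(h), np.fliplr)
```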

Fig. 12

Grad-CAM++ equivariance analysis for representative subjects from each diagnostic class. Each row shows, from left to right, the original 5 × 5 microglial cell array, the transformed version, and the corresponding Grad-CAM++ heatmaps overlaid on the original and transformed arrays. (A) BD subject 3650017 under horizontal flip. The model correctly classifies both versions as BD with 100% confidence, and the attention maps are highly consistent (Pearson r = 0.919). (B) MDD subject 1160028 under double flip. The model correctly classifies both versions, with 60% and 72% confidence for the original and transformed arrays, respectively. The Grad-CAM++ heatmaps remain consistent across transformations (Pearson r = 0.903), indicating stable feature attention even at lower confidence levels. (C) CTRL subject 1012 under vertical flip. The model correctly classifies both versions as CTRL with 100% confidence, and the attention patterns are highly consistent (Pearson r = 0.919). Across all three cases, the high spatial correlation of heatmaps between the original and transformed arrays confirms that the CNN identifies genuine cellular features rather than large areas of the array.

Conclusion

Major Depressive Disorder and Bipolar Disorder are significant public health concerns, contributing extensively to global mental disability and placing a substantial burden on healthcare systems. Traditional diagnostic methods rely on symptom-based assessments, often leading to misdiagnosis and delayed treatment; misdiagnosis rates between BD and MDD are high, exceeding 30%.

In this work, we developed a DL-based method to classify images of microglial cells stimulated with patient-derived sEVs into three classes: BD, MDD, and CTRL. In this way, microglial cells are used as cellular sensors that respond with morphological changes to the complex immune-regulatory content of sEVs. The best results were achieved by organizing images of individual microglial cells into arrays of 5 × 5 images and applying voting among the arrays created from the same subject. We also tested accuracy without voting, and using individual cells with and without voting; both settings yielded lower results than arrays with voting, and voting by subject improved results for individual cells as well as for arrays of cell images. Our method, based on arrays of cell images and CNN models, yields excellent classification results for the three classes, achieving 100% subject-level accuracy in four of the five partitions and only one error in the fifth. These results provide an alternative to the traditional symptom-based diagnostic approach, which can result in a high rate of misdiagnosis, and constitute a promising tool that merits further development and validation.