Introduction

Optical Coherence Tomography (OCT) is a non-invasive imaging technique that uses reflected light to generate high-resolution, cross-sectional images of biological tissues. OCT is widely used in many medical fields, such as cardiology, dermatology, and gastroenterology, for imaging, staging, and diagnosing multiple conditions1. Its primary use has been in ophthalmology, for imaging the thickness and volume of retinal neural layers and other structures (e.g., macula, retinal nerve fiber layer (RNFL), ganglion cells, which together with inner and outer nuclear and plexiform layers create the ganglion cell complex (GCC)2, photoreceptor layer and retinal pigment epithelium3). OCT images of the retina are considered to be as high-resolution as histological images. In standard ophthalmological evaluation, clinicians look mainly for structural abnormalities, but when OCT is used in research involving between-group comparisons, the focus is on quantitative measurement of the thickness and volume of selected retinal layers4.

One such group includes patients with severe psychiatric illnesses. Mental health disorders are a growing challenge for health care and science5. At the same time, psychiatry seems to be the last field of medicine in which diagnostic procedures are still not based on objective biological markers, but on subjective clinical assessment, mainly based on behavioral and self-report data6. Schizophrenia is a mental illness whose basic symptoms have been characterized for over 100 years, but for which no clear biomarkers have yet been developed, e.g., based on neuroimaging outcomes7. Patients with schizophrenia express many psychopathological symptoms, such as auditory and visual hallucinations, paranoid thoughts, cognitive dysfunctions, and substantial problems with social functioning. Schizophrenia is a long-lasting disease with phases of exacerbation and improvement, in which the so-called negative or deficit symptoms, such as lack of initiative, interpersonal and communicational withdrawal, and blunted affect, dominate even when pharmacological treatment is provided8. Although it is generally agreed that schizophrenia is a neurobiological disease9, its major neural mechanism is still not unequivocally known, probably because of its high level of complexity and heterogeneity.

Since the retina ontogenetically develops from the neural tube and might be treated as an observable part of the brain, it is assumed that its morphology is sensitive to pathological changes typical of neuropsychiatric disorders. Indeed, much evidence, in multiple neurodegenerative disorders10 and in psychiatry11, indicates that specific retinal variables can serve as disease-related biomarkers12. Previous research confirms the existence of thinner retinal layers in schizophrenia, especially macular thickness, and to a lesser extent retinal nerve fiber layer (RNFL), but many questions still remain about when in the course of illness the retinal changes can be observed, how or whether these progress over time, and which clinical features and outcomes they are most strongly related to13,14,15. In addition, many previous analyses did not control for the influence of confounding factors such as somatic health14 (e.g., diabetes, smoking, hypertension, other cardiovascular diseases) that can damage the retina and significantly affect the results of OCT assessment16.

The expectation of significant alterations in retinal morphology in schizophrenia has many grounds. First, retinal thinning has long been observed in neurodegenerative diseases due to atrophic changes throughout the central nervous system17. Schizophrenia was defined as a progressive disease, dementia praecox, by Kraepelin already in the early stages of the development of psychiatry18. In addition to developmental abnormalities, contemporary neuroimaging data also confirm neuroprogressive changes in the brain structure of patients with schizophrenia19. Data from cognitive studies have also shown that these patients have disturbances not only in auditory and olfactory perception but also in visual perception, and therefore the neurobiological basis of these dysfunctions has been sought20. In recent years, retinal alterations have been even more closely linked with the neuronal substrate of schizophrenia and its specific course. Sheehan et al.21, referring to earlier observations by Johnson and Cowey22, suggested that thinning of the retina’s neuronal layers in schizophrenia may result from well-established thalamic abnormalities. Thalamic dysfunctions are a key feature of the cognitive dysmetria hypothesis23 regarding the neuroaetiology of schizophrenia, which emphasizes the role of thalamo-centric brain network dysfunctions in this disorder. It has been suggested that, as a result of retrograde transsynaptic degeneration, morphological abnormalities of the thalamus lead to loss of ganglion cell axons projecting from the retina to the thalamus (e.g., the RNFL fibers that constitute the optic nerve), which eventually leads to loss of ganglion cell bodies and the other retinal neurons that project to them (e.g., bipolar cells).
It should also be pointed out that individuals with schizophrenia exhibit altered aging trajectories regarding the central nervous system leading to accelerated aging of the neuronal structures, which may also involve the retina, especially considering its high metabolic demands24,25.

So far, retinal measures have been used only a few times to verify computational methods for the automatic differentiation of schizophrenia patients from healthy controls. Although some of the results report progress in this matter26, sometimes it was not possible to achieve discrimination accuracy better than 70\(\%\). Binary classification of schizophrenia spectrum disorder (SSD) patients and healthy controls was also presented in27, where the authors used classical classification models, especially logistic regression. The latest Indian research28, which included probably the largest group of patients so far, seems promising given the results of analyses using a trained convolutional neural network (CNN) deep learning algorithm, but this study evaluated retinal vascular abnormalities based on fundus camera images, not OCT data.

Currently, deep neural networks are considered the best classifiers. Therefore, in this paper, we develop this approach to classifying schizophrenia patients and controls based on OCT data. Several 1D CNN-based models were used in the first stage of the calculations. If the quality of the classification (accuracy of the method) is not satisfactory, it is advisable to aggregate classification results. A certain, although minor, disadvantage is the need to use at least two classifiers. On the other hand, this approach usually results in an increase in classification accuracy.

Aggregation techniques have been successfully developed by researchers recently, in particular in the field of fuzzy logic and fuzzy sets. Aggregation methods include various types of averages, t-norms or s-norms29,30,31,32,33, but also more advanced operators such as the Choquet integral34,35, Ordered Weighted Averaging Operators36,37, pre-aggregation functions38,39,40,41,42, generalizations and expansions of the Choquet integral43, modifications of classical operators such as Bonferroni mean44, intuitionistic operators, e.g.,45,46, conditional aggregation operators47, granular models48, order-2 fuzzy sets49. Also, current approaches to aggregating classification results from different methods include procedures to maximize aggregation security (see50), and incorporate fuzzy inference systems51 and geometrical approaches52. Therefore, we include these methods in the current study.

Particularly in medical applications such as disease detection, the ultimate effectiveness of the classification process is more important than the effectiveness of any single method. Because of this, combining deep neural networks and the results they generate using effective aggregation operators has the potential to be an effective solution. The aim of this paper was to determine whether it is possible to differentiate SSD patients from healthy controls using deep learning models based on OCT data only. One of the main goals of the article is to check how methods based on the aggregation of classification results work in classifying people as having schizophrenia vs. healthy controls. Over 300,000 different variants of aggregation operators were tested. Only a few versions provided results that can be confidently said to improve classification significantly. Nevertheless, the results clearly show that new generalizations and extensions of the Choquet integral and its earlier extensions (so-called pre-aggregation functions) can successfully aid in disease classification. A novel aspect of this study is that classical and deep learning classifiers, together with different variants of aggregation operators used to improve the classification results, are applied to OCT data in schizophrenia for the first time.

An outline of the remainder of the paper is as follows. “Methods” describes the research methods, including subject ascertainment and methods for obtaining OCT data. In “Classification on a basis of deep neural networks”, individual classifiers (deep neural networks) are discussed. The general aggregation scheme is presented in “Classification based on an aggregation procedure”. “Results” covers numerical experiments and a description of the obtained results. In “Discussion”, we present conclusions and potential directions for future research work.

Methods

Participants

Patients diagnosed with schizophrenia according to the ICD-10 classification (code F20.x) and hospitalized in the Medical University Psychiatric Clinic in Poland were enrolled in the clinical group. 17.0% of patients were hospitalized due to a first psychotic episode. The chlorpromazine equivalent was 452.22 mg/day (SD = 128.2), 57% of patients received atypical antipsychotic drugs only, and the remaining patients received a combination of classic and atypical medications. The mean age of patients was 39.52 (SD = 15.38), 55% of patients were male, and the mean duration of illness was 16.83 (SD = 13.72) years. Mean years of education among patients was 13.05 (SD = 2.37). Patients did not differ significantly from controls on age, sex, years of education, or BMI (all p > 0.05). However, a significantly higher percentage of patients were unemployed and unmarried (both p < 0.001), and 43.14% of patients consumed various forms of tobacco products compared to 15.97% of the control group (p = 0.001). Psychiatric diagnosis was established by a certified specialist in the field of psychiatry, typically the attending psychiatrist. Psychopathology was assessed using the original three-factor scoring solution for the Positive and Negative Syndrome Scale (PANSS)53. Mean syndrome scores were: positive symptoms = 18.94 (SD = 6.41), negative symptoms = 24.11 (SD = 7.25), and general symptoms = 46.58 (SD = 9.17). The total PANSS score was 89.63 (SD = 19.45).

Considering the necessity to maintain a high level of OCT measurement validity, we used the following exclusion criteria for patients and controls: previously diagnosed ophthalmological disease (e.g., glaucoma, macular degeneration, diabetic retinopathy), diabetes, untreated arterial hypertension (or, if treated, blood pressure above 140/90 mmHg during the examination), history of trauma to the eye area, ophthalmologic surgery, head injuries with neurological consequences, eye refraction over \(\pm 5\) diopters, glaucoma risk (DDLS scale \(\ge 6\)), history of psychoactive substance addiction (except nicotine), inherited intellectual disability, and dementia. The presence of a relevant comorbid psychiatric disorder was also an exclusion criterion for patients, and the presence of any psychiatric disorder was an exclusion criterion for controls. OCT evaluation was conducted during the phase of clinical improvement to maximize patient cooperation. After completing the clinical group, a healthy control sample was collected using the pairwise sampling method, taking into account basic demographic characteristics. This study was approved by the Bioethics Committee at the Medical University (consent nr KE-0253/248/2020 issued on 26 November 2020), and all participants provided informed consent according to the Declaration of Helsinki. Initial recruitment of study participants began at the beginning of 2021, and the study lasted until the end of 2022.

OCT assessment

OCT data were acquired using an OPTOPOL COPERNICUS REVO® SD-OCT device54. This has a scanning speed of up to 80,000 scans/s, an axial resolution of 2.6 \(\mu\)m, a lateral resolution of 12 \(\mu\)m, and a scanning depth of 2.4 mm. The device uses a superluminescent diode (SLED) with a light wavelength of 830 nm as a signal source. The measurements were generated using OPTOPOL SOCT version 11.0.7. The software enables automatic segmentation of retinal layers and includes adjustments for potential artifacts. In addition to automatic adjustments that are made during imaging, the quality of each image is automatically rated on a scale of 0-10. Only high-quality images were included in the study (score \(\ge\) 7). Each patient had 4 images taken (2 for each eye). The macular image was taken using a protocol of 640 A-scans and 85 B-scans on a 7 x 7 mm square area centered on the macula. The image of the optic disc was taken using a protocol of 512 A-scans and 112 B-scans on a 6 x 6 mm area centered on the optic disc. Spot measurements were divided into quadrants according to the ETDRS grid55.

Data processing

The processing of the dataset was divided into several steps. All steps were performed using custom programs created in Python using Anaconda distribution and Anaconda Navigator graphical interface. Keras, Scikit-Learn and TensorFlow libraries were the most important libraries applied for this study.

Data processing steps included preprocessing steps such as feature extraction, feature normalization and feature selection. The main processing was related to training and testing classification models as well as applying aggregation of classifiers.

The entire original dataset covered 122 observations, of which 2 were discarded due to substantial missing data. Among the remaining 120 observations, 61 belonged to the healthy control group and 59 to the patient group.

Observations were defined in terms of a class label and a set of features. The class label defined the observations as belonging to one of the two groups: patients or healthy control. The original set of features consisted of more than 140 features. They were divided into several categories such as demographic and education category, neurological evaluation and symptoms category, hospital treatment, and psychotic symptoms category defined for patient group only. The most important category applied in this research, however, is OCT metrics-related category covering 80 features measured for all participants. The OCT features included macular thickness (MT), retinal macular volume (MV), combined outer nuclear layer and outer plexiform layer (ONL-OPL), macular and peripapillary RNFL thickness and GCC. Measurements for each eye were treated as separate features.

The choice of these OCT measures, subsequently used as features included in the computational experiments, was based on conclusions from previous meta-analyses and reviews of studies on retinal morphology in schizophrenia. For example, according to Prasannakumar et al.56, RNFL structure was the OCT variable that most strongly differentiated psychotic disorder from healthy control status. Shew et al.’s review57 concluded that patients with schizophrenia are characterized by a significantly thinner global peripapillary RNFL, a thinner average macular layer, and a thinner macular ganglion cell-inner plexiform sublayer. Analogous findings regarding MT, pRNFL, and GCL were documented earlier by Komatsu et al.15,58 and Gonzalez-Diaz and co-workers59. Additionally, the meta-analyses of Lizano et al.60 and Asanad et al.27 reached the same conclusions: macular thickness and macular and peripapillary RNFL, followed by the ganglion cell complex, are most thinned in schizophrenia patients.

Before the classification, the standard preprocessing procedures of outlier detection and data normalization were performed. The next step in the preprocessing pipeline was feature selection. This procedure was performed using Logistic Regression, a classical classification model that gives easily interpretable results.

Feature importances in the form of weight vectors were derived, and the features with the lowest weights (i.e., 0.01) were removed from the dataset. The feature selection procedure was performed using 5-fold cross-validation on a training dataset and tested on a testing dataset; the whole procedure, including the train-test split, was repeated 1000 times and averaged to achieve stable results. This step reduced the dimensionality of the dataset to 20 features. This feature selection procedure balances the number of features against the number of observations in order to avoid the curse of dimensionality. Moreover, the smaller dimensionality of the dataset helps to avoid overfitting and enables more efficient computations. The number of remaining features (20) represented an optimal tradeoff between the dimensionality of the dataset and the flexibility of the model. To verify this, tests using different numbers of features were also performed. The results showed that classification performed with more than 20 features did not improve classification results.

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Classification on a basis of deep neural networks

The classification task aimed to differentiate between two classes: Schizophrenia patients and healthy control. Classification was performed on a dataset containing 120 observations and 20 OCT-related, numerical features. The dataset was shuffled in random order and split into training and testing sets in the proportion of 80:20. Data from a single participant were used only one time, in the train or test dataset, in order to provide compliance with a subject-independent approach. This approach ensures that the data of participants from the test dataset are unknown for the model. 20% of the input dataset used for testing corresponds to 24 participants. The procedure of shuffling, splitting, and classifying was repeated 1000 times to obtain stable, non-random results.
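The repeated 80:20 protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `repeated_holdout_splits`, the integer participant identifiers, and the fixed seed are hypothetical. Because each participant contributes one observation, shuffling the identifiers and cutting the list once keeps the training and testing sets disjoint, which is the subject-independent property the text requires.

```python
import random


def repeated_holdout_splits(subject_ids, test_fraction=0.2, repeats=1000, seed=0):
    """Yield (train_ids, test_ids) pairs for a repeated shuffled hold-out.

    Hypothetical helper: every participant appears in exactly one of the
    two lists within a repeat, so no test participant is seen in training.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n_test = round(len(subject_ids) * test_fraction)
    for _ in range(repeats):
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# 120 participants, as in the dataset described above.
splits = list(repeated_holdout_splits(range(120), repeats=5))
train_ids, test_ids = splits[0]
print(len(train_ids), len(test_ids))
```

In a full pipeline, a classifier would be trained and scored inside the loop and the per-repeat accuracies averaged; that part is omitted here because it depends on the specific model.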

Before applying CNN models, classical supervised machine learning classification models were examined in order to obtain reference results. Several models were examined, including Support Vector Machine (SVM) with radial, linear, and polynomial kernels, Multi-layer Perceptron (MLP), K-Nearest Neighbor (KNN), and Logistic Regression. Before the classification, a hyperparameter grid search procedure was applied to adjust the hyperparameters of the classifiers. As in feature selection, the hyperparameter search was done using 5-fold cross-validation performed on a training dataset and tested on a testing dataset, and the whole procedure, including the train-test split, was repeated 1000 times and averaged. The hyperparameter values used are presented in Table 1, and the classification accuracies obtained with these classifiers are presented in Table 2.

Table 1 Hyperparameters of classical classifiers.
Table 2 Results of classical classifiers.

Several classification models based on 1D CNN were applied. These types of models were chosen due to the numerical nature of the dataset. Among the analysed models were models well known in the literature, such as different versions of ResNet, SeResNet, Net1D, ACNN, CRNN, DenseNet, VGG and EfficientNet. Calculations were performed using 1D variants of 1D Zoo with the Keras and TF.Keras libraries61. Weights were obtained by converting ImageNet weights from the corresponding 2D models. The default pooling/stride size for 1D models was set to 4 to match (2, 2) pooling for 2D nets. Parameter settings and weights were adjusted in the process of model tuning. The models chosen have a relatively small number of layers. These uncomplicated models were chosen due to the relatively small sample size available for this deep learning modeling. This choice was also confirmed by preliminary tests. Beyond the mentioned architectures, two simple CNNs, marked as Model no. 1 and Model no. 2, were also applied. The characteristics of all applied models are presented in Table 3. Details of Model no. 1 and Model no. 2 are presented in Table 4 and Table 5. An Adam optimizer was applied, accuracy was used as the metric, and categorical crossentropy was used as the loss function.

Table 3 Results of classical classifiers.
Table 4 Summary of 1D CNN model no. 1.
Table 5 Summary of 1D CNN model no. 2.

The mentioned CNN-based models were tested in the first part of the classification tests. The test sample was balanced, so accuracy was chosen as the classification quality metric. Table 6 shows the mean values of accuracy obtained for the most effective classifiers achieved for the dataset.

Table 6 Accuracy obtained with individual classifiers.

Classification based on an aggregation procedure

This discussion of aggregation techniques will begin with a focus on the Choquet integral approach. Imagine we have several classification algorithms trying to predict an outcome, such as determining whether a person is sick or healthy. Each of these algorithms, known as classifiers, provides its “opinion” in the form of probabilities, but not all opinions are equally important. This is where use of the Choquet integral becomes beneficial: it allows one to intelligently combine these opinions, taking into account their individual importance and how well they work together.

As an example, take the case of having three classifiers: the first analyzes lab results, the second focuses on demographic data, and the third looks at symptoms described by the patient. Each gives its probability estimate, for example: \(80\%\) (sick), \(60\%\) (sick), and \(50\%\) (healthy). A traditional method might simply average these values, yielding a result of \(63.3\%\). However, such an average ignores the fact that the classifiers differ in their precision and reliability, and it also disregards the pairwise agreement between the three of them.

The Choquet integral works differently because it considers not only the individual weight of each classifier but also how well their results align. For instance, if the two classifiers analyzing lab and demographic data indicate a similar outcome, their agreement will be “amplified,” increasing their influence on the final result. Meanwhile, the third classifier, which gives a different result, will have less impact, especially if its predictions are less reliable. As a result, instead of a simple average of \(63.3\%\), the Choquet integral might yield a result of \(75\%\), which better reflects the significance and agreement of the individual classifiers.

In this manner, the Choquet integral is more than just a simple averaging method - it reflects an intelligent system that better utilizes the “votes” of individual classifiers by considering both their individual importance and their collaboration. It is like a team of experts working together, each specializing in a different area, where the final decision is more accurate because it considers not only individual opinions but also how well the experts agree. This ensures that the resulting outcome is more precise and trustworthy. Moreover, the interpretability of the Choquet integral has been treated in depth in the literature, see, for instance62,63,64. Generalizations or extensions of Choquet integral have also been described38,39,40,41,42,43.
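The three-classifier illustration above can be made concrete with a small worked sketch. Everything numeric here beyond the three scores is an assumption introduced for the example: the fuzzy measure values in `EXAMPLE_MEASURE` are hypothetical, all three scores are treated as probabilities of the same class, and the scores are ranked in descending order so that each difference is non-negative. The sketch is not the operator used in the study; it only shows how a discrete Choquet integral differs from a plain average.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of classifier scores.

    scores:  dict mapping classifier name -> probability in [0, 1]
    measure: callable mapping a frozenset of names -> weight in [0, 1]
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    total = 0.0
    coalition = set()
    for i, (name, value) in enumerate(ranked):
        coalition.add(name)  # the i highest-scoring classifiers so far
        next_value = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        total += (value - next_value) * measure(frozenset(coalition))
    return total


# Hypothetical monotone fuzzy measure: lab and demographic data together
# are weighted more than the sum of what each would get under equal weights.
EXAMPLE_MEASURE = {
    frozenset({"lab"}): 0.5,
    frozenset({"demographic"}): 0.4,
    frozenset({"symptoms"}): 0.2,
    frozenset({"lab", "demographic"}): 0.9,
    frozenset({"lab", "symptoms"}): 0.6,
    frozenset({"demographic", "symptoms"}): 0.5,
    frozenset({"lab", "demographic", "symptoms"}): 1.0,
}


def example_measure(coalition):
    return EXAMPLE_MEASURE[coalition]


scores = {"lab": 0.8, "demographic": 0.6, "symptoms": 0.5}
mean_score = sum(scores.values()) / len(scores)
print(round(mean_score, 3), round(choquet_integral(scores, example_measure), 3))
```

With these assumed weights the aggregate is (0.8 - 0.6)(0.5) + (0.6 - 0.5)(0.9) + (0.5)(1.0), which is approximately 0.69, compared with an arithmetic mean of about 0.633. When the measure is additive with equal weights, the same function reduces to the arithmetic mean.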

Our approach involved the following sequence. First, a data record describing the patient is used as an input to several different deep neural networks that return the probability values of the selected record belonging to a specific class. These probabilities are aggregated into a single result. To obtain it, the importance (weights) of individual classifiers are also taken into account. These may include, among others, accuracy values obtained in a series of pretests. The final value is yielded with the help of an aggregation operator such as Choquet integral, OWA operator and many others. The general scheme of the procedure is presented in Fig. 1.

Figure 1

General overall scheme of the aggregation procedure.

Figure 2

Accuracy of the Quadrature-Inspired Smooth Generalized Choquet Integral in dependence on the \(\alpha\) parameter value.

Figure 3

Accuracy of the classical pre-aggregation function in dependence on the \(\alpha\) parameter value.

Figure 4

Standard deviations in dependence on the \(\alpha\) parameter values.

As an example, one of the operators65 is a generalized aggregation function of the form

$$\begin{aligned} Ch_{Smooth}= \sum \limits _{i=1}^n\left( M\left( C\left( h_i\right) ,g\left( A_i\right) \right) -M\left( C\left( h_{i+1}\right) ,g\left( A_i\right) \right) +M \left( C\left( h_i\right) -C\left( h_{i+1}\right) ,g\left( A_i\right) \right) \right) \end{aligned}$$
(1)

where n is the number of classifiers, M is any t-norm, and C is a quadrature-inspired function, for instance, of a Weddle form

$$\begin{aligned} C_{W}\left( h_j\right) = \frac{1}{20}\left( h_{j-3} + 5h_{j-2} + h_{j-1} + 6h_{j} + h_{j+1} + 5h_{j+2} + h_{j+3}\right) \end{aligned}$$
(2)

\(h_j, j=1,\ldots ,n\) are the values of belongingness to a specific class obtained with the n classifiers sorted in a non-decreasing manner (with an assumption that \(h_{0}=h_{-1}=\ldots = h_1\) and \(h_{n+1}=h_{n+2}=\ldots =0\)), and \(g\left( A_i\right)\) are some \(\lambda\)-fuzzy measure values, built recursively on a basis of the importance of individual classifiers.
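Two ingredients of formula (1) can be sketched in isolation: the Weddle-type function of Eq. (2) with the boundary convention stated above, and a recursively built \(\lambda\)-fuzzy measure. The sketch below is an illustration under assumptions, not the authors' implementation. The function names are hypothetical, the importance values passed as `densities` are invented, and the \(\lambda\) parameter is assumed to follow the standard Sugeno normalization \(\prod_i (1+\lambda g_i) = 1+\lambda\), solved here by bisection (the source does not specify how \(\lambda\) is obtained).

```python
def weddle_smooth(h, j):
    """C_W(h_j) from Eq. (2); j is 1-based and h holds h_1..h_n.

    Boundary convention from the text: indices below 1 reuse h_1,
    indices above n are treated as 0.
    """
    n = len(h)

    def at(k):
        if k < 1:
            return h[0]
        if k > n:
            return 0.0
        return h[k - 1]

    coefficients = (1, 5, 1, 6, 1, 5, 1)  # Weddle weights, summing to 20
    return sum(c * at(j + offset) for c, offset in zip(coefficients, range(-3, 4))) / 20.0


def sugeno_lambda(densities, iterations=200):
    """Solve prod(1 + lam * g_i) = 1 + lam for lam > -1, lam != 0 (assumed rule).

    Assumes at least two positive densities, each below 1.
    """
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities are already additive

    def f(lam):
        product = 1.0
        for g in densities:
            product *= 1.0 + lam * g
        return product - (1.0 + lam)

    if total > 1.0:
        low, high = -1.0 + 1e-9, -1e-9  # root lies in (-1, 0)
    else:
        low, high = 1e-9, 1.0           # root lies in (0, inf)
        while f(high) < 0.0:
            high *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if (f(mid) > 0.0) == (f(low) > 0.0):
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def chain_measures(densities, lam):
    """g(A_1), ..., g(A_n) built recursively, where A_i adds classifier i to A_{i-1}."""
    measures, current = [], 0.0
    for g in densities:
        current = current + g + lam * current * g
        measures.append(current)
    return measures


densities = [0.5, 0.4, 0.2]  # hypothetical classifier importances
lam = sugeno_lambda(densities)
print(chain_measures(densities, lam))
```

Under the assumed normalization, the last value of the chain equals 1 up to numerical error, so the full set of classifiers always receives the maximal weight.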

Results

Below, we describe the results obtained by the proposed method and compare them with various results obtained by individual classifiers. They are presented as averages obtained from one hundred iterations of the experiment for various input data. We have tested over 300,000 versions of classifiers. They are, among others, OWA, Choquet integral, generalized Choquet integrals (so-called pre-aggregation functions), and classical aggregation functions such as averages, weighted averages, or t-norms. It is worth emphasizing that we have implemented a substantial number of aggregation functions and we do not discuss the findings for all of them. Rather, we emphasize those functions and parameters that led to the highest levels of classification accuracy. Finally, it is worth stressing that more detailed results can be made available to interested readers upon request.

The results presented below were obtained using the same 80:20 data partition (training:testing) method as used during the testing of individual deep learning-based classifiers. It is worth noting that aggregation in this study is a process applied to the classification results obtained by individual classifiers, but these classifiers are not trained collectively for the purpose of aggregation; instead, they are trained separately to achieve the best possible results individually. For instance, the authors of a work on pre-aggregation functions66 do not mention a separate validation set. All validation and testing are conducted within the framework of 10-fold cross-validation, which combines the evaluation of the model on training and testing sets. Therefore, it can be stated with certainty that an aggregation operator is good when its application results in a classification accuracy measure that exceeds the accuracy of the best individual classifier. We refer the reader to the classic work explaining the application of Choquet-type operators in aggregation and classification35. Finally, it is worth noting that there are many aggregation or aggregation-like operators used in practice. They can be, for instance, typical average functions such as arithmetic mean. These functions are not trained. Typically, they are just used to aggregate the results not as a part of the trained model. However, they can be.

The best accuracy was obtained by the Quadrature-Inspired Smooth Choquet Integral given by formula (1). In this case, however, the quadrature-inspired function C is of the Newton-Cotes 9-point quadrature form

$$\begin{aligned} Q_{9}\left( h_j\right) = \frac{1}{28350}\left( 989h_{j-4} + 5888h_{j-3} - 928h_{j-2} + 10496h_{j-1} - 4540h_{j} + 10496h_{j+1} - 928h_{j+2} + 5888h_{j+3} + 989h_{j+4}\right) \end{aligned}$$
(3)

where \(h_j\) are defined in the previous section. Moreover, the t-norm used in the formula is

$$\begin{aligned} T\left( a,b\right) = \left\{ \begin{array}{l}\left( 1-\left( 1-a^\alpha \right) \sqrt{1-\left( 1-b^\alpha \right) ^2}\right. \\ - \left. \left( 1-b^\alpha \right) \sqrt{1-\left( 1-a^\alpha \right) ^2}\right) ^\frac{1}{\alpha } \\ \text {if} \quad \left( 1-a^\alpha \right) ^1+\left( 1-b^\alpha \right) ^1\le 1 \\ 0 \quad \text {otherwise}\end{array}\right. , \alpha >0 \end{aligned}$$
(4)

and \(\alpha =5.2\).
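The t-norm in formula (4) can be transcribed directly into code. The sketch below follows the formula as printed, including the condition \(\left( 1-a^\alpha \right) +\left( 1-b^\alpha \right) \le 1\); the function name `t_norm` and the clamping of tiny negative rounding errors to zero are assumptions added for the example.

```python
import math


def t_norm(a, b, alpha=5.2):
    """Parametric t-norm from formula (4), for a, b in [0, 1] and alpha > 0."""
    x = 1.0 - a ** alpha
    y = 1.0 - b ** alpha
    if x + y > 1.0:  # outside the region where the first branch applies
        return 0.0
    inner = 1.0 - x * math.sqrt(1.0 - y * y) - y * math.sqrt(1.0 - x * x)
    # Clamp guards against a slightly negative value caused by rounding.
    return max(inner, 0.0) ** (1.0 / alpha)


print(t_norm(0.9, 0.95), t_norm(0.7, 1.0), t_norm(0.2, 0.2))
```

Two quick sanity checks follow from the formula itself: with b = 1 the second argument contributes nothing and the result equals a, and for small inputs the condition fails and the result is 0.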

An interesting finding is that classical pre-aggregation functions of the general form

$$\begin{aligned} Ch_{M}= \sum \limits _{i=1}^n M\left( h\left( x_i\right) -h\left( x_{i+1} \right) ,g\left( A_i\right) \right) \end{aligned}$$
(5)

where M is a t-norm or overlap function, returned an accuracy that was over 2 percentage points lower than that of the Quadrature-Inspired Smooth Choquet Integral (QISCI). The t-norm for which the result was found at this level is (4) for \(\alpha =1.6\). Also, the overlap function given by

$$\begin{aligned} O_v\left( a,b\right) = \left\{ \begin{array}{l} \frac{ab}{\left( p + \left( 1 - p\right) \left( a + b - ab\right) \right) }\\ \quad \quad \text {for } \frac{ab}{\left( p + \left( 1 - p\right) \left( a + b - ab\right) \right) }<\alpha (*)\\ \alpha \\ \quad \quad \text {if not (*) and} \frac{ab}{\left( p + \left( 1 - p\right) \left( a + b - ab\right) \right) }<\beta \\ \alpha + \frac{ab}{\left( p + \left( 1 - p\right) \left( a + b - ab\right) \right) \left( 1-\beta \right) } \\ \quad \quad \text {otherwise} \end{array}\right. \end{aligned}$$
(6)

produces good results in combination with the \(Ch_M\) pre-aggregation operator.

The next best result was obtained by the classical Choquet integral

$$\begin{aligned} Ch = \sum \limits _{i=1}^n\left( \left( h\left( x_i\right) -h\left( x_{i+1}\right) \right) g\left( A_i\right) \right) \end{aligned}$$
(7)

Finally, the other non-Choquet-like functions giving acceptable results (see Table 7) were

$$\begin{aligned} B\left( a_1,\ldots ,a_n\right) =\left( \frac{1}{n\left( n-1\right) }\sum \limits _{i,j=1,i\ne j}^{n}a_i^pa_j^q\right) ^{1/\left( p+q\right) } \end{aligned}$$
(8)

i.e., Bonferroni mean67 for \(p=0.01\) and \(q=0.01\), and Ordered Weighted Averaging Tangent

$$\begin{aligned} OWAT = \frac{2}{\pi }\arctan \sum \limits _i^n\tan \left( \frac{\pi }{2}a_i\right) \end{aligned}$$
(9)

where

$$\begin{aligned} w_i = \frac{2\left( i+1\right) }{n\left( 1\right) } \end{aligned}$$
(10)
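The Bonferroni mean of formula (8) is straightforward to evaluate directly. The following sketch is illustrative only: the function name `bonferroni_mean` and the sample scores are hypothetical, while the default values of p and q are the ones reported above.

```python
def bonferroni_mean(values, p=0.01, q=0.01):
    """Bonferroni mean from formula (8) for n >= 2 values in [0, 1]."""
    n = len(values)
    pair_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # ordered pairs of distinct classifiers
                pair_sum += (values[i] ** p) * (values[j] ** q)
    return (pair_sum / (n * (n - 1))) ** (1.0 / (p + q))


scores = [0.8, 0.6, 0.5]  # hypothetical classifier outputs
print(bonferroni_mean(scores))
```

Like other means, the operator returns the common value when all inputs are equal and stays between the smallest and largest input otherwise.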

It is worth noting that the results obtained using the functions described above significantly outperformed the individual classifiers listed in Table 6. All the results obtained on the basis of aggregation operators are gathered in Table 7.

Table 7 Average percentage accuracies obtained with the best operators within particular classes.

Figure 2 shows how the accuracy of function (1) depends on the value of the parameter \(\alpha\) appearing in (4). The best choice of \(\alpha\) is near 5.2; however, smaller values still lead to accurate results. Figure 3 demonstrates a similar dependency for formula (5) with the same t-norm. Here, satisfactory results are obtained only for values close to the maximal \(\alpha\).

Figure 4 depicts the standard deviation values as a function of the value of the \(\alpha\) parameter. It is interesting that in the neighborhood of the winning \(\alpha\) values for the two methods, the standard deviations are relatively small.

Figure 5 depicts AUC values, which show that the Quadrature-Inspired Generalized Choquet Integral (QIGCI) is the best aggregation operator for detecting patients with symptoms of schizophrenia. This largely corresponds to the results presented in Table 7. Similarly, when analyzing fragments of the ROC curve (i.e., the pROC plot), it can be seen that the values closest to the key point in the upper left corner of the graph (i.e., low FP rate and high TP rate) are the QIGCI values.

Figure 5 ROC curves with the AUC values for the aggregation operators denoted by the formulae numbers.
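For readers less familiar with the AUC, the following minimal sketch shows one way to compute it from fused classifier scores. It uses the pairwise-ranking interpretation of the AUC (the fraction of positive and negative pairs that are ranked correctly, with ties counted as one half). The labels and scores are hypothetical and are not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = patient, 0 = control) and aggregated scores.
labels = [1, 1, 1, 0, 0, 0]
fused_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, fused_scores)
```

In this toy example, eight of the nine patient and control pairs are ordered correctly. The quadratic loop is adequate for small samples; for larger datasets a rank-based formulation or an established library routine would be preferable.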

Discussion

In this paper, we presented a protocol for detecting schizophrenia based on retinal imaging (OCT) data that combined two types of classification methods: deep neural networks and operators that aggregate the classification results output by these networks. Individual deep neural networks proved effective in classifying individuals with schizophrenia based on retinal layer thickness measures. We found that higher levels of classification accuracy could be obtained by combining the two methods. The combination of deep neural network outputs and the Quadrature-Inspired Smooth Generalized Choquet Integral was particularly effective at performing accurate classification. We compared over 300,000 aggregation methods. Overall, combining the results of individual classifiers (deep neural networks) with aggregation methods based on fuzzy techniques gives very good results.

Using the latest, but also classic, aggregation operators, one can easily combine the results of individual classifiers and obtain better accuracy. The method described by formula (1) was associated with almost \(88\%\) accuracy, beating individual classifiers by over 10 percentage points. This shows that with a relatively small amount of data, high levels of classification accuracy can be obtained by combining classifiers and building solutions that allow for the fusion of information (at the data level or at the results level).

It is worth noting that each of the aggregation functions has slightly different characteristics, so the final results can vary slightly depending on the application area, understood as the parameters arising from the experimental settings. Therefore, one challenge for future research is to streamline the process of identifying the aggregation operators that are most likely to achieve optimal classification.

Measures of macular thickness (MT) were the features with the highest weights in the classification models. This outcome is consistent with the results of recent meta-analyses and reviews of OCT data in schizophrenia58. According to Sheehan et al.21, for example, macular volume and thickness values differentiate schizophrenia patients from other groups better than peripapillary (e.g., RNFL) measures, because macular features are more susceptible to neurodegenerative processes68. OCT scans covering the macular volume include a greater proportion of ganglion cell bodies, axons, and dendrites than scans of the region adjacent to the optic disc. The higher feature importance of macular measures in our groups is also consistent with data showing accelerated aging effects in OCT findings in schizophrenia24,25. This suggests that computational models using retinal imaging data could generate indices of the retinal age gap (i.e., the difference between chronological age and predicted retinal age relative to a normative sample), similar to recent brain aging modeling in schizophrenia69.

Some limitations of the study should be addressed. The experimental group was relatively small given the sample sizes desirable for analyses using deep neural network methods. A larger sample would allow for increased reliability in feature extraction and hyperparameter values, and for a separate validation dataset.

Additionally, the clinical group comprised only inpatients diagnosed with schizophrenia, which could introduce confounding factors. This would occur to the extent that the observed classification results were due to variables related to acute exacerbation of symptoms (e.g., increases in anxiety, changes in medication dose, suicidality), as opposed to a diagnosis of schizophrenia per se70. However, according to the recent meta-analysis by Shew et al.57, there is no evidence that OCT outcomes differ significantly between acute and more stable phases of schizophrenia. Another potentially confounding factor was the significantly higher proportion of smokers in the schizophrenia group. It is well documented that smoking worsens retinal morphology71,72; however, some meta-analyses do not corroborate such an influence73. Nevertheless, future studies focused on automatic classification of schizophrenia using CNS biomarkers should at the very least balance the percentage of smokers across groups.

The key difference between schizophrenia patients and the control group in our study is the use of psychotropic drugs, primarily antipsychotics, by the former. Previous findings regarding the potential impact of antipsychotics on OCT imaging data are equivocal, but tend to indicate neurodegenerative effects14,74,75. These data are consistent with those indicating that long-term neuroleptic use is related to a slight but noticeable reduction in brain tissue volume76,77. Despite these suggestions, the debate about whether antipsychotics have a predominantly atrophic or neuroprotective effect is still ongoing78,79. Antipsychotics are primarily dopaminergic antagonists, i.e. they have a blocking effect on dopaminergic receptors in the brain80. Studies confirm the important role of dopaminergic neurons in the functioning of the retina, which additionally suggests that the previously unspecified range of retinal changes typical of schizophrenia may be related to the effects of dopaminergic antagonists81. However, it should also be pointed out that a correlation between antipsychotic dosage and OCT results in schizophrenia does not by itself prove that these drugs affect retinal morphology, because antipsychotic dosage also reflects the severity of psychopathological symptoms82. The potential influence of antipsychotic agents on the accuracy of schizophrenia classification based on OCT data therefore requires further research and verification. A potential solution to this problem may be to include a third group in the study (e.g. schizophrenia, bipolar disorder, and control). A second clinical group, such as patients with bipolar disorder with a history of psychotic features, would allow for matching patient groups on variables such as current and lifetime antipsychotic medication use.
Such a design is needed because, as recent meta-analyses show, studies of automatic classification of schizophrenia based on various types of neuroimaging and neurophysiological data typically include only a group of schizophrenia patients and healthy controls83,84. Our study also addresses schizophrenia classification only by distinguishing SSZ patients from controls. Therefore, future studies using OCT data are necessary to test whether retinal measures are sufficient to automatically differentiate schizophrenia from, e.g., affective disorders.

As we have already noted, there is a fast-growing body of studies using classical statistical methods demonstrating structural and functional retinal abnormalities in schizophrenia58,59,60. However, so far there have been only a few published ML-based analyses testing whether retinal measures can be efficiently used in computational models to classify schizophrenia, and some of these attempts were not based on OCT results27,28. Our goal was therefore to test whether the use of deep networks and aggregation functions enables such classification. The results indicate that this is indeed possible, but it should be noted that the outcomes were derived from complex, multi-stage computational analyses, which may hinder direct translational use. Still, such difficulties in translating ML-dependent outcomes into clinical practice in psychiatry are common regardless of input data type85. According to Ferrara et al.’s86 review of the current ML-based biomarker candidates, individual studies of ML or artificial intelligence methods for diagnostic classification show promise, but applications to real-world practice are still in their infancy. Therefore, if further studies confirm the value of OCT data for the detection of schizophrenia, it will be necessary to increase the user-friendliness of ML applications, including simplifying the interpretation of results, in order to translate the obtained data into clinical practice.