Introduction

Psychiatric disorders commonly stem from underlying brain dysfunction, which often results in cognitive deficits across various neuropsychological domains1. The Rey‒Osterrieth Complex Figure Test (RCFT) has been used as a neuropsychological measure in clinical and research settings to evaluate visuospatial memory and executive function, such as organizational strategy and planning2. Notably, impairment of these functions is commonly reported both in patients with psychotic disorders and in those with obsessive‒compulsive disorder (OCD)3,4,5,6,7,8. Both patient groups often exhibit significantly lower RCFT immediate recall and organization scores than healthy controls (HCs), stemming from executive function deficits during visuospatial memory encoding3,5,7, which suggests that these deficits could serve as transdiagnostic markers across these disorders. Similar cognitive deficits are also often observed in patients with other psychiatric and neurological disorders involving brain dysfunction, such as bipolar disorder, Alzheimer’s disease, Parkinson’s disease, closed head injury, autism, epilepsy, encephalitis, multiple sclerosis, and ischemic stroke9,10,11,12,13,14,15,16.

Although the well-established conventional RCFT is a valuable tool for assessing cognitive deficits across various psychiatric and neurological disorders, it has several limitations. The RCFT indirectly assesses executive function through a drawing task, which makes it challenging to link observable behavioral-level phenotypes, such as poor drawing, to underlying brain dysfunction. The RCFT can also be affected by a patient’s visuomotor function17; in addition, the administration and scoring systems of the RCFT are time-consuming, labor-intensive, complex, and subject to scoring variability among clinicians owing to human biases18,19.

Considerable efforts have been made to overcome the limitations of the RCFT scoring system, including the development of an automated scoring system that applies a deep learning algorithm to photos of RCFT drawings20,21,22,23, the implementation of a tablet-based digital drawing assessment24, and the adoption of a simpler method for scoring organizational strategies (0 or 1 points)25. Although these approaches have substantially reduced the complexity, labor intensity, and scoring variability of the scoring system, limitations remain in its administration, because the RCFT is still a time-consuming, indirect drawing test that is affected by visuomotor function.

To address these remaining limitations, a previous study from our laboratory identified an eye movement biomarker that can rapidly and directly assess impaired organizational strategy during the RCFT in patients with OCD26. However, the findings of that study are difficult to apply to other psychiatric and neurological disorders because the biomarker relies solely on calculations of eye gaze distribution and assumes that patients with OCD exhibit weak central coherence, focusing on narrow details of the RCFT figure. Thus, there is a need for a data-driven, deep learning-based assessment model that is faster, simpler, and more direct, that is not constrained by the characteristics of a single disease, and that can be extended to a broader range of psychiatric and neurological disorders. Such a model would also be consistent with recent interest in using eye tracking to assess cognitive functions such as visual memory and attention in patients with mental illnesses, as eye movements provide real-time insight into the cognitive processes by which gaze interacts with visuospatial stimuli during visual encoding27,28,29,30,31,32.

Therefore, this study aimed to establish an assessment model using eye tracking and deep learning in patients with early psychosis, including those with first-episode psychosis (FEP) and those at clinical high risk (CHR) for psychosis, as well as in patients with OCD and HCs. The primary aim was to explore the effectiveness of the model as a rapid, simple, and direct assessment of impaired executive function in these patients. FEP and OCD patients were selected because of their notable impairments in executive function and visuospatial memory3,5, whereas CHR patients were selected because this group includes both patients with early psychosis and those with nonspecific psychiatric symptoms, such as depression and anxiety33. We hypothesized that the RCFT assessment model developed in this study would distinguish normal from impaired executive function and identify visuospatial memory impairment on the basis of eye movements during memorization of the RCFT figure, regardless of the specific psychiatric diagnosis, and with increased speed, simplicity, and directness.

Results

Participant characteristics

The demographic and clinical characteristics of the participants in each diagnostic group are summarized in Table 1. The demographic characteristics of the participants, stratified by normal or impaired executive function and by normal or impaired visuospatial memory, are summarized in Table 2. Participants with normal executive function, as measured by the organization T score, had a higher intelligence quotient (IQ) than those with impaired executive function (t = 2.801, p = 0.013), and participants with normal visuospatial memory, as measured by the immediate recall T score, had a higher IQ than those with impaired visuospatial memory (t = 6.832, p < 0.001). The proportion of females was higher in the impaired executive function group than in the normal executive function group (χ² = 4.620, p = 0.032). Age, years of education, handedness, and the proportion of participants wearing glasses did not differ between the groups with normal and impaired executive function or visuospatial memory.
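For readers who want to reproduce a sex-distribution comparison like the one above, the sketch below computes a Pearson chi-square statistic (without continuity correction) for a contingency table, with a p-value for the one-degree-of-freedom case. It is an illustration under stated assumptions, not the SPSS procedure used in the study; the function name `pearson_chi_square` is hypothetical, and the counts in the usage example are made up rather than taken from the study's data.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table.

    Returns (chi2, dof, p). p is computed only for dof == 1, using the fact that a
    chi-square variable with 1 dof is a squared standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)). For other dof, p is None.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    p = math.erfc(math.sqrt(chi2 / 2)) if dof == 1 else None
    return chi2, dof, p


# Hypothetical counts: rows = impaired / normal group, columns = female / male.
chi2, dof, p = pearson_chi_square([[20, 10], [10, 20]])
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

For 2 × 2 tables, SPSS additionally reports a Yates continuity-corrected statistic, which this sketch omits.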

Table 1 Demographic and clinical characteristics of the participants grouped by disease diagnosis and their Rey–Osterrieth complex figure test (RCFT) results.
Table 2 Demographic characteristics and Rey–Osterrieth complex figure test (RCFT) results compared between participants with normal and impaired executive function, as measured by the organization T score, and between participants with normal and impaired visuospatial memory, as measured by the immediate recall T score.

RCFT performance and long short-term memory (LSTM) + Attention model results

Analysis of covariance (ANCOVA) with IQ as a covariate revealed that participants with normal executive function had higher organization T scores (F = 249.031, p < 0.001) than did participants with impaired executive function. ANCOVA using sex and IQ as covariates revealed that the participants with normal executive function had higher total organization scores (F = 50.382, p < 0.001), fragmentation scores (F = 4.301, p = 0.039), and planning scores (F = 102.373, p < 0.001) than did the participants with impaired executive function. Similarly, ANCOVA with IQ as a covariate revealed that participants with normal visuospatial memory had higher immediate recall T scores (F = 87.960, p < 0.001), total immediate recall scores (F = 200.415, p < 0.001), configural presence scores (F = 242.611, p < 0.001), configural accuracy scores (F = 70.500, p < 0.001), cluster presence scores (F = 89.203, p < 0.001), cluster accuracy scores (F = 52.323, p < 0.001), and detail presence scores (F = 39.270, p < 0.001) than did participants with impaired visuospatial memory. There was no difference in immediate recall time (F = 2.073, p = 0.151) between participants with normal and impaired visuospatial memory. The ANCOVA results for RCFT performance between HCs and patients with normal and impaired executive function and visuospatial memory are described in Table S1 in the Supplementary material.
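The ANCOVA comparisons above test a group effect after adjusting for covariates. A minimal NumPy sketch of that F test, assuming a main-effects model (no group × covariate interaction) fitted by ordinary least squares, compares the residual sum of squares of a covariates-only model with that of a model that adds a 0/1 group indicator. In such a model this corresponds to the Type III test for the group term; it is not the authors' SPSS output, and `ancova_group_f` and the toy data are hypothetical.

```python
import numpy as np


def ancova_group_f(y, group, covariates=None):
    """F test for a two-level group effect adjusted for covariates (e.g., IQ, sex).

    y: (n,) outcome; group: (n,) 0/1 indicator; covariates: (n, k) array or None.
    Returns (F, df_effect, df_error).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    cov = np.empty((n, 0)) if covariates is None else np.asarray(covariates, float).reshape(n, -1)
    reduced = np.hstack([np.ones((n, 1)), cov])                          # intercept + covariates
    full = np.hstack([reduced, np.asarray(group, float).reshape(n, 1)])   # ... + group indicator

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    rss_full, rss_reduced = rss(full), rss(reduced)
    df_error = n - full.shape[1]
    f_stat = (rss_reduced - rss_full) / (rss_full / df_error)
    return f_stat, 1, df_error


# Toy data (not from the study): an outcome, a 0/1 group label, and an IQ-like covariate.
rng = np.random.default_rng(0)
iq = rng.normal(100, 15, size=60)
grp = np.repeat([0, 1], 30)
score = 0.3 * iq + 5.0 * grp + rng.normal(0, 5, size=60)
print(ancova_group_f(score, grp, iq))
```

With no covariates, the F value equals the square of the pooled-variance t statistic, which is a quick sanity check; a p-value can be obtained with `scipy.stats.f.sf(F, 1, df_error)`.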

Table 3 shows the classification performance of the LSTM + Attention model34,35 for the two classification tasks. The model achieved F1 scores of 83.5% and 80.7% and area under the receiver operating characteristic curve (AUROC) values of 60.7% (Fig. 1a) and 69.9% (Fig. 1b) for distinguishing between normal and impaired executive function and between normal and impaired visuospatial memory, respectively.
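F1 and AUROC can be computed directly from test-set labels, predicted labels, and predicted probabilities. The pure-Python sketch below shows both. It assumes that label 1 marks the positive (impaired) class, which is an assumption, since the text does not state which class or averaging scheme underlies the reported F1 scores. AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as sensitivity
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def auroc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set outputs (not study data).
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
predictions = [0, 1, 1, 1]
print(f1_score(labels, predictions), auroc(labels, probabilities))
```

For binary labels, library implementations such as `sklearn.metrics.f1_score` and `sklearn.metrics.roc_auc_score` give the same values.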

Table 3 Long Short-Term Memory (LSTM) + Attention model classification results for participants with normal and impaired executive function (organization T score) and participants with normal and impaired visuospatial memory (immediate recall T score).
Figure 1

Receiver operating characteristic (ROC) curves of the Long Short-Term Memory (LSTM) + Attention model and gaze fixation sequence maps showing the order of eye movements during the 3-min memorization of the Rey‒Osterrieth Complex Figure Test (RCFT) figure. (a) ROC curve of the LSTM + Attention model classification for normal and impaired executive function. (b) ROC curve of the LSTM + Attention model classification for normal and impaired visuospatial memory. (c) Order of gaze fixation in the patient with the lowest organization T score (i.e., < 20). (d) Order of gaze fixation in the patient with the highest organization T score (i.e., 70). The numbers within the circles indicate the order of gaze fixation, starting from 0. Abbreviation: AUROC, area under the ROC curve.

Model interpretation results

Eye gaze sequence maps are displayed in Fig. 1c and d. In the patient with the lowest organization T score (i.e., < 20; Fig. 1c), gaze fixations were few, concentrated in a narrow area, and distributed horizontally. Conversely, in the patient with the highest organization T score (i.e., 70; Fig. 1d), gaze fixations were widely and evenly distributed across the figure. The results of the eye movement comparisons between the normal and impaired groups are provided in Tables 4 and 5.

Table 4 Eye movement characteristics of participants with normal and impaired executive function measured by the organization T score.
Table 5 Eye movement characteristics of the participants with normal and impaired visuospatial memory measured by immediate-recall T scores.

Discussion

This study aimed to develop a faster, simpler, and more direct eye-tracking and deep learning-based RCFT assessment model for evaluating impaired executive function during visuospatial memory encoding. On the basis of sequential eye movements recorded while participants memorized the RCFT figure, the model achieved high performance, with F1 scores of 83.5% and 80.7%, in assessing impairment in patients with early psychosis and OCD. This assessment does not depend on specific psychiatric diagnoses, as the impairment is shared across these disorders. These results indicate that eye movements during the encoding of highly complex figures reflect executive function during visuospatial memory encoding and serve as a transdiagnostic biomarker of impairment in early psychosis and OCD. Additionally, because the model relies on a data-driven deep learning technique that does not require handcrafted feature selection based on specific disease hypotheses, it could potentially be extended beyond early psychosis and OCD to other psychiatric and neurological disorders that also exhibit impaired executive function, poor RCFT performance in the form of disorganized and fragmented drawings, and difficulties in visuospatial integration9,10,11,12,13,14,15,16.

The model also offered a rapid and simple measure of impaired executive function, taking only 3 min from administration to assessment through computerized eye movement recording and deep learning analysis. This process is considerably faster and simpler than the traditional RCFT scoring system and the automated scoring systems reported in previous studies, which still require a prior drawing process20,21,22,23. This improved model is beneficial and easy to apply in real-world clinical and research settings, saving a significant amount of labor and time and reducing human scoring variability. Moreover, the model enabled a more direct assessment, as sequential eye movements reflected real-time visuospatial information processing36 and indicated how subjects strategically encoded, planned, and organized the figure. Thus, eye-tracking measurements can bridge the gap between the behavioral-level phenotype and brain dysfunction by capturing the inner workings of executive function during visuospatial memory encoding.

In the interpretation of the results of the deep learning model, the gaze fixation sequence maps in Fig. 1c and d show distinct eye movement patterns between patients with the lowest and highest organization scores. In the patient with the lowest organization score, exploration of the figure was limited, and the patient lacked a strategy and focused simply on horizontal movements without attending to important areas. In contrast, in the patient with the highest organization score, exploration was more structured and extensive, capturing a larger picture and focusing sequentially on various critical areas. In addition, quantitative differences in eye movements revealed that participants with impaired executive function and visuospatial memory spent less time looking at the figure and exhibited fewer fixations and saccades than the participants in the normal group did, indicating less effective and comprehensive encoding of the visuospatial information within the figure (Tables 4 and 5). These ineffective, disorganized, and limited eye movement patterns in the impaired group seem to be consistent with previous research findings that patients with executive function deficits have difficulties processing the overall RCFT figure and utilize a fragmented and piecemeal approach3,5,7. Overall, these differences in eye movement patterns between the impaired and normal groups may have contributed to the ability of the deep learning model to distinguish between them.

There are several limitations in this study. First, this assessment model was initially developed using data from individuals with early psychosis and OCD and classified function only as normal or impaired, which limits its applicability to individuals with other psychiatric and neurological disorders. Additionally, the impairment criteria in this study were stringent, making it challenging to detect patients with mild impairment. Nonetheless, this model has the potential to expand beyond binary classification and to include a wider range of psychiatric and neurological disorders. Second, the small number of participants in the impaired group resulted in highly imbalanced datasets, which limits the validity and reliability of the assessment model. To address this, the dataset split was stratified to keep class label proportions consistent with those of the original dataset, and synthetic minority oversampling technique (SMOTE)37 data augmentation was applied, as in previous studies with similarly imbalanced datasets38,39,40. However, our results should be interpreted with caution because of the relatively small and imbalanced original dataset. Third, most patients were taking medication at the time of the eye-tracking RCFT. Because this study did not investigate the impact of medication on patients' RCFT performance or eye movement markers, medication effects should be considered when interpreting the results. However, given that the assessment model is intended to encompass various psychiatric and neurological disorders in future research, these findings remain promising, as they indicate the effectiveness of the model even in the presence of potential medication effects. Fourth, there was a significant difference in IQ between the normal and impaired groups. Although the statistical group comparisons were conducted with IQ as a covariate, the LSTM + Attention model does not account for or exclude the potential impact of cognitive function on eye movement markers in its classification.

Although the RCFT is a well-established tool for evaluating executive function during visuospatial memory encoding, its administration and scoring are difficult because it is time-consuming, measures function indirectly, and is subject to scoring variability. While a previous study from our laboratory identified an eye movement biomarker that detects impaired executive function with enhanced speed and directness, its utility was limited to OCD because it was based on OCD-specific characteristics. Therefore, we developed an RCFT assessment model that integrates eye tracking and deep learning, which not only offered a more direct, rapid, and simple evaluation of impaired executive function but also showed potential for wider applicability to other disorders, as it was data driven and did not rely on a single disease hypothesis. Future studies could benefit from including various psychiatric and neurological disorders and from using explainable artificial intelligence to identify key features distinguishing individuals with normal and impaired executive function during visuospatial memory encoding.

Methods

Participants

We analyzed data from 408 participants, including 96 patients with FEP, 49 patients at CHR for psychosis, 104 patients with OCD, and 159 HCs. FEP patients and CHR individuals were recruited from the inpatient and outpatient clinics of the Department of Neuropsychiatry and from the Seoul Youth Clinic (www.youthclinic.org) at Seoul National University Hospital (SNUH). In this study, FEP patients were individuals diagnosed with schizophrenia, schizoaffective disorder, or schizophreniform disorder according to the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Axis I Disorders (SCID-I), with an illness duration of less than 2 years. Psychotic symptoms were assessed using the Positive and Negative Syndrome Scale (PANSS). The CHR status of the participants was confirmed using the Structured Interview for Prodromal Symptoms (SIPS)41. Prodromal symptoms were assessed using the validated Korean version of the Scale of Prodromal Symptoms (SOPS)41,42. Patients with OCD were recruited from the OCD clinic (www.ocd.snu.ac.kr) at SNUH and fulfilled the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) criteria for OCD. The Yale‒Brown Obsessive Compulsive Scale (Y-BOCS)43 was used to evaluate the severity of OCD. The Hamilton Rating Scale for Depression (HAM-D)44 and the Hamilton Rating Scale for Anxiety (HAM-A)45 were used to evaluate the severity of depressive and anxiety symptoms, respectively. HCs were recruited through internet advertisements and were screened for psychiatric disorders or symptoms using the Structured Clinical Interview for DSM-IV Non-Patient Version (SCID-NP). HCs with a past or current axis I diagnosis or with first- to third-degree biological relatives with a history of psychotic disorders were not eligible to participate in this study. We assessed participants' IQ using the Korean version of the Wechsler Adult Intelligence Scale (K-WAIS)46.

All participants were screened against the exclusion criteria, which included neurological conditions, significant head injury, substance abuse or dependence (except for nicotine), and intellectual disability (IQ < 70). We provided thorough explanations of the research procedures and obtained written informed consent from all participants (IRB no. H-1110-009-380, H-1201-008-392). For participants younger than 18 years, consent was also obtained from their parents. This study adhered to the principles of the Declaration of Helsinki (2013) and was approved by the Institutional Review Board of SNUH (IRB No. H-2306-210-1445).

Eye movement data acquisition and the RCFT

Eye movement data were obtained during an eye-tracking experiment in which participants were instructed to view and memorize the RCFT figure for 3 min (Fig. 2), as detailed by Kim et al.26. Briefly, the RCFT figure was presented on a 19-inch monitor with a screen resolution of 1280 × 1024 pixels using Experiment Builder v.2.1.45 software (SR Research, Ottawa, Ontario, Canada). During the experiment, the participant's head was positioned on a chin rest in a dimly lit room. The distance between the chin rest and the monitor was 70 cm, and the horizontal and vertical viewing angles were 22° and 17°, respectively. Before eye movements were measured, a nine-point calibration and verification process was conducted. Eye movements were recorded with an EyeLink 1000 eye tracker (SR Research) at a sampling rate of 1,000 Hz, and the data were exported from this device.
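The viewing angles above follow the standard visual-angle relation θ = 2·arctan(s / 2d) for an extent s viewed at distance d. The sketch below applies it in both directions. The text does not say whether the 22° × 17° angles refer to the RCFT figure or to the whole display, so the physical sizes it computes are illustrative only; the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def extent_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


VIEWING_DISTANCE_CM = 70.0  # chin rest to monitor, as stated in the text
width_cm = extent_for_angle_cm(22.0, VIEWING_DISTANCE_CM)
height_cm = extent_for_angle_cm(17.0, VIEWING_DISTANCE_CM)
print(f"{width_cm:.1f} cm x {height_cm:.1f} cm")
```

At 70 cm, a 22° × 17° field corresponds to roughly 27.2 cm × 20.9 cm.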

Figure 2

Eye tracking-based Rey‒Osterrieth Complex Figure Test (RCFT) procedure used in this study. Memorization of an RCFT figure for 3 min was followed by immediate recall of the figure.

The key parameters collected were gaze fixation point coordinates, indicating where the eyes briefly paused to focus and acquire new information47, and time in milliseconds. Upon completion of the eye-tracking session, the participants were instructed to reproduce the RCFT figure from memory, akin to the immediate recall condition of the RCFT. During this drawing task, response times were recorded, and an experimenter systematically tracked the participant’s reproduction of the figure to assess organizational strategy as a substitute for the RCFT copy condition. A skilled evaluator manually scored the participants’ drawings using the Boston Qualitative Scoring System (BQSS)48 to evaluate organization and immediate recall performance in the RCFT. The organization and immediate recall scores were then categorized according to the BQSS clinical interpretation criteria as normal (score: 40–70) or impaired (score: < 39). The participants were grouped as follows: a normal executive function group with normal organization scores, a normal visuospatial memory group with normal immediate recall scores, an impaired executive function group with impaired organization scores, and an impaired visuospatial memory group with impaired immediate recall scores.
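The grouping rule above can be written as a small function. This sketch applies the cut-offs exactly as quoted (normal 40–70, impaired < 39) to the organization and immediate recall scores; the function names are hypothetical. Because the quoted ranges leave a score of 39 (and scores above 70) unassigned, the sketch returns `None` for such values rather than guessing how they were handled.

```python
from typing import Optional

NORMAL_MIN, NORMAL_MAX = 40, 70  # "normal (score: 40-70)" as quoted in the text
IMPAIRED_BELOW = 39              # "impaired (score: < 39)" as quoted in the text


def categorize_t_score(t: float) -> Optional[str]:
    """Label a BQSS T score with the quoted cut-offs; None if neither range applies."""
    if NORMAL_MIN <= t <= NORMAL_MAX:
        return "normal"
    if t < IMPAIRED_BELOW:
        return "impaired"
    return None


def assign_groups(organization_t: float, immediate_recall_t: float) -> dict:
    """Two independent labels per participant, one for each classification task."""
    return {
        "executive_function": categorize_t_score(organization_t),
        "visuospatial_memory": categorize_t_score(immediate_recall_t),
    }


print(assign_groups(organization_t=18, immediate_recall_t=52))
```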

LSTM + attention model

In this study, we implemented an LSTM model in Python with PyTorch49 to analyze time series of eye movement fixations during memorization of the RCFT figure. The LSTM model was chosen chiefly for its effectiveness in handling sequential data, since sequential temporal relationships might play a significant role in how participants memorize and organize the RCFT figure. The LSTM model was combined with an attention mechanism. Fixation sequences were fed into the model one time step at a time, allowing it to learn patterns and relationships within sequential eye movement fixations. The representations obtained at each time step were weighted by attention coefficients and summed to obtain the final sequence representation, which was fed into a single-layer classifier to determine the probability that a sequence belonged to each class (normal or impaired). The model was trained with a sequence size of 32 fixation points, and the learning rate was set to α = 0.005 throughout training. The fixation dataset was split 70/30 into training and testing sets, and the split was stratified to preserve class label proportions similar to those of the original dataset. The evaluation metrics for the LSTM + Attention model included recall (sensitivity), precision, AUROC, and F1 score. The AUROC and F1 score were used to evaluate model performance because, unlike accuracy, they remain informative on highly imbalanced datasets such as those in this study. The overall modeling workflow is described in Fig. 3.
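The architecture described above (an LSTM over the fixation sequence, attention coefficients that pool the per-step hidden states into one vector, and a single linear layer producing class probabilities) can be sketched as a forward pass. This NumPy version is an illustration, not the authors' PyTorch code: the per-fixation input features (here x, y, and duration, assumed to be normalized), the hidden size, the additive form of the attention score, and the truncate-or-zero-pad handling of the 32-fixation sequence length are all assumptions, and training with the stated learning rate is not shown. `LSTMAttentionClassifier` and `fixed_length` are hypothetical names.

```python
import numpy as np

SEQ_LEN = 32  # fixation points per input sequence, as stated in the text


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def fixed_length(fixations, length=SEQ_LEN):
    """Truncate or zero-pad a (T, d) fixation array to (length, d). Assumed scheme."""
    fixations = np.asarray(fixations, dtype=float)[:length]
    pad = np.zeros((length - len(fixations), fixations.shape[1]))
    return np.vstack([fixations, pad])


class LSTMAttentionClassifier:
    """Forward pass only: LSTM encoder -> attention pooling -> single linear layer."""

    def __init__(self, input_dim, hidden_dim=16, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.H = hidden_dim
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.uniform(-s, s, (4 * hidden_dim, input_dim))
        self.U = rng.uniform(-s, s, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_att = rng.uniform(-s, s, (hidden_dim, hidden_dim))  # additive attention
        self.v_att = rng.uniform(-s, s, hidden_dim)
        self.W_out = rng.uniform(-s, s, (n_classes, hidden_dim))   # single-layer classifier
        self.b_out = np.zeros(n_classes)

    def forward(self, seq):
        """seq: (T, input_dim) -> (class probabilities, attention weights over T steps)."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x_t in seq:  # one fixation per time step
            z = self.W @ x_t + self.U @ h + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g       # cell state update
            h = o * np.tanh(c)      # hidden state for this time step
            states.append(h)
        states = np.stack(states)                                  # (T, H)
        scores = np.tanh(states @ self.W_att.T) @ self.v_att       # (T,) attention energies
        alpha = softmax(scores)                                    # coefficients sum to 1
        context = alpha @ states                                   # weighted sum of states
        probs = softmax(self.W_out @ context + self.b_out)
        return probs, alpha


# Hypothetical input: 40 fixations with normalized (x, y, duration) features.
rng = np.random.default_rng(1)
seq = fixed_length(rng.random((40, 3)))
model = LSTMAttentionClassifier(input_dim=3)
probs, alpha = model.forward(seq)
print(probs, alpha.shape)
```

In a PyTorch implementation, the same structure would typically be built from `torch.nn.LSTM`, a learned scoring layer, and `torch.nn.Linear`, with gradients supplied by autograd; that mapping is an assumption about how such a model is usually written.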

Figure 3

Overall workflow of modeling in this study. Abbreviations: Org, organization; IR, immediate recall; SMOTE, synthetic minority oversampling technique.

Data augmentation

In this study, the dataset was highly imbalanced, with a significant disparity between the majority class (e.g., 385 participants in the normal group) and the minority class (e.g., 23 participants in the impaired group). Imbalanced datasets cause problems for learning algorithms that expect an even distribution across classes, leading to bias favoring the majority class50. To address this, data augmentation is commonly employed to achieve an ideal balance, e.g., a 50:50 ratio, by artificially expanding the training dataset for enhanced reliability. Thus, the minority class (impaired group) in our training datasets was oversampled using SMOTE. The imbalance was maintained in the test dataset to represent the real-world distribution.
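The order of operations matters here: synthetic samples are generated only from training data, so no test-set information leaks into training and the test set keeps its natural imbalance. The NumPy sketch below shows a stratified 70/30 split followed by a basic SMOTE that interpolates between a minority sample and one of its k nearest minority neighbours until the training classes are 50:50. It assumes that each participant is represented by a flat feature vector (for example, a flattened fixation sequence), which the text does not specify; the helper names, k = 5, and the random features are hypothetical, and only the class counts (385 and 23) come from the text. Library implementations such as `imblearn.over_sampling.SMOTE` provide the same technique.

```python
import numpy as np


def stratified_split(y, test_frac=0.3, seed=0):
    """Train/test index arrays that preserve each class's proportion."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def smote(X_min, n_new, k=5, seed=0):
    """Synthesize n_new minority samples on segments toward random k-nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    out = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbours[i, rng.integers(k)]
        out[s] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out


def balance_training_set(X_train, y_train, minority=1, seed=0):
    """Oversample the minority class in the training set to a 50:50 ratio."""
    X_min = X_train[y_train == minority]
    n_new = int((y_train != minority).sum() - len(X_min))
    X_syn = smote(X_min, n_new, seed=seed)
    X_bal = np.vstack([X_train, X_syn])
    y_bal = np.concatenate([y_train, np.full(n_new, minority)])
    return X_bal, y_bal


# Class counts from the text; features are random placeholders.
rng = np.random.default_rng(0)
y = np.array([0] * 385 + [1] * 23)
X = rng.random((len(y), 64))
train, test = stratified_split(y)
X_bal, y_bal = balance_training_set(X[train], y[train])
print(len(train), len(test), y[test].sum(), y_bal.mean())  # test set keeps its imbalance
```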

Statistical analysis

SPSS v.26.0 (IBM, Armonk, NY, USA) was used for the statistical analyses, and the significance level was set at p < 0.05. Continuous demographic and clinical variables were compared across groups using independent t tests, or Welch's t tests when variances were unequal, and categorical variables were compared using chi-square tests. Group comparisons of RCFT scores were performed using ANCOVA with IQ, or IQ and sex, as covariates.
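The choice between Student's and Welch's t tests described above can be sketched in pure Python. The example computes Levene's statistic (mean-centred, the variant SPSS reports alongside its independent-samples t test) to judge variance equality, together with both t statistics and their degrees of freedom. P-values require the F and t distributions, for example `scipy.stats.f.sf` and `scipy.stats.t.sf`; they are omitted here to keep the sketch dependency-free. The helper names and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def levene_w(a, b):
    """Levene's W (mean-centred) for two samples; compare with F(1, n - 2)."""
    groups = [[abs(x - mean(g)) for x in g] for g in (a, b)]
    n, k = len(a) + len(b), 2
    grand = mean([v for g in groups for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (n - k) / (k - 1) * between / within


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scores for two small groups (not study data).
normal, impaired = [1, 2, 3, 4], [2, 4, 6, 8, 10]
print(levene_w(normal, impaired), student_t(normal, impaired), welch_t(normal, impaired))
```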

Model interpretation

Owing to the limited interpretability of the decision-making process of the LSTM + Attention model, additional visual interpretations and statistical analyses were conducted to interpret the results of the model in this study. First, a gaze fixation sequence map was created to explore participants' visuospatial information processing and organization of the RCFT figure in sequence. Second, eye movement measures, including the number of fixations, average duration of fixation (ms), average saccade amplitude and duration (ms), and number of blinks and saccades, were compared between the normal and impaired groups to identify quantitative differences in eye movements. Saccades refer to rapid eye movements between fixations.
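Several of the measures above can be derived from a time-ordered list of fixations. The sketch below assumes that each fixation has screen coordinates in pixels plus start and end times in milliseconds, and it approximates each saccade as the jump between consecutive fixations, converting pixel distance to degrees with a `pixels_per_degree` factor that must be supplied for the actual setup. This is an approximation under stated assumptions: the EyeLink parser reports saccades and blinks as separate events, and blinks cannot be recovered from fixations alone, so they are not computed here. The `Fixation` type and `summarize` function are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    x: float         # horizontal screen position (pixels)
    y: float         # vertical screen position (pixels)
    start_ms: float
    end_ms: float


def summarize(fixations, pixels_per_degree):
    """Approximate per-participant eye movement measures from ordered fixations."""
    durations = [f.end_ms - f.start_ms for f in fixations]
    pairs = list(zip(fixations, fixations[1:]))  # consecutive fixations, in viewing order
    amplitudes = [math.hypot(b.x - a.x, b.y - a.y) / pixels_per_degree for a, b in pairs]
    gaps = [b.start_ms - a.end_ms for a, b in pairs]  # inter-fixation gaps; may include blinks
    return {
        "n_fixations": len(fixations),
        "mean_fixation_ms": mean(durations) if durations else 0.0,
        "n_saccades_approx": len(pairs),
        "mean_saccade_amplitude_deg": mean(amplitudes) if amplitudes else 0.0,
        "mean_saccade_duration_ms": mean(gaps) if gaps else 0.0,
    }


# Hypothetical fixations; pixels_per_degree is a placeholder, not the study's value.
fix = [Fixation(0, 0, 0, 200), Fixation(30, 40, 230, 430), Fixation(30, 40, 460, 560)]
print(summarize(fix, pixels_per_degree=10.0))
```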