Introduction

Drawing tests are well documented for their comprehensive assessment capabilities, which include evaluating visuospatial skills, visual memory, and executive function, and they are commonly used as a cognitive screening tool for dementia in the elderly population, in both clinical and research settings1. Among the most prominent drawing tests are the Pentagon Drawing Test (PDT), the Clock Drawing Test (CDT), and the Rey Complex Figure Test (RCFT). The PDT, for example, requires participants to draw two intersecting pentagons, with scoring typically binary (pass or fail)2. The CDT assesses executive function and visuospatial skills by having subjects draw a clock face set to a specific time, with scoring methods varying considerably, from binary systems to detailed point assignments based on the accuracy of the contour, number sequence, and hand placement3,4,5. The RCFT, designed by Rey6, challenges participants to copy and recall a complex figure, with a widely used 36-point scoring system developed by Osterrieth7.

Recent advancements have seen the application of machine learning approaches to enhance the predictive accuracy of cognitive status from these tests. This is particularly valuable because drawing tests are simple to administer, which makes them useful for screening early stages of dementia in clinical practice. For example, deep-learning approaches have been applied to the digitized PDT8, CDT9, and RCFT10 to distinguish MCI patients from CN subjects. Additionally, multi-dimensional kinematic parameters extracted from a digital pen and tablet during the RCFT have been analyzed using logistic regression11.

However, previous studies have some limitations. Primarily, most had small sample sizes and lacked an external test set, which undermines the reliability of the reported model performance. Even in cases where sample sizes were not small, model performance was not sufficiently robust for screening early stages of dementia. This could be attributed to the challenges inherent in using image data in deep learning models. For instance, image data often contain a vast amount of information but are also prone to noise because of their high dimensionality12,13. Moreover, image data encompass diverse patterns and features, making them challenging for models to learn effectively, especially when sample sizes are limited14.

In this paper, we propose a novel multi-stream deep learning framework composed of a spatial stream that processes raw RCFT images and a scoring stream that integrates RCFT scores generated by a previously developed AI-based scoring model along with demographic features15. The model was trained to distinguish MCI patients from CN subjects using a total of 1,740 subjects (947 CN, 793 MCI). An additional 222 subjects (106 CN, 116 MCI) served as an external test set to strengthen the reliability of the performance evaluation.

Materials and methods

Datasets

The study was approved by the Institutional Review Boards of Chonnam National University Hospital (CNUH‐2019‐279) and Wonkwang University Hospital (2022–01-024–004). All research was performed in accordance with relevant guidelines and regulations, including the Declaration of Helsinki. Informed consent was obtained from all participants and/or their legal guardians.

GARD cohort

We enrolled 1,740 subjects from the Gwangju Alzheimer’s and Related Dementia (GARD) cohort registry at Chosun University in Gwangju, Korea, during 2015–2019. The diagnostic criteria for CN and MCI have been described in Seo et al.16. Briefly, CN subjects were included if they were aged 60 or older, had a Clinical Dementia Rating (CDR) score of 0, and exhibited normal cognitive function, with all neuropsychological test z-scores above −1.5 standard deviations (SD) of age-, education-, and gender-adjusted norms. MCI patients were aged 60 or older, had a CDR score of 0.5, and met the MCI criteria established in17.

WUH cohort

The Wonkwang University Hospital (WUH) cohort includes 106 CN subjects and 116 MCI patients enrolled between 2017 and 2022. In alignment with our training set criteria, subjects were classified based on their CDR scores: a CDR score of 0 indicated a CN diagnosis, while a score of 0.5 indicated MCI.

Deep learning architecture

Figure 1A provides an overview of the proposed method. Our model predicts the probability of an individual being classified as an MCI patient using three pre-processed RCFT images along with age, sex, and years of education. The pre-processing method for the RCFT images follows the protocol outlined by Park et al.15. Our prediction model employs a dual-stream architecture: a spatial stream and a scoring stream. Both streams produce class probabilities through softmax functions, and their outputs are merged using average fusion to yield the final classification probability. In the spatial stream, each 512 × 512 image is input into a CNN model that uses EfficientNet18 as its backbone. We selected EfficientNet-B2 for its efficiency and suitability in medical applications, given its lower parameter count and adequate performance on limited datasets. EfficientNet-B2 incorporates a 3 × 3 convolution layer followed by multiple 3 × 3 and 5 × 5 mobile inverted bottleneck convolution (MBConv) blocks, a design borrowed from MobileNet19 (Fig. 1B). After the CNN, the feature map is flattened and a multi-head self-attention layer is applied, enhancing the model’s focus on significant spatial regions. The multi-head self-attention mechanism, as defined by20, combines multiple self-attention layers to capture diverse features, expressed as:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O},$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
Fig. 1

Model architecture. (A) Overall model architecture featuring a dual-stream design: The spatial stream uses EfficientNet-B2 with a multi-head self-attention layer and the scoring stream integrates AI-generated RCFT scores (from a previously developed scoring model) together with demographic features (sex, age and years of education). Outputs from both streams are fused via average fusion to yield the final CN/MCI classification. (B) Detailed architecture of the EfficientNet-B2 model used in the spatial stream, including convolutional layers and Mobile Inverted Bottleneck Convolution (MBConv) layers.

where \(Q\), \(K\), and \(V\) are the query, key, and value matrices, respectively, and we use four attention heads (\(h\) = 4). The output of the multi-head self-attention layer is then processed through two fully connected (FC) layers followed by a softmax function.
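The post-backbone stage of the spatial stream can be sketched as follows. Only the four attention heads, the two FC layers, and the softmax come from the text; the embedding dimension, token pooling, and FC width are illustrative assumptions, not the authors' exact values:

```python
import torch
import torch.nn as nn

class SpatialStreamHead(nn.Module):
    """Flatten the CNN feature map into spatial tokens, apply multi-head
    self-attention (h = 4, Q = K = V), pool, then two FC layers + softmax.
    embed_dim and the hidden FC width are illustrative assumptions."""

    def __init__(self, embed_dim=128, num_heads=4, num_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.fc1 = nn.Linear(embed_dim, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, feat_map):
        # feat_map: (batch, embed_dim, H, W) from the EfficientNet-B2 backbone
        tokens = feat_map.flatten(2).transpose(1, 2)     # (batch, H*W, embed_dim)
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention: Q = K = V
        pooled = attended.mean(dim=1)                    # pool over spatial tokens
        logits = self.fc2(torch.relu(self.fc1(pooled)))
        return torch.softmax(logits, dim=1)

head = SpatialStreamHead()
probs = head(torch.randn(2, 128, 16, 16))                # two images, 16 x 16 feature map
```

Because the attention is applied to flattened feature-map positions, each spatial location can attend to every other location, which is what lets the model emphasize informative regions of the drawing.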

In parallel, the scoring stream incorporates RCFT scores generated by an AI-based scoring model15 and demographic variables (age, sex, and years of education). These features, the RCFT scores from the three images together with the demographic variables, are concatenated and passed through a fully connected layer followed by a softmax function. Importantly, the weights of the AI-based scoring model are frozen, so the scoring stream receives fixed RCFT score outputs during training and no parameter updates occur within this module.
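A minimal sketch of the scoring stream and the average-fusion step is shown below; the feature ordering is a hypothetical example, and a randomly initialized FC layer stands in for the trained one:

```python
import torch

def average_fusion(spatial_probs, scoring_probs):
    # average fusion of the two streams' softmax outputs
    return (spatial_probs + scoring_probs) / 2.0

torch.manual_seed(0)
# hypothetical input: three frozen AI-generated RCFT scores + age, sex, education
features = torch.tensor([[30.0, 25.0, 22.0, 72.0, 1.0, 9.0]])
scoring_fc = torch.nn.Linear(6, 2)             # single FC layer, then softmax
scoring_probs = torch.softmax(scoring_fc(features), dim=1)

spatial_probs = torch.tensor([[0.40, 0.60]])   # stand-in spatial-stream output
final_probs = average_fusion(spatial_probs, scoring_probs)
```

Since both streams emit softmax probabilities, their average is itself a valid probability distribution over the CN/MCI classes.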

Baseline models

The proposed model was evaluated against four baseline models: three logistic regression models and one deep learning model. The first baseline used MMSE scores. The second used the three RCFT scores assessed by trained experts, while the third used the three RCFT scores generated by the AI-based scoring model. The final baseline was a deep learning model that used only the spatial stream network. All baseline models included age, sex, and years of education as covariates.

Quality control of RCFT scoring using an AI-based scoring model

To mitigate potential errors arising from manual scoring, scanning, and digitization, we applied the AI-based scoring model to the external test set (n = 666 images; 222 subjects × three drawings each) to enhance data quality. For images in which the discrepancy between the expert-assessed score and the AI-generated score exceeded ten points, trained human experts re-evaluated the drawings. The updated expert scores were then compared with the AI-generated scores to ensure scoring accuracy and reliability.
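This QC rule reduces to a simple absolute-difference filter; a pure-Python sketch follows (the ten-point threshold is from the text, the helper name and toy scores are ours):

```python
def flag_for_review(expert_scores, ai_scores, threshold=10):
    """Return indices of drawings whose expert-assessed and AI-generated
    scores differ by more than `threshold` points."""
    return [i for i, (e, a) in enumerate(zip(expert_scores, ai_scores))
            if abs(e - a) > threshold]

# toy example: drawings 1 and 3 exceed the ten-point discrepancy
flagged = flag_for_review([36, 20, 5, 28], [35, 33, 4, 12])
```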

Experiments

We built the prediction model and evaluated its performance using data from the GARD and WUH cohorts. The GARD cohort was used to construct the prediction model. Throughout training, we used binary cross-entropy as the loss function and adopted the Adam optimizer to minimize it. To prevent overfitting, we reduced the learning rate to 10% of its current value every five epochs and applied early stopping if the validation loss did not improve for 30 epochs, ensuring that the final model weights corresponded to the lowest validation loss.
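The schedule and stopping rule above can be sketched in PyTorch; the toy data, stand-in network, and epoch budget are placeholders, while the step decay (10% every five epochs), binary cross-entropy loss, Adam optimizer, 30-epoch patience, and best-weight restoration follow the description:

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(64, 10)
y = (x[:, 0] > 0).float()                        # synthetic binary labels

model = torch.nn.Sequential(torch.nn.Linear(10, 1), torch.nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
criterion = torch.nn.BCELoss()                   # binary cross-entropy

best_loss, best_state, wait, patience = float("inf"), None, 0, 30
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x).squeeze(1), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                             # LR -> 10% every five epochs
    val_loss = loss.item()                       # stand-in for a real validation pass
    if val_loss < best_loss:
        best_loss, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
    else:
        wait += 1
        if wait >= patience:                     # early stopping after 30 stale epochs
            break

model.load_state_dict(best_state)                # restore lowest-validation-loss weights
```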

To evaluate our model’s performance, the GARD cohort was randomly divided into training, validation, and test sets in a 6:2:2 ratio. This division process was repeated fifty times. External validation was performed using the WUH cohort. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE).
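The four metrics can be computed from scratch; this NumPy sketch uses the rank-sum (Mann–Whitney U) formulation of AUC, and the 0.5 decision threshold is our assumption:

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """AUC, accuracy, sensitivity, and specificity for a binary classifier
    (class 1 = MCI, class 0 = CN); illustrative evaluation sketch."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)           # sensitivity: recall on MCI
    spe = tn / (tn + fp)           # specificity: recall on CN
    # AUC via the rank-sum (Mann-Whitney U) formulation (no tied scores assumed)
    ranks = y_prob.argsort().argsort() + 1
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return auc, acc, sen, spe

auc, acc, sen, spe = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice, running this over the fifty random splits and reporting the mean and percentile interval of each metric reproduces the evaluation scheme described above.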

All experiments were conducted using the PyTorch library (v 2.0.0) in Python (v 3.8.8) on NVIDIA GTX 1080 Ti GPUs with 48 GB of memory per GPU.

Results

Characteristics

Table 1 summarizes the clinical characteristics of subjects in the GARD and WUH cohort datasets. In the GARD dataset, the average ages were 71.8 (\(\pm\) 6.1) years for CN subjects and 73.5 (\(\pm\) 6.4) years for MCI patients (P < 0.01). Education levels and MMSE scores also differed significantly between CN subjects (education: 10.4 \(\pm\) 4.6 years; MMSE: 27.5 \(\pm\) 2.1) and MCI patients (9.8 \(\pm\) 4.7; 25.5 \(\pm\) 3.1) (P < 0.01). Sex ratios showed a similar pattern in both groups. Conversely, the WUH dataset revealed no significant difference in average age between CN subjects (69.9 \(\pm\) 7.7) and MCI patients (71.4 \(\pm\) 8.3) (P > 0.05), nor in education level between the CN (8.7 \(\pm\) 4.2) and MCI (9.2 \(\pm\) 4.5) groups (P > 0.05). Comparing the two datasets, the external test set consistently showed lower age, education level, and RCFT scores across both groups, with the exception of the education level and RCFT copy score in the CN group of the GARD dataset.

Table 1 Descriptive statistics. A dataset of 1,740 subjects from the Gwangju Alzheimer’s and Related Dementia (GARD) cohort was used for training, and an external test set of 222 subjects from Wonkwang University Hospital (WUH) was used for validation.

Improved agreement after AI-assisted scoring quality control

The initial correlation (R2) between scores generated by the AI-based scoring model and expert-assessed scores was 0.81, with a mean absolute error (MAE) of 3.0 points (Fig. 2A). Among the 666 external test images, 30 cases showed discrepancies greater than ten points between the expert-assessed and AI-generated scores. After re-evaluation by trained experts, scores for 26 images were corrected. Following this correction, the agreement improved substantially, yielding an \(R^{2}\) of 0.95 and an MAE of 2.0 points (Fig. 2B).
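For reference, the two agreement statistics can be reproduced as follows. Note that we compute R² here as the coefficient of determination; the paper describes it as a correlation, which may instead denote a squared Pearson r. The scores below are made-up examples:

```python
import numpy as np

def r2_and_mae(ground_truth, predicted):
    """Coefficient of determination (R^2) and mean absolute error between
    expert-assessed and AI-generated RCFT scores."""
    gt = np.asarray(ground_truth, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((gt - pred) ** 2)             # residual sum of squares
    ss_tot = np.sum((gt - gt.mean()) ** 2)        # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(gt - pred))
    return r2, mae

r2, mae = r2_and_mae([10, 20, 30, 36], [12, 19, 29, 35])
```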

Fig. 2

Comparative validation of AI-generated and expert-assessed scores. (A) Pre–AI-assisted quality control (QC), showing the agreement between scores generated by an AI-based scoring model and expert-assessed scores. Significant discrepancies (greater than ten points) between the AI-generated scores and human expert scores (highlighted in red) led to re-examination by trained experts. (B) Post–AI-assisted QC, demonstrating improved agreement between AI-generated scores and expert-corrected scores following the expert re-evaluation. In both panels, ‘predicted scores’ refer to scores generated by an AI-based RCFT scoring model, and ‘ground truth scores’ refer to expert-assessed (or expert-corrected) scores.

Comparison of model performance via internal test using GARD cohort

We evaluated the classification performance of five models, including the proposed method: (1) logistic regression using MMSE scores; (2) logistic regression using RCFT scores assessed by experts; (3) logistic regression using RCFT scores generated by the AI-based scoring model; (4) a deep learning model using only the spatial stream network; and (5) a deep learning model employing the multi-stream network (Fig. 3). The mean performance of these models is shown in Table 2A.

Fig. 3

ROC curve for external test set (WUH cohort dataset). The ROC curve is plotted using the median AUC results from 50 bootstrap samples, illustrating the performance of different models.

Table 2 Results of model prediction performance. (A) Internal test using the GARD cohort dataset. The baseline models consisted of three logistic regression models using (1) MMSE scores, (2) expert-assessed RCFT scores, and (3) AI-generated RCFT scores produced by a previously developed AI-based scoring model, as well as (4) a deep learning model using only the spatial stream. All baseline models included chronological age, sex, and education as covariates. The data was split into 6:2:2 (training, validation, and testing sets), and this process was repeated 50 times. (B) External test using the WUH cohort dataset. Expert-assessed RCFT scores refers to the models using the initial expert-assessed scores before QC, while expert-corrected scores indicates the models using the expert-corrected scores obtained after re-evaluation based on comparisons with the AI-generated RCFT scores.

The logistic regression model with MMSE scores demonstrated the lowest performance, with an AUC of 0.714 [95% confidence interval: 0.706–0.712], an ACC of 0.660 [0.652–0.667], a SEN of 0.625 [0.613–0.636], and a SPE of 0.694 [0.685–0.704]. The logistic regression model using expert-assessed RCFT scores recorded an AUC of 0.776 [0.768–0.782], an ACC of 0.705 [0.699–0.712], a SEN of 0.700 [0.689–0.711], and a SPE of 0.710 [0.700–0.722]; the performance of the model using RCFT scores generated by the AI-based scoring model was similar, with an AUC of 0.777 [0.770–0.783], an ACC of 0.710 [0.703–0.717], a SEN of 0.699 [0.689–0.709], and a SPE of 0.721 [0.710–0.731].

Performance improvements were evident with the spatial stream network model, which achieved an AUC of 0.803 [0.768–0.837], an ACC of 0.731 [0.702–0.761], a SEN of 0.701 [0.661–0.741], and a SPE of 0.762 [0.720–0.804]. Finally, our proposed deep learning model using the two-stream network outperformed all baseline models across all metrics, with an AUC of 0.852 [0.837–0.869], an ACC of 0.771 [0.755–0.787], a SEN of 0.742 [0.718–0.767], and a SPE of 0.800 [0.774–0.823].

External validation using WUH cohort

Performance metrics for the trained models on this set are detailed in Table 2B. The logistic regression model using expert-assessed RCFT scores from the initial dataset demonstrated an AUC of 0.750 [0.750–0.751], an ACC of 0.709 [0.707–0.712], a SEN of 0.832 [0.829–0.835], and a SPE of 0.575 [0.571–0.579]. With the validated dataset based on the re-rated RCFT scores, the model’s performance improved to an AUC of 0.813 [0.812–0.814], an ACC of 0.750 [0.748–0.753], a SEN of 0.799 [0.718–0.767], and a SPE of 0.800 [0.774–0.823]. The logistic model with RCFT scores generated by the AI-based scoring model displayed performance comparable to that of human experts (AUC = 0.804 [0.803–0.805], ACC = 0.722 [0.721–0.725], SEN = 0.799 [0.797–0.802], and SPE = 0.639 [0.634–0.722]). The deep learning model employing the spatial stream network achieved a higher AUC (0.837 [0.814–0.860]), ACC (0.744 [0.719–0.768]), and SPE (0.745 [0.697–0.792]) but a lower SEN (0.743 [0.690–0.800]). Our proposed deep learning method using the two-stream network outperformed all baseline models across all metrics: AUC = 0.872 [0.862–0.882], ACC = 0.781 [0.768–0.795], SEN = 0.836 [0.807–0.864], and SPE = 0.722 [0.687–0.757].

Discussion

In this article, we developed a multi-stream deep learning network to differentiate between MCI patients and CN subjects. Our approach surpasses previous methods using drawing tests (PDT, CDT, and RCFT) by leveraging a larger sample size and an external test set, thereby enhancing the robustness and performance of the model. Notably, our model outperformed existing studies, achieving the highest reported performance metrics.

Our multi-stream network combines a scoring stream and a spatial stream. The scoring stream incorporates RCFT scores generated by an AI-based RCFT scoring model, which reduces scoring time, minimizes human resource demands, and proactively prevents human scoring errors, thus improving accuracy. This advantage was demonstrated by our results: when AI-generated RCFT scores were used during the QC process, overall model performance improved substantially compared with performance based on the initial expert-assessed scores without QC. Furthermore, while expert scoring requires approximately 5 min per subject, the AI-based scoring model produces scores in about 10 s, highlighting its efficiency and scalability in clinical settings. The spatial stream uses raw RCFT images as input and captures subtle details within the images, such as pen thickness and stroke shape, that are not reflected in the standard human scoring system (0–36 points). This complementary information leads to substantial performance gains compared to models that rely solely on scoring. However, although raw image data are rich in information, they also contain considerable noise. Accordingly, the integration of multi-head self-attention layers enables the model to prioritize crucial spatial regions within the feature map, improving performance. Nonetheless, models that depend exclusively on raw images have shown higher variability in performance than logistic models based on RCFT scores, and the performance of the spatial stream network may be compromised by resolution differences between the training images and newly acquired test images. By combining the advantages of the scoring stream, which leverages RCFT scores generated by an AI-based model trained on the human scoring system, and the spatial stream, which processes raw images, our proposed method achieves high and robust performance.

The proposed method provides a clinically practical and scalable approach for screening individuals at risk of early-stage cognitive impairment at medical check-up centers. Currently, the MMSE is the most commonly used screening tool because of its simplicity and quick administration time of approximately 5–10 min2. However, our results indicate that the MMSE is less informative for predicting MCI and showed limited accuracy in distinguishing between CN subjects and MCI patients (AUC = 0.714), consistent with previous findings (AUC = 0.733, N = 2,577)8. In contrast, comprehensive cognitive function tests such as the Neuropsychological Test Battery require substantial time, often up to 2 h, as well as additional effort for scoring and interpretation, making them impractical for large-scale screening21. Although the RCFT requires more administration time than the MMSE, approximately 30 min including a 20-min delay interval22, our RCFT-based model significantly outperformed the MMSE-based model (AUC > 0.85). Furthermore, since the model requires only RCFT drawings and basic demographic information that are already collected routinely at medical check-up centers, no additional procedures or data collection steps are needed, making its integration into existing workflows straightforward. The model also produces AI-generated RCFT scores and a predicted risk of cognitive impairment within a few seconds, eliminating the 5–10 min of clinician time typically needed for manual expert scoring. This reduction in time and personnel burden substantially enhances efficiency while maintaining high performance. Together, these advantages highlight the strong potential of the proposed method for real-world clinical adoption, offering a practical, accurate, and workflow-friendly alternative to traditional cognitive assessments.

In addition to these advantages, it is important to note that our RCFT-based approach offers differentiated clinical value compared with existing digital cognitive assessment tools (DCATs). Many widely used DCATs (e.g., ANAM, CogniCA) primarily assess reaction time, processing speed, and attentional control through brief computerized tasks. Although comprehensive platforms such as CANTAB evaluate a broader set of cognitive domains, they do not capture high-level visuospatial constructional abilities or non-verbal visual memory through complex figure copying and recall. The copy task captures spatial planning and structural integration, while the recall task provides a language-independent measure of visual memory that is particularly useful in low-education elderly populations. Furthermore, the RCFT allows qualitative evaluation of drawing strategies that can reflect executive dysfunction, information that is not available from single-score outputs of typical DCATs. By integrating these rich cognitive signals through a multi-stream deep learning framework, our model leverages a type of information fundamentally different from what existing digital tools can provide.

Despite the strengths of the proposed method, our study had some limitations and areas for future development. First, our model was developed and validated using only Korean cohorts. Although we included an external validation dataset from an independent institution, it was also collected within the same country and therefore does not fully address potential ethnic or cultural biases. While the RCFT is a nonverbal, visuospatial test with minimal linguistic influence, validation in larger and more diverse international cohorts is needed to confirm broader generalizability. Second, our model relied solely on static RCFT drawings, as both cohorts used a traditional paper-and-pencil administration. Consequently, kinematic information such as drawing speed, pressure, temporal patterns, and the sequence of strokes could not be incorporated, despite evidence that these features provide meaningful biomarkers of cognitive decline11,23. We have recently developed a tablet-based RCFT platform that records real-time drawing trajectories and extracts kinematic parameters, which will allow future models to integrate these signals and potentially achieve further performance improvements. Another limitation concerns the interpretability of the proposed model. Although our hybrid framework integrates image-derived features with conventional RCFT scores generated by an AI-based model, the final prediction remains a black-box output without explicit explanations for its decisions. Since clinical adoption requires transparency, future work should incorporate explainable AI tools such as Grad-CAM, attention-based visualizations, and feature-attribution methods to improve interpretability and clinician trust.

In conclusion, our multi-stream deep learning network outperformed previous studies in distinguishing MCI patients from CN subjects. By integrating AI-generated RCFT scores with image-based information, our model demonstrated robust performance across internal and external datasets. Our findings suggest potential clinical utility as a time-efficient screening tool for cognitive impairment.